Making an English Wikipedia server

From ThinkServer
Revision as of 01:55, 12 August 2024 by Sam (Saved progress so far)

Wikimedia publishes dumps of the English Wikipedia at least once a month. As the content is freely licensed, you can use these dumps to build your own server holding the English Wikipedia content. As the English Wikipedia is the largest Wikipedia, importing the dumps takes a while, but it is by no means impossible. This guide will help you along the way.

This guide is based entirely on the English Wikipedia, but should be quite relevant for other languages too.

Prerequisites

Here are a few things you need to know before you get started. Everything is covered in the instructions below.

What you need to know

You will need the following to get started:

  • Apache Web Server
  • PHP
  • MySQL/MariaDB
  • A dump of your Wikipedia of choice

Before you start

  • Remember that some of the Wikipedia dumps are huge. You will need a lot of disk space: around 100GB as a minimum for the download, and around 500GB to do this comfortably once the database grows. Remember that Wikipedia is always growing.
  • Prepare MySQL/MariaDB and PHP for the large transactions the import will generate, for example by raising their memory and packet-size limits.
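
As a rough sketch of what preparing MariaDB and PHP for a large import might look like, the two functions below print small configuration fragments. The setting names are real MariaDB and PHP options, but every value here is an illustrative assumption and not something this guide has tested; size them to the RAM you actually have. The function names and the file paths in the usage comments are hypothetical too.

```shell
#!/bin/sh
# Print a MariaDB drop-in config with larger limits for bulk imports.
# All values are illustrative assumptions; adjust them to your hardware.
mariadb_import_tuning() {
    cat <<'EOF'
[mysqld]
max_allowed_packet      = 256M
innodb_buffer_pool_size = 4G
innodb_log_file_size    = 512M
EOF
}

# Print PHP settings that give long-running import scripts more headroom.
# Again, the values are assumptions rather than recommendations.
php_import_tuning() {
    cat <<'EOF'
memory_limit = 1G
max_execution_time = 0
EOF
}

# Example usage as root (both paths are hypothetical and distro-specific):
# mariadb_import_tuning > /etc/mysql/mariadb.conf.d/99-enwiki-import.cnf
# php_import_tuning     > /etc/php/conf.d/99-enwiki-import.ini
# systemctl restart mariadb
```

Printing the fragments rather than writing them directly lets you review the output before it goes anywhere near a live configuration folder.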

Preparation

Downloading the dumps

  1. The dumps for English Wikipedia are available from Wikimedia Dumps. Once there, select the latest date.
  2. Download the following file:
    • enwiki-(date)-pages-articles-multistream.xml.bz2 (This is the latest revision of every Wikipedia page, article and template - the basics you need to get going)
  3. This file is compressed with bzip2. If you have the space, extract it once downloaded to speed up importing.
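
The download steps above can be sketched in shell as follows. This is a minimal sketch, assuming the dump lives under the usual https://dumps.wikimedia.org/enwiki/(date)/ layout and that wget and bzip2 are installed. The helper names dump_file, dump_url and free_gb are made up for this example, and the date in the usage comments is only a placeholder.

```shell
#!/bin/sh
# File name of the multistream articles dump for a date given as YYYYMMDD.
dump_file() {
    printf 'enwiki-%s-pages-articles-multistream.xml.bz2\n' "$1"
}

# Full download URL for that date (assumes the standard dumps site layout).
dump_url() {
    printf 'https://dumps.wikimedia.org/enwiki/%s/%s\n' "$1" "$(dump_file "$1")"
}

# Whole gigabytes free on the filesystem that holds the given directory.
free_gb() {
    df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# Example run, left commented out because it downloads a very large file:
# DATE=20240801   # placeholder; use a date listed on the dumps site
# [ "$(free_gb .)" -ge 100 ] || echo "Less than 100GB free here" >&2
# wget -c "$(dump_url "$DATE")"       # -c resumes an interrupted download
# bzip2 -dk "$(dump_file "$DATE")"    # -d extracts, -k keeps the .bz2 file
```

Keeping the compressed file with -k costs extra disk space, so drop that flag if space is tight.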

Downloading and installing MediaWiki

  • Download the latest version of MediaWiki from the MediaWiki website. As we're using Linux, it's better to download the .tar.gz version.
  • Extract the archive
  • Clear out anything not needed:
    • The Timeless and MonoBook skins
    • Text files in the root, install.sh, docker...
  • Copy the folder to the webroot
  • If using a picture for the site logo/favicon, place the file in the .../resources/assets folder
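
The extract, clean-up and copy steps above could be scripted roughly like this. It is only a sketch: it assumes the tarball contains a single top-level folder, and that the skins to remove sit in skins/Timeless and skins/MonoBook (taking "Mono" to mean MonoBook). The function name install_mediawiki and the paths in the usage comment are hypothetical. The other clean-up items are left for you to remove by hand.

```shell
#!/bin/sh
# Unpack a MediaWiki tarball straight into a folder inside the webroot
# and remove the skins that are not needed.
# $1 = path to the .tar.gz file, $2 = destination folder
install_mediawiki() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Strip the single top-level folder so the files land directly in $dest.
    tar -xzf "$tarball" -C "$dest" --strip-components=1 || return 1
    # Skin folder names are an assumption; check them against your download.
    rm -rf "$dest/skins/Timeless" "$dest/skins/MonoBook"
}

# Example usage (file name and webroot are placeholders):
# install_mediawiki mediawiki-x.y.z.tar.gz /var/www/html/wiki
```

Extracting directly into the destination saves a separate copy step, which helps if the webroot is on a different disk from your download folder.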

Preparing the database

  • Log in to MariaDB as root
  • Create the database:
CREATE DATABASE enwiki;
  • Create a user for the database:
CREATE USER 'enwiki'@'localhost' IDENTIFIED BY 'database_password';

A password can be generated at Password Generator Plus. Make it as long as possible, as it doesn't need to be remembered beyond this configuration.

  • Grant privileges for the user to this database:
GRANT ALL PRIVILEGES ON enwiki.* TO 'enwiki'@'localhost' WITH GRANT OPTION;
  • Exit MariaDB
  • Restart the MariaDB service:
systemctl restart mariadb

This can be tweaked with different database and user names as required.
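
If you do want different database and user names, a small helper can print the same three statements with your own values filled in. The sketch below also generates a random password locally from /dev/urandom as an alternative to an online generator. The function names gen_password and setup_sql are made up for this example, and the password is hex-only so it cannot contain quote characters that would break the SQL.

```shell
#!/bin/sh
# Print a random hex password built from N random bytes (default 32,
# which gives 64 characters).
gen_password() {
    od -An -N "${1:-32}" -tx1 /dev/urandom | tr -d ' \n'
    echo
}

# Print the setup statements for a database name, user name and password.
# $1 = database name, $2 = user name, $3 = password
setup_sql() {
    cat <<EOF
CREATE DATABASE $1;
CREATE USER '$2'@'localhost' IDENTIFIED BY '$3';
GRANT ALL PRIVILEGES ON $1.* TO '$2'@'localhost' WITH GRANT OPTION;
EOF
}

# Example usage (note the password somewhere until MediaWiki is configured):
# PASS=$(gen_password)
# echo "Database password: $PASS"
# setup_sql enwiki enwiki "$PASS" | mariadb -u root -p
```

The names are substituted into the SQL without any escaping, so stick to plain letters, digits and underscores.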

Moving the database to a different hard drive

Due to the sheer size of the database, you may choose to move the MariaDB database to a different drive. MariaDB stores each database in a separate folder by default, which makes this easy.

  • Stop MariaDB
systemctl stop mariadb
  • Navigate to /var/lib/mysql
  • Move the enwiki folder (or the folder named after your database, if you chose a different name) to where you want the database to be stored
  • Back in the /var/lib/mysql folder, create a symlink to where you moved the folder, using the same name for the symlink
  • Change the owner of the moved database folder to the mysql user:
chown -R mysql:root /path/to/folder/enwiki
  • Start MariaDB again and check that it starts with no errors
systemctl start mariadb
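
The move-and-symlink part of the steps above can be wrapped in a small function, sketched below. The name relocate_db and the target path in the usage comments are hypothetical. The function only moves the folder and creates the link; stopping the service and fixing ownership stay as separate commands so you can see each step. Give the new location as an absolute path so the symlink resolves no matter where it is read from.

```shell
#!/bin/sh
# Move one database folder out of the data directory and leave a symlink
# with the same name in its place.
# $1 = data directory, $2 = database name, $3 = new parent folder (absolute)
relocate_db() {
    datadir=$1
    db=$2
    target=$3
    # Refuse to run if the folder is missing or has already been moved.
    [ -d "$datadir/$db" ] && [ ! -L "$datadir/$db" ] || return 1
    mkdir -p "$target" || return 1
    mv "$datadir/$db" "$target/$db" || return 1
    ln -s "$target/$db" "$datadir/$db"
}

# Example usage as root (the target path is a placeholder):
# systemctl stop mariadb
# relocate_db /var/lib/mysql enwiki /mnt/bigdisk/mariadb
# chown -R mysql:root /mnt/bigdisk/mariadb/enwiki
# systemctl start mariadb
```

The check for an existing symlink means running the function a second time does nothing instead of nesting the folder inside itself.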

Install MediaWiki

Importing the dump