Making an English Wikipedia server

Revision as of 02:28, 12 August 2024 by Sam (Saved progress so far)

The Wikimedia Foundation publishes dumps of the English Wikipedia about once a month. Because the content is free and openly licensed, you can use these dumps to run your own server with the English Wikipedia content on it. As the English Wikipedia is the largest Wikipedia, importing the dumps takes a while, but it is by no means impossible. This guide will help you along the way.

This guide is based entirely on the English Wikipedia, but should be quite relevant for other languages too.

Prerequisites

This section lists a few things you need to know before you get started. Each is covered in the instructions below.

What you need to know

You will need the following to get started:

  • Apache Web Server
  • PHP
  • MySQL/MariaDB
  • A dump of your Wikipedia of choice

You will also need the following for optimal performance:

  • ICU
  • ImageMagick
  • git (for revision control - not strictly necessary)
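
A quick way to see which of these tools are already installed is to look them up on the PATH. The following is a minimal sketch; the function name check_prereqs is made up for this guide, and the command names in the example call are assumptions that vary by distribution (for instance apache2 versus httpd, or mariadb versus mysql).

```shell
# Report which of the given commands can be found on PATH.
# Returns 0 if all are present, 1 if any is missing.
check_prereqs() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (command names are assumptions, adjust for your system):
# check_prereqs apache2 php mariadb convert git
```

This only checks that a command exists. It does not check versions, and it does not cover PHP extensions or the ICU library.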

Before you start

  • Remember that some of the Wikipedia dumps are huge. You will need a lot of disk space: around 100GB as a minimum for the download, and around 500GB to do this comfortably once the database grows. Wikipedia is always growing, so allow for more.
  • Prepare MySQL/MariaDB and PHP for the large transactions the import will generate, for example by raising packet size and memory limits.
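
As a rough illustration of what preparing MariaDB and PHP can look like, the sketch below prints two configuration fragments. The setting names are real MariaDB and PHP options, but every value is an assumption chosen for illustration and should be sized to your own hardware. The function names are made up for this guide.

```shell
# Print a MariaDB config fragment aimed at large imports.
# All values are illustrative assumptions, not recommendations.
mariadb_import_tuning() {
    cat <<'EOF'
[mysqld]
# Allow large rows (long article text) in a single packet.
max_allowed_packet = 256M
# Cache for InnoDB data and indexes; size this to the RAM you can spare.
innodb_buffer_pool_size = 4G
# A larger redo log reduces checkpoint stalls during sustained writes.
innodb_log_file_size = 1G
EOF
}

# Print a PHP ini fragment with raised limits.
# All values are illustrative assumptions, not recommendations.
php_import_tuning() {
    cat <<'EOF'
; Let long-running scripts use more memory.
memory_limit = 1G
; Applies to web requests; the PHP CLI has no time limit by default.
max_execution_time = 300
EOF
}
```

The output can be redirected into a drop-in config file. Where that file lives depends on the distribution (on Debian-style systems MariaDB commonly reads /etc/mysql/mariadb.conf.d/), so check your own layout, and restart MariaDB and the web server afterwards.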

Preparation

Downloading the dumps

  1. The dumps for English Wikipedia are available from Wikimedia Dumps. Select the latest date listed there.
  2. From that date's page, download the following:
    • enwiki-(date)-pages-articles-multistream.xml.bz2 (This is the latest revision of every Wikipedia page, article and template - the basics you need to get going)
  3. The file is compressed with bzip2. If you have the space, extract it once downloaded to speed up importing.
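
The download and extraction can also be done from the command line. The sketch below assumes the dump follows the usual dumps.wikimedia.org URL layout and that wget and bzip2 are installed. The function names are made up for this guide, and the date in the example call is only a placeholder.

```shell
# Build the download URL for a given dump date (YYYYMMDD).
dump_url() {
    echo "https://dumps.wikimedia.org/enwiki/$1/enwiki-$1-pages-articles-multistream.xml.bz2"
}

# Download the dump (resuming a partial download if one exists),
# then extract it while keeping the compressed original.
fetch_dump() {
    url=$(dump_url "$1")
    wget --continue "$url" &&
    bzip2 --decompress --keep "$(basename "$url")"
}

# Example call (placeholder date, pick one listed on the dumps site):
# fetch_dump 20240801
```

Keeping the compressed file doubles the space needed for this step. Drop --keep if disk space is tight.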

Downloading and installing MediaWiki

  • Download the latest version of MediaWiki from the MediaWiki website. As we're using Linux, it's better to download the .tar.gz version.
  • Extract the archive
  • Clear out anything not needed:
    • Timeless and MonoBook skins
    • Text files in the root, install.sh, docker...
  • Copy the folder to the webroot
  • Place a file in the .../resources/assets folder if using a picture for the site logo/favicon
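
The pruning and copying steps above can be sketched as a small shell function. This assumes the archive has already been extracted and that the skins to remove live in skins/Timeless and skins/MonoBook. The function name and both paths in the example call are hypothetical. Only the two skins are removed here; any other files you want to clear out are left for you to delete by hand.

```shell
# Remove the unused skins from an extracted MediaWiki tree, then copy
# the tree into the webroot.
# $1 = extracted MediaWiki directory, $2 = destination inside the webroot.
prepare_mediawiki_tree() {
    src=$1
    dest=$2
    if [ ! -d "$src/skins" ]; then
        echo "not a MediaWiki tree: $src" >&2
        return 1
    fi
    # Vector is kept because it is selected as the default skin later on.
    rm -rf "$src/skins/Timeless" "$src/skins/MonoBook"
    # Copy the contents of src (including dotfiles) into dest.
    mkdir -p "$dest" && cp -R "$src/." "$dest/"
}

# Example call (both paths are hypothetical):
# prepare_mediawiki_tree ./mediawiki /var/www/html/enwiki
```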

Preparing the database

  • Log in to MariaDB as root
  • Create the database:
CREATE DATABASE enwiki;
  • Create a user for the database:
CREATE USER 'enwiki'@'localhost' IDENTIFIED BY 'database_password';

A password can be generated at Password Generator Plus. Make it as long as possible; it doesn't need to be remembered past this configuration.

  • Grant privileges for the user to this database:
GRANT ALL PRIVILEGES ON enwiki.* TO 'enwiki'@'localhost' WITH GRANT OPTION;
  • Exit MariaDB
  • Restart the database server
systemctl restart mariadb

This can be tweaked with different database and user names as required.
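
If you would rather generate the password locally and keep the SQL in one place, something like the following sketch may help. It reads random bytes from /dev/urandom and prints the same three statements used above with your chosen names filled in. Both function names are made up for this guide.

```shell
# Print a random hex password built from the given number of random bytes
# (default 24 bytes, which gives 48 hex characters).
gen_password() {
    od -An -N"${1:-24}" -tx1 /dev/urandom | tr -d ' \n'
}

# Print the setup SQL for a database, user and password.
# $1 = database name, $2 = user name, $3 = password.
# The password must not contain single quotes; hex output is safe.
db_setup_sql() {
    cat <<EOF
CREATE DATABASE $1;
CREATE USER '$2'@'localhost' IDENTIFIED BY '$3';
GRANT ALL PRIVILEGES ON $1.* TO '$2'@'localhost' WITH GRANT OPTION;
EOF
}

# Example: review the SQL before feeding it to the MariaDB client as root.
# pw=$(gen_password)
# db_setup_sql enwiki enwiki "$pw"
```

Printing the SQL instead of running it directly lets you check the statements first. Keep a note of the generated password, as the MediaWiki installer asks for it later.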

Moving database to a different hard drive

Due to the sheer size of the database, you may choose to move the MariaDB database to a different drive. MariaDB stores each database in a separate folder by default, which makes this easy.

  • Stop MariaDB
systemctl stop mariadb
  • Navigate to /var/lib/mysql
  • Move the enwiki folder (or whatever your database is named) to where you want the database to be stored
  • Back in the /var/lib/mysql folder, create a symlink to where you moved the folder, using the same name for the symlink
  • Chown the database folder in its new location to mysql:
chown -R mysql:root /path/to/folder/enwiki
  • Restart MariaDB, check it starts with no errors
systemctl start mariadb
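
The move, symlink and chown steps can be wrapped in one function so they are harder to get out of order. This is a sketch only: the function name is made up, the target path in the example call is hypothetical, and MariaDB must already be stopped before it runs. Use an absolute path for the new location so the symlink resolves correctly.

```shell
# Move a database directory elsewhere and leave a symlink in its place.
# $1 = MariaDB data directory, $2 = database name,
# $3 = new parent directory (absolute path),
# $4 = optional owner to chown the moved directory to (e.g. mysql:root).
relocate_db_dir() {
    datadir=$1
    db=$2
    target=$3
    owner=${4:-}
    if [ ! -d "$datadir/$db" ]; then
        echo "no such database directory: $datadir/$db" >&2
        return 1
    fi
    if [ -L "$datadir/$db" ]; then
        echo "already a symlink: $datadir/$db" >&2
        return 1
    fi
    mkdir -p "$target" &&
    mv "$datadir/$db" "$target/$db" &&
    ln -s "$target/$db" "$datadir/$db" || return 1
    if [ -n "$owner" ]; then
        chown -R "$owner" "$target/$db"
    fi
}

# Example, run as root with MariaDB stopped (target path is hypothetical):
# relocate_db_dir /var/lib/mysql enwiki /mnt/bigdisk mysql:root
```

The function refuses to run twice on the same database, since the second time it would find a symlink rather than a real folder.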

Install MediaWiki

  • Navigate to where your instance is installed: for example, https://enwiki.freddythechick.net/. You will be greeted by the MediaWiki installer. Click the 'set up the wiki' link.
  • Choose your language then click 'Continue'
  • Review the environmental checks and fix anything flagged here. If all is OK, you will see 'The environment has been checked. You can install MediaWiki.' in green with a tick. At this point, click 'Continue'
  • Enter the database configuration:
    • Database type: MariaDB, MySQL, or compatible
    • Database host: localhost
    • Database name (no hyphens): enwiki
    • Database table prefix (no hyphens): <leave blank>
    • Database username: enwiki
    • Database password: <use password generated/configured earlier>
  • Click 'Continue'
  • On Database settings, leave the 'Use the same account as for installation' box checked and click 'Continue'
  • On Name, enter the following settings:
    • Name of wiki: <whatever you want your wiki to be called>
    • Project namespace: Same as the wiki name
  • Enter details of an account username, password and e-mail address so that you can administer the wiki later.
  • Make sure 'Ask me more questions' is checked and click 'Continue'
  • On Options, select the following:
    • User rights profile: Account creation required (we will be further locking this down later)
    • Copyright and license: Creative Commons Attribution-ShareAlike (this specific license MUST be selected if using Wikipedia's content)
    • Uncheck 'Enable outbound email'
    • Select Vector as the default skin with the radio button
    • Check the following extensions to enable them:
      • MassMessage
      • Cite
      • Math
      • Poem
      • Scribunto
      • TemplateStyles
      • JsonConfig
      • MobileFrontend
    • Check 'Enable Instant Commons'
    • Under Logo (icon) and Sidebar logo (optional), change change-your-logo.svg to the filename of the image you placed in the resources folder earlier. You will see a preview of it underneath
    • Ensure PHP object caching (APC, APCu or WinCache) is selected
  • Click 'Continue'
  • Click 'Continue' to finish the install
  • Ensure the installation has completed successfully then click 'Continue'
  • A copy of the LocalSettings.php file will be downloaded automatically. Move it to the root folder of the wiki; the wiki will not work until it is in place.
  • Click 'Enter your wiki'. You will be greeted with the default MediaWiki Main page if all is well.
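
Moving the downloaded LocalSettings.php into place can be scripted with a basic sanity check, as in the sketch below. The function name and both paths in the example call are hypothetical. The check only confirms that the file starts with a PHP opening tag, not that its contents are valid.

```shell
# Move a downloaded LocalSettings.php into the wiki root.
# $1 = downloaded file, $2 = wiki root directory.
place_local_settings() {
    src=$1
    root=$2
    # A generated LocalSettings.php starts with the PHP opening tag.
    case "$(head -n 1 "$src" 2>/dev/null)" in
        '<?php'*) ;;
        *) echo "not a PHP file: $src" >&2; return 1 ;;
    esac
    if [ ! -d "$root" ]; then
        echo "no such wiki root: $root" >&2
        return 1
    fi
    mv "$src" "$root/LocalSettings.php"
}

# Example call (both paths are hypothetical):
# place_local_settings ~/Downloads/LocalSettings.php /var/www/html/enwiki
```

LocalSettings.php contains the database password, so it is worth checking that only the web server user and administrators can read it.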

Importing the dump