Making a English Wikipedia server

From ThinkServer

Latest revision as of 03:47, 10 October 2024

The Wikimedia Foundation publishes dumps of the English Wikipedia about once a month. As the content is freely licensed, you can use these dumps to build your own server hosting the English Wikipedia content. Because the English Wikipedia is the largest Wikipedia, importing the dumps takes a while, but it is by no means impossible. This guide will help you along the way.

This guide is based entirely on the English Wikipedia, but should be quite relevant for other languages too.

Prerequisites

Here are a few things you need to know before you get started. They are all covered in the instructions below.

What you need to know

You will need the following to get started:

  • Apache Web Server
  • PHP
  • MySQL/MariaDB
  • A dump of your Wikipedia of choice

You will also need the following for optimal performance:

  • ICU
  • ImageMagick
  • git (for revision control - not strictly necessary)

Before you start

  • Remember that some of the Wikipedia dumps are huge. You will need a lot of disk space: around 100GB as a minimum for the download, and around 500GB to do this comfortably once the database grows. Remember that Wikipedia is always growing.
  • Prepare MySQL/MariaDB and PHP for the incoming large transactions.
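
The disk space figures above can be checked before starting. The following is a minimal sketch, not part of the original guide: the function name has_free_gb and the /srv/dumps path are hypothetical, and it assumes a POSIX df and awk are available.

```shell
#!/bin/sh
# Succeeds if the filesystem holding DIR has at least NEED_GB gigabytes free.
# has_free_gb is a hypothetical helper name used for illustration only.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example (the /srv/dumps path is an assumption, adjust to your layout):
# has_free_gb /srv/dumps 100 || echo "Less than 100GB free for the dump" >&2
```

Run the check against both the download location and the drive that will hold the database, since they may be on different filesystems.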

Preparation

Downloading the dumps

  1. The dumps for English Wikipedia are available from Wikimedia Dumps (https://dumps.wikimedia.org/enwiki/). Select the latest date.
  2. From that page, download the following:
    • enwiki-<date>-pages-articles-multistream.xml.bz2 (This is the latest revision of every Wikipedia page, article and template - the basics you need to get going)
  3. The file is compressed with bzip2. If you have the space, extract it once downloaded to speed up importing.
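
The download and extraction steps above could be scripted roughly as follows. This is a sketch under assumptions: the function names dump_url and fetch_dump are hypothetical, the date you pass must match a directory that actually exists on the dumps site, and wget and bunzip2 must be installed.

```shell
#!/bin/sh
# Build the download URL for a given dump date (YYYYMMDD, or "latest").
dump_url() {
    printf 'https://dumps.wikimedia.org/enwiki/%s/enwiki-%s-pages-articles-multistream.xml.bz2\n' "$1" "$1"
}

# Download the dump (resuming a partial file if one is present),
# then decompress it while keeping the original .bz2 archive.
fetch_dump() {
    url=$(dump_url "$1")
    wget --continue "$url" &&
        bunzip2 --keep "$(basename "$url")"
}

# Example: fetch_dump latest
```

Keeping the .bz2 archive costs extra disk space; drop --keep if space is tight and you do not need the compressed copy.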

Downloading and installing MediaWiki

  • Download the latest version of MediaWiki from the MediaWiki website. As we're using Linux, it's better to download the .tar.gz version.
  • Extract the archive
  • Clear out anything not needed:
    • Timeless and MonoBook skins
    • Text files in the root, install.sh, docker...
  • Copy the folder to the webroot
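  • Place a file in the .../resources/assets folder if using a picture for the site logo/favicon
  • The following extra extensions need to be downloaded, extracted and dropped in the extensions folder:
    • MassMessage (https://www.mediawiki.org/wiki/Extension:MassMessage)
    • TemplateStyles (https://www.mediawiki.org/wiki/Extension:TemplateStyles)
    • JsonConfig (https://www.mediawiki.org/wiki/Extension:JsonConfig)
    • MobileFrontend (https://www.mediawiki.org/wiki/Extension:MobileFrontend)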

Preparing the database

  • Login to MariaDB as root
  • Create the database:
CREATE DATABASE enwiki;
  • Create a user for the database:
CREATE USER 'enwiki'@'localhost' IDENTIFIED BY 'database_password';

A password can be generated at Password Generator Plus. Use as long a password as possible; it doesn't need to be remembered beyond this configuration.

  • Grant privileges for the user on this database:
GRANT ALL PRIVILEGES ON enwiki.* TO 'enwiki'@'localhost' WITH GRANT OPTION;
  • Exit MariaDB
  • Restart MariaDB:
systemctl restart mariadb

This can be tweaked with different database and user names as required.
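
One way to make the database and user names easy to tweak is to generate the SQL from parameters. The sketch below is an illustration only: wiki_db_sql is a hypothetical helper, and it assumes the names and password contain no single quotes or other characters that need escaping in SQL.

```shell
#!/bin/sh
# Print the setup SQL for a given database name, user name and password.
# wiki_db_sql is a hypothetical helper name; values are inserted verbatim.
wiki_db_sql() {
    db=$1 user=$2 pass=$3
    cat <<EOF
CREATE DATABASE ${db};
CREATE USER '${user}'@'localhost' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'localhost' WITH GRANT OPTION;
EOF
}

# Example (pipes the statements into the MariaDB client as root):
# wiki_db_sql enwiki enwiki 'database_password' | mariadb -u root -p
```

Printing the statements first lets you review them before piping them into the client.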

Moving database to a different hard drive

Due to the sheer size of the database, you may choose to move the MariaDB database to a different drive. MariaDB stores each database in a separate folder by default, which makes this easy.

  • Stop MariaDB
systemctl stop mariadb
  • Navigate to /var/lib/mysql
  • Move the enwiki folder (or the folder named after your database) to where you want the database to be stored
  • Back in the /var/lib/mysql folder, create a symlink to where you moved the folder, using the same name for the symlink
  • Chown the database folder at its new location to mysql:
chown -R mysql:root /path/to/folder/enwiki
  • Restart MariaDB, check it starts with no errors
systemctl start mariadb
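
The move-and-symlink steps above can be sketched as a small function. This is an illustration under assumptions: relocate_db_dir is a hypothetical helper, /mnt/bigdisk is a made-up mount point, and the target must be given as an absolute path so the symlink resolves correctly.

```shell
#!/bin/sh
# Move DATADIR/DBNAME into TARGET and leave a same-named symlink behind.
# relocate_db_dir is a hypothetical helper name; TARGET must be absolute.
relocate_db_dir() {
    datadir=$1 dbname=$2 target=$3
    mkdir -p "$target" &&
        mv "$datadir/$dbname" "$target/" &&
        ln -s "$target/$dbname" "$datadir/$dbname"
}

# Example, run as root (the /mnt/bigdisk path is an assumption):
# systemctl stop mariadb
# relocate_db_dir /var/lib/mysql enwiki /mnt/bigdisk
# chown -R mysql:root /mnt/bigdisk/enwiki
# systemctl start mariadb
```

The function only handles the move and the symlink; stopping MariaDB, fixing ownership and restarting stay as explicit steps so nothing runs against a live database by accident.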

Install MediaWiki

  • Navigate to where your instance is installed: for example, https://enwiki.freddythechick.net/. You will be greeted by the MediaWiki installer. Click the 'set up the wiki' link
  • Choose your language then click 'Continue'
  • Review the environmental checks and fix anything flagged up here. If all is OK, you will see 'The environment has been checked. You can install MediaWiki.' in green with a tick. At this point, click 'Continue'
  • Enter the database configuration:
    • Database type: MariaDB, MySQL, or compatible
    • Database host: localhost
    • Database name (no hyphens): enwiki
    • Database table prefix (no hyphens): <leave blank>
    • Database username: enwiki
    • Database password: <use password generated/configured earlier>
  • Click 'Continue'
  • On Database settings, leave the 'Use the same account as for installation' box checked and click 'Continue'
  • On Name, enter the following settings:
    • Name of wiki: <whatever you want your wiki to be called>
    • Project namespace: Same as the wiki name
  • Enter details of an account username, password and e-mail address so that you can administer the wiki later.
  • Make sure 'Ask me more questions' is checked and click 'Continue'
  • On Options, select the following:
    • User rights profile: Account creation required (we will be further locking this down later)
    • Copyright and license: Creative Commons Attribution-ShareAlike (this specific license MUST be selected if using Wikipedia's content)
    • Uncheck 'Enable outbound email'
    • Select Vector as the default skin with the radio button
    • Check the following extensions to enable them:
      • MassMessage
      • Cite
      • Math
      • Poem
      • Scribunto
      • TemplateStyles
      • JsonConfig
      • MobileFrontend
      • ParserFunctions
    • Check 'Enable Instant Commons'
    • Under Logo (icon) and Sidebar logo (optional), change change-your-logo.svg to the filename of the image you dropped in the resources folder earlier. You will see a preview of it underneath
    • Ensure PHP object caching (APC, APCu or WinCache) is selected
  • Click 'Continue'
  • Click 'Continue' to finish the install
  • Ensure the installation has completed successfully then click 'Continue'
  • A copy of the LocalSettings.php file will be downloaded automatically. Move it to the root folder of the wiki; the wiki will not work until it is in place.
  • Click 'Enter your wiki'. You will be greeted with the default MediaWiki Main page if all is well.

Finish configuring LocalSettings.php

  • Open LocalSettings.php in a text editor
  • Add a line to enable a favicon:
$wgFavicon = "$wgResourceBasePath/resources/assets/<logoimage.svg>";
  • Edit $wgEmergencyContact and $wgPasswordSender with your email address between the quote marks
  • Change $wgLocaltimezone from UTC to your correct PHP time zone - in our case, Europe/London
  • Add under $wgGroupPermissions['*']['edit'] = false;:
$wgGroupPermissions['*']['createaccount'] = false;

This stops user accounts being created and prevents editing by anyone other than yourself.

  • Add under wfLoadSkin ( 'Vector' ); the following:
$wgDefaultSkin = 'vector-2022';

This enables the modern Vector 2022 skin.

  • Add under wfLoadExtension( 'JsonConfig' );:
$wgJsonConfigEnableLuaSupport = true; // required to use JsonConfig in Lua
$wgJsonConfigModels['Tabular.JsonConfig'] = 'JsonConfig\JCTabularContent';
$wgJsonConfigs['Tabular.JsonConfig'] = [
        'namespace' => 486,
        'nsName' => 'Data',
        // page name must end in ".tab", and contain at least one symbol
        'pattern' => '/.\.tab$/',
        'license' => 'CC0-1.0',
        'isLocal' => false,
];
$wgJsonConfigModels['Map.JsonConfig'] = 'JsonConfig\JCMapDataContent';
$wgJsonConfigs['Map.JsonConfig'] = [
        'namespace' => 486,
        'nsName' => 'Data',
        // page name must end in ".map", and contain at least one symbol
        'pattern' => '/.\.map$/',
        'license' => 'CC0-1.0',
        'isLocal' => false,
];
$wgJsonConfigInterwikiPrefix = "commons";
$wgJsonConfigs['Tabular.JsonConfig']['remote'] = [
        'url' => 'https://commons.wikimedia.org/w/api.php'
];
$wgJsonConfigs['Map.JsonConfig']['remote'] = [
        'url' => 'https://commons.wikimedia.org/w/api.php'
];
  • Add under wfLoadExtension( 'MobileFrontend' );:
$wgMFDefaultSkinClass = 'SkinMinerva';
  • Add under wfLoadExtension( 'ParserFunctions' );:
$wgPFEnableStringFunctions = true;
  • Add under wfLoadExtension( 'Scribunto' );:
$wgScribuntoDefaultEngine = 'luastandalone';
  • Add the following block at the end. For security, leave it commented out until you need it for debugging, and comment it out again once debugging is finished:
# Show PHP exceptions (only enable on error)
#$wgShowExceptionDetails = true;
  • Add the following block at the end - this allows larger modules to be uploaded without errors:
$wgMaxArticleSize = 4096;       # Size in KiB
$wgAPIMaxResultSize = 4194304;  # Size in bytes; should not be lower than $wgMaxArticleSize * 1024

Go back to the website and check that it is still functional. If you are greeted by a blank page, the most common reason is a missing ; at the end of a line - every statement must end with a ;.
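
A missing semicolon can also be caught from the command line before reloading the site, using PHP's built-in syntax check (php -l). The wrapper below is a minimal sketch: lint_settings is a hypothetical helper name and the path in the example is a placeholder for wherever your wiki lives.

```shell
#!/bin/sh
# Lint a PHP file; succeeds only if PHP reports no syntax errors.
# lint_settings is a hypothetical helper name; php -l only parses the
# file, it does not execute it.
lint_settings() {
    php -l "$1"
}

# Example (the path is a placeholder, not from the guide):
# lint_settings /path/to/wiki/LocalSettings.php || echo "Fix the syntax error above" >&2
```

The check reports the line where parsing failed, which is usually the line after the one missing its semicolon.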

Importing the dump