Making a English Wikipedia server: Difference between revisions
m Stuckthrough obsolete mwdumper - php import method is now the best way and will be covered at some point |
Saved progress so far |
Revision as of 01:55, 12 August 2024
The Wikimedia Foundation publishes dumps of the English Wikipedia about once a month. As the content is freely licensed, you can use these dumps to run your own server with the English Wikipedia content on it. As the English Wikipedia is the largest Wikipedia, importing the dumps takes a while, but it is by no means impossible. This guide will help you along the way.
This guide is based entirely on the English Wikipedia, but should be quite relevant for other languages too.
Prerequisites
This section covers what you need to know before you get started. All of it is explained in more detail in the instructions below.
What you need to know
You will need the following to get started:
- Apache Web Server
- PHP
- MySQL/MariaDB
- A dump of your Wikipedia of choice
Before you start
- Remember that some of the Wikipedia dumps are huge. You will need a lot of disk space: around 100GB as a minimum for the download, and around 500GB to do this comfortably once the database grows. Remember that Wikipedia is always growing.
- Prepare MySQL/MariaDB and PHP for the incoming large transactions.
Preparation
Downloading the dumps
- The dumps for English Wikipedia are available from Wikimedia Dumps (https://dumps.wikimedia.org/enwiki/). When there, select the latest date.
- Once there, you'll need to download the following:
  - enwiki-(date)-pages-articles-multistream.xml.bz2 (the latest revision of every Wikipedia page, article and template: the basics you need to get going)
- This file is compressed with bzip2. If you have the space, extract it once downloaded to speed up importing.
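The download and extraction steps above could look like the following on the command line. This is a minimal sketch: the `dump_url` helper and the date `20240801` are illustrative assumptions, not part of the original guide, so substitute the date of the dump you selected. The download and extraction commands are commented out because the file is very large.

```shell
# Build the URL of the multistream articles dump for a given dump date.
# $1 = dump date as YYYYMMDD (20240801 below is only an example).
dump_url() {
    printf 'https://dumps.wikimedia.org/enwiki/%s/enwiki-%s-pages-articles-multistream.xml.bz2\n' "$1" "$1"
}

# Example usage (commented out because the download is very large):
# wget -c "$(dump_url 20240801)"    # -c resumes an interrupted download
# bunzip2 -k enwiki-20240801-pages-articles-multistream.xml.bz2    # -k keeps the .bz2 file
```

Keeping the compressed file with `-k` costs extra disk space, so drop that flag if space is tight.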
Downloading and installing MediaWiki
- Download the latest version of MediaWiki from the MediaWiki website (https://www.mediawiki.org). As we're using Linux, it's better to download the .tar.gz version.
- Extract the archive
- Clear out anything not needed:
  - The Timeless and MonoBook skins
  - Text files in the root, install.sh, docker...
- Copy the folder to the webroot
- If you are using a picture for the site logo or favicon, place the file in the .../resources/asset folder
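The clean-up and copy steps above can be sketched as a small shell function. The `prepare_mediawiki` helper is hypothetical, and the paths in the usage example (the archive name, `skins/Timeless`, `skins/MonoBook` and the webroot location) are assumptions about a typical install, so adjust them to match your own tree.

```shell
# Remove unwanted paths from an extracted MediaWiki tree, then copy it to the webroot.
# $1 = extracted MediaWiki directory
# $2 = destination directory inside the webroot
# remaining arguments = paths (relative to $1) that you have decided to delete
prepare_mediawiki() {
    src=$1
    dest=$2
    shift 2
    for unwanted in "$@"; do
        # ${src:?} aborts if src is empty, so this can never expand to "/..."
        rm -rf "${src:?}/$unwanted"
    done
    mkdir -p "$dest"
    # Copy the directory contents, including hidden files, preserving permissions.
    cp -a "$src/." "$dest/"
}

# Example usage (archive name and webroot path are placeholders):
# tar -xzf mediawiki-x.y.z.tar.gz
# prepare_mediawiki mediawiki-x.y.z /var/www/html/enwiki skins/Timeless skins/MonoBook
```

Passing the unwanted paths explicitly, rather than deleting by pattern, makes it harder to remove a file the installer still needs.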
Preparing the database
- Log in to MariaDB as root
- Create the database:
CREATE DATABASE enwiki;
- Create a user for the database:
CREATE USER 'enwiki'@'localhost' IDENTIFIED BY 'database_password';
A password can be generated at Password Generator Plus (https://passwordsgenerator.net/). Use the longest length available, as it doesn't need to be remembered past this configuration.
- Grant privileges for the user to this database:
GRANT ALL PRIVILEGES ON enwiki.* TO 'enwiki'@'localhost' WITH GRANT OPTION;
- Exit MariaDB
- Restart the MariaDB service
systemctl restart mariadb
The database and user names can be changed as required.
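To use different database and user names without retyping each statement, the three SQL statements above can be generated from a small shell function. This is a sketch only: `wiki_db_sql` is a hypothetical helper, and it assumes that the names and password contain no quote characters. The `mariadb` client name in the usage line is also an assumption; on older installs the client may be called `mysql`.

```shell
# Print the SQL statements that create the database, the user and the grant.
# $1 = database name, $2 = user name, $3 = password (all chosen by you).
# Assumes none of the values contain quote characters.
wiki_db_sql() {
    cat <<EOF
CREATE DATABASE $1;
CREATE USER '$2'@'localhost' IDENTIFIED BY '$3';
GRANT ALL PRIVILEGES ON $1.* TO '$2'@'localhost' WITH GRANT OPTION;
EOF
}

# Example usage: review the output first, then pipe it into the client as root.
# wiki_db_sql enwiki enwiki 'database_password'
# wiki_db_sql enwiki enwiki 'database_password' | mariadb -u root -p
```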
Moving the database to a different hard drive
Due to the sheer size of the database, you may choose to move the MariaDB database to a different drive. MariaDB stores each database in a separate folder by default, making this easy.
- Stop MariaDB
systemctl stop mariadb
- Navigate to /var/lib/mysql
- Move the enwiki (or your database name) folder to where you want the database to be stored
- Back in the /var/lib/mysql folder, create a symlink to where you moved the folder, using the same name for the symlink
- Change the owner of the moved database folder to the mysql user:
chown -R mysql:root /path/to/folder/enwiki
- Start MariaDB again and check that it starts with no errors
systemctl start mariadb
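The move-and-symlink part of the steps above can be sketched as a shell function. The `relocate_db_dir` helper and the `/mnt/bigdisk` path are illustrative assumptions. The new location should be given as an absolute path so that the symlink resolves correctly from inside the data directory.

```shell
# Move one database directory to a new location and leave a symlink behind.
# $1 = MariaDB data directory, $2 = database name,
# $3 = new parent directory (use an absolute path).
relocate_db_dir() {
    datadir=$1
    db=$2
    target=$3
    mkdir -p "$target" || return 1
    mv "$datadir/$db" "$target/$db" || return 1
    # The symlink keeps the original name so MariaDB still finds the database.
    ln -s "$target/$db" "$datadir/$db"
}

# Example usage as root (the /mnt/bigdisk path is a placeholder):
# systemctl stop mariadb
# relocate_db_dir /var/lib/mysql enwiki /mnt/bigdisk/mysql
# chown -R mysql:root /mnt/bigdisk/mysql/enwiki
# systemctl start mariadb
```

MariaDB must be stopped before the move and started only after the ownership has been fixed, as in the commented usage.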
Install MediaWiki
- Navigate to where your instance is installed, for example https://enwiki.freddythechick.net/. You will be greeted by the MediaWiki installer.