Making an English Wikipedia server
Wikimedia makes dumps of the English Wikipedia about once a month. Since the content is all free and open source, you can use these dumps to make your own server with the English Wikipedia content on it. As it is, of course, the largest Wikipedia, importing the dumps does take a while, but it is by no means impossible. This guide will help you along the way.
Prerequisites
Here is some information about a few things you need to know before you get started. It will all be covered in the instructions below.
What you need to know
You will need the following to get started with a dump:
- Apache Web Server
- PHP
- MySQL/MariaDB
- A dump of your Wikipedia of choice
- MWDumper
- The latest Java JRE and Java JDK from the Oracle Java website
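As an illustrative sketch, on a Debian or Ubuntu system the first three prerequisites could be installed roughly like this (the package names are assumptions and differ between distributions; Java is covered separately below):

    # Assumed Debian/Ubuntu package names; adjust for your distribution.
    sudo apt-get update
    sudo apt-get install apache2 php php-mysql mariadb-server

Note that a working MediaWiki installation on top of this stack is also assumed later on, since the dump is imported into MediaWiki's database schema.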
Before you start
- Remember that some of the Wikipedia dumps are huge. You will need a lot of disk space (around 200 GB as a minimum, around 500 GB to do this comfortably; remember that Wikipedia is always growing).
- Prepare MySQL/MariaDB for the large transactions coming up. A suggestion is to look in /usr/share/mysql/my-huge.cnf and consider using the values under the [mysqld] header, at least while you are importing the database. The most important value, which MUST be changed or the import will fail, is max_allowed_packet = 1M. This needs to be changed to max_allowed_packet = 128M. Some of Wikipedia's articles are larger than 1M, and MySQL/MariaDB will reject any record bigger than max_allowed_packet. 128M is more than enough during import and can safely be changed back afterwards.
- The tables must be cleared as per the instructions below before attempting the import, or it will fail.
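As a concrete sketch of the packet-size change, assuming a Debian-style layout where the server reads /etc/mysql/my.cnf (the path varies by distribution):

    [mysqld]
    # Raise the packet limit so large article records are accepted during import.
    # The 1M default will cause the import to fail on big pages; 128M is ample.
    max_allowed_packet = 128M

Restart the database server afterwards (for example, sudo systemctl restart mariadb on systemd-based systems) so the new value takes effect.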
Importing the dump
Downloading the dumps
- The dumps for English Wikipedia are available from https://dumps.wikimedia.org/enwiki/; select the latest date.
- Once there, you'll need to download the following:
- enwiki-<date>-pages-articles.xml.bz2 (this is the latest revision of every Wikipedia page, article and template; the basis you need to get going)
- enwiki-<date>-redirect.sql.bz2 (this will make redirects function correctly)
- enwiki-<date>-templatelinks.sql.bz2 (this will make the template links function correctly)
- enwiki-<date>-site_stats.sql.bz2 (this will fill in article counts and the like without searching for you)
- Put all the files in a dedicated folder so they are all available in one place for later (a download sketch follows below).
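As a sketch, the downloads could be scripted as follows; the 20190620 date is hypothetical and stands in for the <date> you selected (dump directories on the server are named by date in YYYYMMDD form):

    # Hypothetical dump date used for illustration; substitute the date you chose.
    DATE=20190620
    mkdir -p ~/wikipedia-dumps && cd ~/wikipedia-dumps
    wget https://dumps.wikimedia.org/enwiki/$DATE/enwiki-$DATE-pages-articles.xml.bz2
    wget https://dumps.wikimedia.org/enwiki/$DATE/enwiki-$DATE-redirect.sql.bz2
    wget https://dumps.wikimedia.org/enwiki/$DATE/enwiki-$DATE-templatelinks.sql.bz2
    wget https://dumps.wikimedia.org/enwiki/$DATE/enwiki-$DATE-site_stats.sql.bz2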
Downloading MWDumper
- MWDumper is available from many places around the Internet, both in source form and as pre-built Java packages. You will need to download a copy from Jenkins; this one is pre-built by MediaWiki. MWDumper 1.16 (26/06/2013) was the latest at the time of writing.
- You will need to remove any versions of OpenJDK already installed (remove libreoffice-calc-extensions and libreoffice-writer-extensions before OpenJDK so that the package manager doesn't try to install another version of Java).
- You will then need to install the latest Oracle Java JRE and JDK packages (64-bit packages are safe and better for this, as we don't need the web plugin).
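A sketch of the Java cleanup on a Debian-based system, assuming apt and these package names (both are assumptions that vary by distribution and release):

    # Remove the LibreOffice extensions first so the package manager does not
    # pull in another JDK as a replacement dependency.
    sudo apt-get remove libreoffice-calc-extensions libreoffice-writer-extensions
    # Remove every installed OpenJDK package.
    sudo apt-get remove "openjdk-*"
    # After installing the Oracle JRE and JDK, verify which Java is active:
    java -version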