MediaWiki: Difference between revisions

From Bibliotheca Anonoma
No edit summary
 
(24 intermediate revisions by the same user not shown)
Line 1: Line 1:
The Bibliotheca Anonoma Wiki is configured quite uniquely for our needs. It uses an Nginx web server, PostgreSQL DB on an SSD, and images on a RAID for hosting, and has a load of interesting extensions.
The Bibliotheca Anonoma Wiki is configured quite uniquely for our needs. It uses an Nginx web server, PostgreSQL DB on an SSD, and images on a RAID for hosting, and has a load of interesting extensions.


* Load Balancer DNS - Cloudflare
* '''Load Balancer DNS''' - Cloudflare
* Caching Front Server - Varnish
* '''Caching Front Server''' - Varnish
* Web Server - Nginx
* '''Web Server''' - Nginx
* PHP Engine - PHP FPM
* '''PHP Engine''' - HHVM
* Database - PostgreSQL
* '''Database''' - MariaDB
* Cache - memcached
* '''Lua Engine''' - LuaSandbox C module
* '''Cache''' - memcached


== To Do ==
== Network Topology of the Bibliotheca Anonoma Wiki ==


* Enable Anonymous Hash IDs like Tanasinn.info
Ordered in layers from front to back. Details on our implementation and instructions on how to replicate them are found in pages below.
* Activate Varnish Caching
* <s>Enable Cloudflare</s>
* <s>Upgrade to Mediawiki 1.28</s>
* <s>Enable PostgreSQL UNIX Socket on PHP-FPM for higher performance</s>
* <s>Fix Logo for multiple sizes (displays weird on some computers, maybe make it match the Mediawiki default dimensions?)</s>
* <s>Activate img.bibanon.org for finer tuned image management and serving</s>
** <s>Prevent hotlinking on images from external sites to preserve bandwidth</s>
* <s>Activate the best job queue system you can</s>
* <s>Activate Memcached</s>


== Pages ==
* [[Cloudflare/MediaWiki|Cloudflare]] - Cloudflare is a CDN, load balancer, and DDoS mitigation service: all for free.
 
** Using cloudflare may require some special mods to Mediawiki or just the web server to get the actual client IPs transferred, which is crucial if you allow anonymous IP edit (so you ban the user, not the entire cloudflare server).
* [[Varnish]] - This caching front server is used on Wikimedia sites to significantly reduce the amount of regeneration that dynamic pages need, while preventing outdated caches by having Mediawiki directly tell Varnish what needs to be regenerated.
** [https://www.mediawiki.org/wiki/Manual:CloudFlare#Installing_mod_cloudflare_in_Apache Apache mod_cloudflare] - An extension used for transmitting the real ip using X-Forwarded-For.
** [https://support.cloudflare.com/hc/en-us/articles/200170706-How-do-I-restore-original-visitor-IP-with-Nginx- Nginx Visitor IP Forward] - You can use the X-Forwarded-For header to obtain the real IP address. Only needs the Nginx Real IP module, which is included in Debian.
* [[Varnish/MediaWiki#SSL_Redirect_with_Nginx|Nginx]] (SSL Redirect and img.bibanon.org) - Nginx is used primarily as an SSL frontend to Varnish, since Varnish doesn't support SSL. It also serves static files from img.bibanon.org without the help of Varnish. We didn't bother to cache img.bibanon.org with Varnish, since with static files: Nginx and Varnish have similar performance, no need to add another layer.
* [[Varnish/MediaWiki|Varnish]] - This caching front server is used on Wikimedia sites to significantly reduce the amount of regeneration that dynamic pages need, while preventing outdated caches by having Mediawiki directly tell Varnish what needs to be regenerated.
* [[Nginx/MediaWiki|Nginx]] - Nginx also serves as a backend to Varnish on port 127.0.0.1:8080 (internal only), and proxies a PHP-FPM UNIX socket.
** [[PHP/HHVM|HHVM]] - Facebook's HipHop Virtual Machine significantly speeds up PHP code with just-in-time compilation. It's also what the Wikimedia Foundation uses.
** [[PHP/FPM/MediaWiki|PHP-FPM]] - Unlike Apache, Nginx isn't able to run PHP natively itself, so we use PHP-FPM here. It's a bit faster than normal PHP.
* '''[[MediaWiki/Installation]]''' - How we install MediaWiki itself.
* [[PostgreSQL/MediaWiki|PostgreSQL]] - Used as our database for a number of reasons, from stability to compatibility with other apps to support for JSONB values. However, it is clearly not the most popular choice of database for Mediawiki, so we do make some workarounds to support this unique use case. Unfortunately, facts are that Mediawiki was made for MySQL/MariaDB first, so we decided to move over.
* Memory Caching - A large amount of small transactions on the database can slow it down tremendously. Caching solutions can help by offloading quick transactions to RAM.
** [[Memcached/MediaWiki|Memcached]] - An alternative to the default APCu PHP caching system, and is designed to significantly lighten the load of queries on the database. Also, the [[mediawikiwiki:Extension:OAuth|OAuth]] extension requires memcached.
** [[Redis/MediaWiki|Redis]] - Redis is now used by the Wikimedia Foundation instead of Memcached, since it can also handle the job queue.
* [[MediaWiki/Moderation|Moderation]] - How to moderate on MediaWiki, as it can get covered in spam.


== Extensions ==
== Extensions ==
Line 29: Line 33:
=== Infrastructure ===
=== Infrastructure ===


* Cloudflare - Using cloudflare may require some special mods to Mediawiki or just the web server to get the actual client IPs transferred, which is crucial if you allow anonymous IP edit (so you ban the user, not the entire cloudflare server).
* [[mw:Extension:AWS|Amazon AWS]] - The tools needed to support AWS S3 upload, if you are using it. If you use this you should probably bundle it with Amazon Cloudfront, their load balancing service.
** [https://www.mediawiki.org/wiki/Manual:CloudFlare#Installing_mod_cloudflare_in_Apache Apache mod_cloudflare] - An extension used for transmitting the real ip using X-Forwarded-For.
* [[mw:Extension:MobileFrontend|MobileFrontend]] - A mobilefrontend just like the one on Wikipedia. Makes editing away from home much easier.
** [https://support.cloudflare.com/hc/en-us/articles/200170706-How-do-I-restore-original-visitor-IP-with-Nginx- Nginx Visitor IP Forward] - You can use the X-Forwarded-For header to obtain the real IP address. Only needs the Nginx Real IP module, which is included in Debian.
* [[mw:Extension:Translate|Translate]] - Very powerful translation tool used on most Wikimedia wikis to great effect.
* [[mediawikiwiki:Extension:AWS|Amazon AWS]] - The tools needed to support AWS S3 upload, if you are using it. If you use this you should probably bundle it with Amazon Cloudfront, their load balancing service.
* [[Mediawiki/Cargo|Cargo]] - Adds semantic metadata handling to MediaWiki, making it a very powerful semantic web database. Cargo also works as a simpler, better alternative to Semantic MediaWiki: because in practice metadata is stored only in infoboxes anyway.
* [[mediawikiwiki:Extension:MobileFrontend|MobileFrontend]] - A mobilefrontend just like the one on Wikipedia. Makes editing away from home much easier.
 
=== Lua Modules ===
 
Lua modules are a powerful and efficient alternative to the increasingly incomprehensible MediaWiki templating language. Because if it's going to be programmed anyway, might as well use a real programming language.
 
* [[mw:Extension:Scribunto|Scribunto]] - Provides Lua scripting for Turing-complete computation instead of using increasingly complex template scripting. Might be a little intimidating to install, but it's well worth it.
* [[mw:Extension:Capiunto|Capiunto]] - Easy and effective infoboxes for anyone.


=== Mods ===
=== Mods ===
Line 41: Line 51:
=== Spam ===
=== Spam ===


* [[mediawikiwiki:Extension:SpamBlacklist#Whitelist|SpamBlacklist]] - Comes with Mediawiki by default, and we've enabled it. However, it blocks a lot of good 4chan sources (naturally), so we've set up a [[Mediawiki:Spam-whitelist|whitelist]] as well.
* [[mw:Extension:SpamBlacklist#Whitelist|SpamBlacklist]] - Comes with Mediawiki by default, and we've enabled it. However, it blocks a lot of good 4chan sources (naturally), so we've set up a [[Mediawiki:Spam-whitelist|whitelist]] as well.


=== Media ===
=== Media ===


* [[mediawikiwiki:Extension:EmbedVideo|EmbedVideo]] - This embeds uploaded videos using the browser's own HTML5 <code><video></code> tag for embedding content (requires MP4 or webm). You can even embed from YouTube or NicoNico.
* [[mw:Extension:EImage|EImage]] - Embed external images as if they were normal MediaWiki images.
* [[mw:Extension:EmbedVideo|EmbedVideo]] - This embeds uploaded videos using the browser's own HTML5 <code><video></code> tag for embedding content (requires MP4 or webm). You can even embed from YouTube or NicoNico.
* [[mw:Extension:SimpleBatchUpload|SimpleBatchUpload]] - The easiest solution to uploading multiple files. Just select a folder or multiple files from the system file picker. No need to choose rights information or whatever.
<!--
<!--


We looked into whether we could use these extensions, but we couldn't get them working on our configuration.
We looked into whether we could use these extensions, but we couldn't get them working on our configuration.


* [[mediawikiwiki:Extension:TimedMediaHandler|TimedMediaHandler]] - Used for embedding MP4/WebM and ogg audio: but not for mp3s. The same popular extension from Mediawiki.
* [[mw:Extension:TimedMediaHandler|TimedMediaHandler]] - Used for embedding MP4/WebM and ogg audio: but not for mp3s. The same popular extension from Mediawiki.
** <s>The SQL [https://www.mediawiki.org/wiki/Topic:Rlt45cl0khrl8o15 just needs modification to work with PostgreSQL.]</s> Unfortunately some more significant changes in the extension are needed from there, but we might as well avoid it in that case.
** <s>The SQL [https://www.mediawiki.org/wiki/Topic:Rlt45cl0khrl8o15 just needs modification to work with PostgreSQL.]</s> Unfortunately some more significant changes in the extension are needed from there, but we might as well avoid it in that case.
* [[mediawikiwiki:Extension:HTML5video|HTML5Video]] - Very simple HTML5Video extension which embeds content such as MP3, MP4, or webm using JPlayer.
* [[mw:Extension:HTML5video|HTML5Video]] - Very simple HTML5Video extension which embeds content such as MP3, MP4, or webm using JPlayer.
** Downside is that it doesn't allow files to be uploaded normally through Mediawiki. Nope.
** Downside is that it doesn't allow files to be uploaded normally through Mediawiki. Nope.
-->
-->
Line 58: Line 70:
=== Security ===
=== Security ===


* [[mediawikiwiki:Extension:OATHAuth|OATHAuth]] - Uses TOTP one time codes along with your password for two factor authentication, in case one of them is compromised. You can run TOTP through Authy or Google Authenticator using any smartphone (or even dumbphone if it has Java applets). Well maintained since it is used by the Wikimedia Foundation for admin accounts. (not to be confused with OAUTH)
* [[mw:Extension:OATHAuth|OATHAuth]] - Uses TOTP one time codes along with your password for two factor authentication, in case one of them is compromised. You can run TOTP through Authy or Google Authenticator using any smartphone (or even dumbphone if it has Java applets). Well maintained since it is used by the Wikimedia Foundation for admin accounts. (not to be confused with OAUTH)
** [https://gerrit.wikimedia.org/r/#/c/135618/ Wikimedia Gerrit: 135618] - [https://phabricator.wikimedia.org/T67658 Wikimedia Phabricator  - T67658] - In the stable releases, OATHAuth only supports MySQL at the moment. However, Reedy has added PostgreSQL tables, so you need to grab the latest version straight from the git.  
** [https://gerrit.wikimedia.org/r/#/c/135618/ Wikimedia Gerrit: 135618] - [https://phabricator.wikimedia.org/T67658 Wikimedia Phabricator  - T67658] - In the stable releases, OATHAuth only supports MySQL at the moment. However, Reedy has added PostgreSQL tables, so you need to grab the latest version straight from the git.  
** Then, go to the page [[Special:Two-factor_authentication]] to activate TOTP. You can use an app such as Authy, Google Authenticator, Authomator (BB10), or any other TOTP app: perhaps even the hardware OnlyKey.
** Then, go to the page [[Special:Two-factor_authentication]] to activate TOTP. You can use an app such as Authy, Google Authenticator, Authomator (BB10), or any other TOTP app: perhaps even the hardware OnlyKey.
* [[mediawikiwiki:Extension:OAuth|OAuth]] - You can use an OAuth system so that you can use your own wiki accounts as a single login system (rather than many), just like you would link Google or Facebook accounts with OAuth. In particular, Mediawiki has the ability to activate two factor authentication with the extension above. Requires Memcached.
* [[mw:Extension:OAuth|OAuth]] - You can use an OAuth system so that you can use your own wiki accounts as a single login system (rather than many), just like you would link Google or Facebook accounts with OAuth. In particular, Mediawiki has the ability to activate two factor authentication with the extension above. Requires Memcached.
** This extension implements OAuth 1.0, which requires cryptography enabled on both ends. OAuth 2.0 doesn't require this, but it has tradeoffs as a result (though it can be overcome by restoring cryptographic plugins). Thus, it's not a question of which is better, but which would work for you. [https://codiscope.com/oauth-2-0-vs-oauth-1-0/ More details here.]
** This extension implements OAuth 1.0, which requires cryptography enabled on both ends. OAuth 2.0 doesn't require this, but it has tradeoffs as a result (though it can be overcome by restoring cryptographic plugins). Thus, it's not a question of which is better, but which would work for you. [https://codiscope.com/oauth-2-0-vs-oauth-1-0/ More details here.]
** While the extension currently has SQLite support, it doesn't have PostgreSQL support yet. But it's a simple matter of translating the syntax into the correct format, [https://github.com/wikimedia/mediawiki-extensions-OAuth/blob/master/backend/schema/MWOAuthUpdater.hooks.php in this directory.] Simple, if not easy. It might be possible to use the [https://gist.github.com/vigneshwaranr/3454093 SQLite to PostgreSQL conversion script.]
** While the extension currently has SQLite support, it doesn't have PostgreSQL support yet. But it's a simple matter of translating the syntax into the correct format, [https://github.com/wikimedia/mediawiki-extensions-OAuth/blob/master/backend/schema/MWOAuthUpdater.hooks.php in this directory.] Simple, if not easy. It might be possible to use the [https://gist.github.com/vigneshwaranr/3454093 SQLite to PostgreSQL conversion script.]
Line 70: Line 82:


* [http://www.mediawikiwidgets.org/SoundCloud SoundCloud] - Allows us to embed SoundCloud music for playing,
* [http://www.mediawikiwidgets.org/SoundCloud SoundCloud] - Allows us to embed SoundCloud music for playing,
== Installation Instructions ==
The Bibliotheca Anonoma Wiki has a unique installation process. Generally, you follow the guides here, but mix them both together.
=== General ===
Follow this guide first, but then the distribution specific ones for further guidance and dependencies.
https://www.mediawiki.org/wiki/Manual:Installing_MediaWiki
=== Debian ===
For Debian, although there exists a Mediawiki package in jessie-backports, it installs Apache and MySQL, which is not what we use. But if you're fine with that, go ahead.
https://www.mediawiki.org/wiki/Manual:Running_MediaWiki_on_Debian_or_Ubuntu


== Wiki Backup ==
== Wiki Backup ==
Line 101: Line 97:
These XML dumps can then be imported through [[mediawikiwiki:Manual:Importing_XML_dumps|these procedures.]]
These XML dumps can then be imported through [[mediawikiwiki:Manual:Importing_XML_dumps|these procedures.]]


=== Image Backup ===
==== Automysqlbackup ====
 
Image backup can be easily done from our end, so we commit to doing so, that way you don't have to.
 
Use [[mediawikiwiki:Manual:ImportImages.php|ImportImages.php]] to dump them to a folder. Then 7zip them up into the Wikiteam format along with the XML.


== PostgreSQL ==
This script can make setting up cron for backing up all mysql databases much easier. You'll still have to upload the backups with another cron script though.


PostgreSQL is used as our database for a number of reasons, from stability to compatibility with other apps to support for JSONB values.  
Note that you should exclude the {{ic|performance_schema}} database from backup.


However, it is clearly not the most popular choice of database for Mediawiki, so we do make some workarounds to support this unique use case. These mods are noted below.
https://www.linux.com/learn/how-do-painless-mysql-server-backups-automysqlbackup


=== PostgreSQL with UNIX Sockets ===
=== Image Backup ===


As noted in the Mediawiki tutorial, you generally connect to the PostgreSQL over a TCP connection, and use <code>md5</code> password authentication.  
Image backup can be easily done from our end, so we commit to doing so, that way you don't have to.  


However, ''if the PostgreSQL database is on the same server'', it's a better idea to dispense with the TCP overhead and connect to the UNIX socket directly.
Use [[mediawikiwiki:Manual:ImportImages.php|ImportImages.php]] to dump them to a folder. Then 7zip them up into the Wikiteam format along with the XML.
 
First, make sure that you've set a password for the <code>postgres</code> superuser, so you can log into it without using <code>peer</code> authentication, which we are going to switch to <code>md5</code>.


<pre>
=== Automated Site Backup ===
$ sudo su # must become root to become postgres user first
# su postgres
$ psql
postgres=# \password
Enter new password:
Enter it again:
postgres=#
</pre>


Next, we need to enable <code>md5</code> authentication to the UNIX socket. On PostgreSQL 9.6 on Debian, edit the file <code>/etc/postgresql/9.6/main/pg_hba.conf</code> and change the following lines to match the below:
Since we have a unique configuration, it can be difficult to reconstruct if it is lost. This script backs up the site config and all images.


<pre>
Monthly full backup:
# "local" is for Unix domain socket connections only
local  all            all                                    md5
# IPv4 local connections:
host    all            all            127.0.0.1/32            md5
# IPv6 local connections:
host    all            all            ::1/128                md5
</pre>


On Debian, the PostgreSQL UNIX Socket is at <code>/var/run/postgresql/.s.PGSQL.5432</code>, so in LocalSettings.php set these following lines (make sure to comment out <code>$wgDBPort</code>, which is not needed)
{{bc|<nowiki>
55 11 1 * *  /usr/local/bin/fullmwbackup.sh
</nowiki>}}


<pre>
{{bc|<nowiki>
## Database settings
#!/bin/bash
$wgDBtype = "postgres";
#
$wgDBserver = "/var/run/postgresql/.s.PGSQL.5432"; # UNIX port path
# fullsitebackup.sh V1.2
#
# Full backup of website files and database content.
#
# A number of variables defining file location and database connection
# information must be set before this script will run.
# Files are tar'ed from the root directory of the website. All files are
# saved. The MySQL database tables are dumped without a database name and
# and with the option to drop and recreate the tables.
#
# ----------------------
# 05-Jul-2007 - Quick adaptation for MediaWiki (currently testing)
# ----------------------
# March 2007 Updates - Version for Drupal
# - Updated script to resolve minor path bug
# - Added mysql password variable (caution - this script file is now a security risk - protect it)
# - Generates temp log file
# - Updated backup and restore scripts have been tested on Ubunutu Edgy server w/Drupal 5.1
#
# - Enjoy! BristolGuy
#-----------------------
#
## Parameters:
# tar_file_name (optional)
#
#
# Configuration
#


# Postgres specific settings
# Database connection information
#$wgDBport = "5432"; # disable this
#dbname="wikidb" # (e.g.: dbname=wikidb)
</pre>
#dbhost="localhost"
#dbuser="" # (e.g.: dbuser=wikiuser)
#dbpw="" # (e.g.: dbuser password)


{{Note|Obviously if the PostgreSQL database is on another server you have to use TCP and not UNIX sockets to communicate with it, so just expose the port on that server and put the IP in <code>$wgDBServer</code> in LocalSettings.php.}}
# Website Files
webrootdir="/var/www/mediawiki" # (e.g.: webrootdir=/home/user/public_html)


== Activating Memcached ==
#
# Variables
#


Memcached is an alternative to the default APCu PHP caching system, and is designed to significantly lighten the load of queries on the database. Also, the [[mediawikiwiki:Extension:OAuth|OAuth]] extension requires memcached.
# Default TAR Output File Base Name
tarnamebase=sitebackup-
datestamp=`date +'%m-%d-%Y'`


https://www.mediawiki.org/wiki/Memcached#Setup
# Execution directory (script start point)
#startdir=`pwd`
startdir=/tmp
logfile=$startdir"/fullsite.log" # file path and name of log file to use


https://www.howtoforge.com/install-memcached-and-php5-memcached-module-on-debian-6.0-squeeze
# Where backups should be placed
enddir=/var/backup/mediawiki


== Serving files using a specific images subdomain ==
# Temporary Directory
tempdir=$datestamp


Nginx can be optimized to make image serving more efficient, and block hotlinking. Since the settings for static images often differ greatly from that of dynamic text, it is recommended that you create a specific subdomain just for images (such as <code>img.bibanon.org</code>) and serve your image folder from there.
#
# Input Parameter Check
#


Here is the Nginx config we used (without SSL), with our image folder under a custom dir set by <code></code>: <code>/storage/mw-img/</code>:
if test "$1" = ""
then
tarname=$tarnamebase$datestamp.tgz
else
tarname=$1
fi


<pre>
#
server {
# Begin logging
    listen 80;
#
    server_name img.bibanon.org;
echo "Beginning mediawiki site backup using fullsitebackup.sh ..." > $logfile
#
# Create temporary working directory
#
echo " Creating temp working dir ..." >> $logfile
cd $startdir
mkdir $tempdir


    # images stored here
#
    root /storage/mw-img/;
# TAR website files and /etc/mediawiki/LocalSettings.php
#
echo " TARing website files into $webrootdir ..." >> $logfile
cd $webrootdir
tar czf $enddir/$tarname.tar.gz /etc/mediawiki/LocalSettings.php .
#tar cf $startdir/$tempdir/filecontent.tar .


    # let's encrypt SSL dir
#
    location ~ /\.well-known {
# sqldump database information
        root /var/lib/letsencrypt;
#
    }
#echo " Dumping mediawiki database, using ..." >> $logfile
#echo " user:$dbuser; database:$dbname host:$dbhost " >> $logfile
#cd $startdir/$tempdir
#mysqldump --user=$dbuser --password=$dbpw --add-drop-table $dbname > dbcontent.sql


    location ^~ / {
#
        try_files $uri =404;
# Create final backup file
    }
#
#echo " Creating final compressed (tgz) TAR file: $tarname ..." >> $logfile
#tar czf $enddir/$tarname filecontent.tar
#tar czf $enddir/$tarname filecontent.tar dbcontent.sql


    location ^~ /thumb/ {
#
        try_files $uri =404;   
# Cleanup
    }
#
echo " Removing temp dir $tempdir ..." >> $logfile
cd $startdir
rm -r $tempdir


    # block unnecessary access
#
    location ^~ /lockdir/ { deny all; }
# Exit banner
    location ^~ /temp/ { deny all; }
#
    location ^~ /archive/ { deny all; }
endtime=`date`
echo "Backup completed $endtime, TAR file at $tarname. " >> $logfile


    # block image hotlinking, but not from search engines
</nowiki>}}
    valid_referers none blocked bibanon.org *.bibanon.org ~.google. ~.bing. ~.yahoo.;
    if ($invalid_referer) {
        return  403; # you can alternatively link to an small unsavory picture to be a douche, though it still takes a little bandwidth
    }
}
</pre>

Latest revision as of 02:42, 4 April 2023

The Bibliotheca Anonoma Wiki is configured specifically for our needs. It uses an Nginx web server, a database on an SSD (originally PostgreSQL, now MariaDB), and images on a RAID for hosting, and has a load of interesting extensions.

  • Load Balancer DNS - Cloudflare
  • Caching Front Server - Varnish
  • Web Server - Nginx
  • PHP Engine - HHVM
  • Database - MariaDB
  • Lua Engine - LuaSandbox C module
  • Cache - memcached

Network Topology of the Bibliotheca Anonoma Wiki

Ordered in layers from front to back. Details on our implementation and instructions on how to replicate them are found in pages below.

  • Cloudflare - Cloudflare is a CDN, load balancer, and DDoS mitigation service: all for free.
    • Using Cloudflare may require changes to MediaWiki, or just to the web server, so that the actual client IPs are passed through. This is crucial if you allow anonymous IP edits (so you ban the user, not an entire Cloudflare server).
    • Apache mod_cloudflare - An Apache module used for transmitting the real IP using X-Forwarded-For.
    • Nginx Visitor IP Forward - You can use the X-Forwarded-For header to obtain the real IP address. Only needs the Nginx Real IP module, which is included in Debian.
  • Nginx (SSL Redirect and img.bibanon.org) - Nginx is used primarily as an SSL frontend to Varnish, since Varnish doesn't support SSL. It also serves static files from img.bibanon.org without the help of Varnish. We didn't bother to cache img.bibanon.org with Varnish: for static files, Nginx and Varnish have similar performance, so there is no need to add another layer.
  • Varnish - This caching front server is used on Wikimedia sites to significantly reduce the amount of regeneration that dynamic pages need, while preventing outdated caches by having Mediawiki directly tell Varnish what needs to be regenerated.
  • Nginx - Nginx also serves as a backend to Varnish on 127.0.0.1:8080 (internal only), and proxies to a PHP-FPM UNIX socket.
    • HHVM - Facebook's HipHop Virtual Machine significantly speeds up PHP code with just-in-time compilation. It's also what the Wikimedia Foundation uses.
    • PHP-FPM - Unlike Apache, Nginx can't run PHP itself, so we use PHP-FPM here. It's a bit faster than normal PHP.
  • MediaWiki/Installation - How we install MediaWiki itself.
  • PostgreSQL - Originally used as our database for a number of reasons, from stability to compatibility with other apps to support for JSONB values. However, it is clearly not the most popular choice of database for MediaWiki, so we had to make some workarounds to support this unique use case. Unfortunately, MediaWiki was made for MySQL/MariaDB first, so we decided to move over to MariaDB.
  • Memory Caching - A large amount of small transactions on the database can slow it down tremendously. Caching solutions can help by offloading quick transactions to RAM.
    • Memcached - An alternative to the default APCu PHP caching system, designed to significantly lighten the load of queries on the database. Also, the OAuth extension requires memcached.
    • Redis - Redis is now used by the Wikimedia Foundation instead of Memcached, since it can also handle the job queue.
  • Moderation - How to moderate on MediaWiki, as it can get covered in spam.
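
With Cloudflare, Nginx, and Varnish all sitting in front of MediaWiki, the X-Forwarded-For header can carry several addresses, and only the right one should be banned. The sketch below shows the general idea of walking the header from the right and skipping proxies you trust, which is roughly what a recursive real-IP lookup does. The function name `client_ip_from_xff` and the exact-match trusted list are assumptions made for illustration; a real setup would match Cloudflare's published address ranges instead.

```shell
#!/bin/sh
# Hypothetical helper: pick the client address out of an X-Forwarded-For value.
# $1 = header value, remaining arguments = proxy addresses we trust
# (exact string match only, an assumption to keep the sketch short).
client_ip_from_xff() {
    xff_rest=$1; shift
    xff_trusted=" $* "
    xff_result=""
    while [ -n "$xff_rest" ]; do
        # Take the next comma-separated entry from the left.
        xff_hop=${xff_rest%%,*}
        if [ "$xff_hop" = "$xff_rest" ]; then xff_rest=""; else xff_rest=${xff_rest#*,}; fi
        xff_hop=$(printf '%s' "$xff_hop" | tr -d ' \t')
        [ -z "$xff_hop" ] && continue
        case "$xff_trusted" in
            *" $xff_hop "*) ;;               # one of our own proxies: ignore it
            *) xff_result=$xff_hop ;;        # remember the rightmost untrusted hop
        esac
    done
    [ -n "$xff_result" ] || return 1         # every hop was a trusted proxy
    printf '%s\n' "$xff_result"
}

# Usage: client_ip_from_xff "203.0.113.9, 198.51.100.7" 198.51.100.7
```

Taking the rightmost untrusted entry rather than the leftmost one matters because a client can put anything it likes at the front of the header before the request reaches the first proxy.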

Extensions

Infrastructure

  • Amazon AWS - The tools needed to support AWS S3 upload, if you are using it. If you use this, you should probably pair it with Amazon CloudFront, their CDN service.
  • MobileFrontend - A mobile frontend just like the one on Wikipedia. Makes editing away from home much easier.
  • Translate - Very powerful translation tool used on most Wikimedia wikis to great effect.
  • Cargo - Adds semantic metadata handling to MediaWiki, making it a very powerful semantic web database. Cargo also works as a simpler, better alternative to Semantic MediaWiki, because in practice metadata is stored only in infoboxes anyway.

Lua Modules

Lua modules are a powerful and efficient alternative to the increasingly incomprehensible MediaWiki templating language. If it's going to be programmed anyway, we might as well use a real programming language.

  • Scribunto - Provides Lua scripting for Turing-complete computation instead of using increasingly complex template scripting. Might be a little intimidating to install, but it's well worth it.
  • Capiunto - Easy and effective infoboxes for anyone.

Mods

  • Anonymous IP Hash - Halcy developed a mod for MediaWiki on tanasinn that hashes the IPs of anonymous users, much like on 4chan's /b/ or 2channel.

Spam

  • SpamBlacklist - Comes with Mediawiki by default, and we've enabled it. However, it blocks a lot of good 4chan sources (naturally), so we've set up a whitelist as well.

Media

  • EImage - Embed external images as if they were normal MediaWiki images.
  • EmbedVideo - This embeds uploaded videos using the browser's own HTML5 <video> tag for embedding content (requires MP4 or webm). You can even embed from YouTube or NicoNico.
  • SimpleBatchUpload - The easiest solution to uploading multiple files. Just select a folder or multiple files from the system file picker. No need to choose rights information or whatever.

Security

  • OATHAuth - Uses TOTP one-time codes along with your password for two-factor authentication, in case one of them is compromised. You can run TOTP through Authy or Google Authenticator using any smartphone (or even a dumbphone if it has Java applets). Well maintained, since it is used by the Wikimedia Foundation for admin accounts. (Not to be confused with OAuth.)
    • Wikimedia Gerrit: 135618 - Wikimedia Phabricator - T67658 - In the stable releases, OATHAuth only supports MySQL at the moment. However, Reedy has added PostgreSQL tables, so you need to grab the latest version straight from the git.
    • Then, go to the page Special:Two-factor_authentication to activate TOTP. You can use an app such as Authy, Google Authenticator, Authomator (BB10), or any other TOTP app: perhaps even the hardware OnlyKey.
  • OAuth - You can use an OAuth system so that you can use your own wiki accounts as a single login system (rather than many), just like you would link Google or Facebook accounts with OAuth. In particular, Mediawiki has the ability to activate two factor authentication with the extension above. Requires Memcached.
    • This extension implements OAuth 1.0, which requires cryptography enabled on both ends. OAuth 2.0 doesn't require this, but it has tradeoffs as a result (though it can be overcome by restoring cryptographic plugins). Thus, it's not a question of which is better, but which would work for you. More details here.
    • While the extension currently has SQLite support, it doesn't have PostgreSQL support yet. But it's a simple matter of translating the syntax into the correct format, in this directory. Simple, if not easy. It might be possible to use the SQLite to PostgreSQL conversion script.

Widgets

Widgets are little bits of HTML which can be used as advanced templates.

  • SoundCloud - Allows us to embed SoundCloud music for playing.

Wiki Backup

Even the archivists must back themselves up periodically, especially on such a crucial wiki. But if we fall behind, you can also run the WikiTeam scripts to generate a full text and full image backup.

Text Backup

In the case of our wiki, database dumps are only done for internal use because they are specific to a certain version of Mediawiki, our unique extensions, and contain sensitive data such as password hashes. It may not even be helpful to our successors, since we use PostgreSQL and MySQL/MariaDB may be easier to set up with Mediawiki.

Instead, we provide XML dumps which are version independent and free to all, and are periodically uploaded to the Internet Archive. These can also be made by the general public via Special:Export, which is what the WikiTeam scripts do.

Use dumpBackup.php to create XML dumps on the server itself. Then 7zip them up.

These XML dumps can then be imported through these procedures.
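
As a rough sketch of the dump-and-compress step described above, the functions below run MediaWiki's dumpBackup.php maintenance script with its `--full` and `--quiet` options and then pack the result with `7z a`. The function names, the arguments, and the `<wiki>-<date>-history.xml` file naming are assumptions for illustration, not necessarily what we run in production.

```shell
#!/bin/sh
# Hypothetical file naming for a dump: <wiki>-<YYYYMMDD>-history.xml
xml_dump_name() {
    # $1 = wiki name, $2 = date stamp
    printf '%s-%s-history.xml\n' "$1" "$2"
}

# Hypothetical wrapper: create a full-history XML dump and 7zip it.
make_xml_dump() {
    # $1 = MediaWiki root, $2 = output directory, $3 = wiki name
    dump_file="$2/$(xml_dump_name "$3" "$(date +%Y%m%d)")"
    # --full exports every revision; --quiet suppresses progress output
    php "$1/maintenance/dumpBackup.php" --full --quiet > "$dump_file" || return 1
    # Keep only the compressed copy once 7z has succeeded
    7z a "$dump_file.7z" "$dump_file" && rm -- "$dump_file"
}

# Usage: make_xml_dump /var/www/mediawiki /var/backup/mediawiki bibanon
```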

Automysqlbackup

This script can make setting up cron for backing up all mysql databases much easier. You'll still have to upload the backups with another cron script though.

Note that you should exclude the performance_schema database from backup.

https://www.linux.com/learn/how-do-painless-mysql-server-backups-automysqlbackup
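
To illustrate the exclusion mentioned above without depending on any particular automysqlbackup version, here is a minimal sketch that filters a list of database names before they are handed to a backup tool. The function name `filter_backup_dbs` is made up for this example, and dropping `information_schema` as well is an added assumption, since it is likewise a virtual schema with no user data.

```shell
#!/bin/sh
# Hypothetical filter: reads database names on stdin, one per line, and drops
# the virtual schemas that should not be dumped.
filter_backup_dbs() {
    # -x matches whole lines only, so a database such as
    # "my_performance_schema_notes" would still be kept.
    grep -v -x -e 'performance_schema' -e 'information_schema'
}

# Usage (assumes the mysql client can log in without a prompt):
#   mysql -N -B -e 'SHOW DATABASES' | filter_backup_dbs
```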

Image Backup

Image backup can be easily done from our end, so we commit to doing so, that way you don't have to.

Use ImportImages.php to dump them to a folder. Then 7zip them up into the Wikiteam format along with the XML.

Automated Site Backup

Since we have a unique configuration, it can be difficult to reconstruct if it is lost. This script backs up the site config and all images.

Monthly full backup:

55 11 1 * *  /usr/local/bin/fullmwbackup.sh
#!/bin/bash
#
# fullsitebackup.sh V1.2
#
# Full backup of website files and database content.
#
# A number of variables defining file location and database connection
# information must be set before this script will run.
# Files are tar'ed from the root directory of the website. All files are
# saved. The MySQL database tables are dumped without a database name and
# with the option to drop and recreate the tables.
#
# ----------------------
# 05-Jul-2007 - Quick adaptation for MediaWiki (currently testing)
# ----------------------
# March 2007 Updates - Version for Drupal
# - Updated script to resolve minor path bug
# - Added mysql password variable (caution - this script file is now a security risk - protect it)
# - Generates temp log file
# - Updated backup and restore scripts have been tested on Ubuntu Edgy server w/Drupal 5.1
#
# - Enjoy! BristolGuy
#-----------------------
#
## Parameters:
# tar_file_name (optional)
#
#
# Configuration
#

# Database connection information
#dbname="wikidb" # (e.g.: dbname=wikidb)
#dbhost="localhost"
#dbuser="" # (e.g.: dbuser=wikiuser)
#dbpw="" # (e.g.: dbuser password)

# Website Files
webrootdir="/var/www/mediawiki" # (e.g.: webrootdir=/home/user/public_html)

#
# Variables
#

# Default TAR Output File Base Name
tarnamebase=sitebackup-
datestamp=`date +'%m-%d-%Y'`

# Execution directory (script start point)
#startdir=`pwd`
startdir=/tmp
logfile=$startdir"/fullsite.log" # file path and name of log file to use

# Where backups should be placed
enddir=/var/backup/mediawiki

# Temporary Directory
tempdir=$datestamp

#
# Input Parameter Check
#

if test "$1" = ""
then
tarname=$tarnamebase$datestamp.tgz
else
tarname=$1
fi

#
# Begin logging
#
echo "Beginning mediawiki site backup using fullsitebackup.sh ..." > $logfile
#
# Create temporary working directory
#
echo " Creating temp working dir ..." >> $logfile
cd $startdir
mkdir $tempdir

#
# TAR website files and /etc/mediawiki/LocalSettings.php
#
echo " TARing website files from $webrootdir ..." >> $logfile
cd $webrootdir
# $tarname already carries its extension (.tgz by default), so none is appended here
tar czf $enddir/$tarname /etc/mediawiki/LocalSettings.php .
#tar cf $startdir/$tempdir/filecontent.tar .

#
# sqldump database information
#
#echo " Dumping mediawiki database, using ..." >> $logfile
#echo " user:$dbuser; database:$dbname host:$dbhost " >> $logfile
#cd $startdir/$tempdir
#mysqldump --user=$dbuser --password=$dbpw --add-drop-table $dbname > dbcontent.sql

#
# Create final backup file
#
#echo " Creating final compressed (tgz) TAR file: $tarname ..." >> $logfile
#tar czf $enddir/$tarname filecontent.tar
#tar czf $enddir/$tarname filecontent.tar dbcontent.sql

#
# Cleanup
#
echo " Removing temp dir $tempdir ..." >> $logfile
cd $startdir
rm -r $tempdir

#
# Exit banner
#
endtime=`date`
echo "Backup completed $endtime, TAR file at $tarname. " >> $logfile
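
A backup that cannot be read back is not much of a backup, so it may be worth checking each archive after the script above finishes. The following is a minimal sketch under stated assumptions: the function names are made up, the check only confirms that the archive lists cleanly and contains a LocalSettings.php entry, and the `sitebackup-` prefix is taken from `tarnamebase` in the script.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the gzip-compressed tar archive in $1
# can be listed and contains a LocalSettings.php entry.
verify_backup() {
    tar tzf "$1" 2>/dev/null | grep -q 'LocalSettings\.php$'
}

# Hypothetical helper: list (but do not delete) backups older than $2 days
# in directory $1, so old archives can be reviewed before pruning.
list_old_backups() {
    find "$1" -type f -name 'sitebackup-*' -mtime +"$2" -print
}

# Usage:
#   verify_backup /var/backup/mediawiki/sitebackup-04-01-2023.tgz || echo "backup is unreadable"
#   list_old_backups /var/backup/mediawiki 180
```

Listing rather than deleting is a deliberate choice here: with monthly backups of a hard-to-rebuild configuration, removing archives is better left as a manual step.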