MediaWiki

From Bibliotheca Anonoma

The Bibliotheca Anonoma Wiki is configured specifically for our needs. It uses an Nginx web server, a PostgreSQL database on an SSD, and a RAID array for image storage, and it runs a number of interesting extensions.

  • Load Balancer DNS - Cloudflare
  • Caching Front Server - Varnish
  • Web Server - Nginx
  • PHP Engine - HHVM
  • Database - MariaDB
  • Lua Engine - LuaSandbox C module
  • Cache - memcached

Network Topology of the Bibliotheca Anonoma Wiki

Ordered in layers from front to back. Details on our implementation, and instructions for replicating it, are on the pages linked below.

  • Cloudflare - Cloudflare is a CDN, load balancer, and DDoS mitigation service; its CDN and DDoS protection are available for free.
    • Behind Cloudflare, MediaWiki or the web server needs some extra configuration to see the actual client IPs. This is crucial if you allow anonymous IP edits, so that a block hits the user rather than an entire Cloudflare server.
    • Apache mod_cloudflare - An Apache module that restores the real client IP from the headers Cloudflare adds.
    • Nginx Visitor IP Forward - Nginx can take the real client IP address from the X-Forwarded-For header. This only needs the nginx Real IP module (ngx_http_realip_module), which is included in Debian's nginx packages.
  • Nginx (SSL Redirect and img.bibanon.org) - Nginx is used primarily as an SSL frontend to Varnish, since Varnish doesn't support SSL. It also serves static files from img.bibanon.org directly, without Varnish. We didn't bother caching img.bibanon.org with Varnish: for static files, Nginx and Varnish perform similarly, so there's no need for another layer.
  • Varnish - This caching front server, also used on Wikimedia sites, significantly reduces how often dynamic pages must be regenerated. Outdated caches are avoided by having MediaWiki tell Varnish directly which pages need to be purged.
  • Nginx - Nginx also serves as the backend to Varnish, listening on 127.0.0.1:8080 (internal only), and passes PHP requests to a PHP-FPM UNIX socket.
    • HHVM - Facebook's HipTop Virtual Machine significantly speeds up PHP code with just-in-time compilation. It's also what the Wikimedia Foundation uses.
    • PHP-FPM - Unlike Apache with mod_php, Nginx can't run PHP itself, so PHP requests are handed to PHP-FPM (FastCGI Process Manager). It's a bit faster than a standard mod_php setup.
  • MediaWiki/Installation - How we install MediaWiki itself.
  • PostgreSQL - Used as our database for a number of reasons, from stability to compatibility with other apps to support for JSONB values. However, it is clearly not the most popular database for MediaWiki, so we had to make some workarounds to support it. Since MediaWiki was built for MySQL/MariaDB first, we decided to move over to MariaDB.
  • Memory Caching - A large number of small transactions can slow the database down tremendously. Caching helps by offloading these quick transactions to RAM.
    • Memcached - An alternative to the default APCu PHP cache, designed to significantly lighten the query load on the database. The OAuth extension also requires memcached.
    • Redis - Redis is used by the Wikimedia Foundation alongside Memcached, since it can also handle the job queue.
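For the Nginx Visitor IP Forward item above, here is a minimal sketch of generating the ngx_http_realip_module configuration, assuming Cloudflare's published range lists at https://www.cloudflare.com/ips-v4 and https://www.cloudflare.com/ips-v6. The helper name cf_realip_conf and the output path /etc/nginx/conf.d/cloudflare-realip.conf are our own assumptions. Only connections coming from the listed ranges are trusted, so a visitor can't spoof their IP by sending their own X-Forwarded-For header straight to the server.

```shell
#!/bin/bash
# Sketch: build an nginx real-IP snippet from a list of Cloudflare CIDR ranges.
# cf_realip_conf is a hypothetical helper name, not an nginx or Cloudflare tool.

cf_realip_conf() {
    local cidr
    # Read one CIDR per line; "|| [ -n ... ]" keeps a final line without a newline.
    while IFS= read -r cidr || [ -n "$cidr" ]; do
        case "$cidr" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        printf 'set_real_ip_from %s;\n' "$cidr"
    done
    # Take the client address from X-Forwarded-For, skipping trusted hops.
    printf 'real_ip_header X-Forwarded-For;\n'
    printf 'real_ip_recursive on;\n'
}

# Example usage (each echo guards against a list lacking a trailing newline):
#   { curl -fsS https://www.cloudflare.com/ips-v4; echo
#     curl -fsS https://www.cloudflare.com/ips-v6; echo; } \
#     | cf_realip_conf > /etc/nginx/conf.d/cloudflare-realip.conf
#   nginx -t && systemctl reload nginx
```

The snippet belongs in the http block of whichever nginx instance receives connections from Cloudflare. Cloudflare also sends a CF-Connecting-IP header containing only the client address, which is a common alternative value for real_ip_header.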

Extensions

Infrastructure

  • Amazon AWS - The tools needed to store uploads on AWS S3, if you use it. If you do, you should probably pair it with Amazon CloudFront, Amazon's CDN.
  • MobileFrontend - A mobile interface just like Wikipedia's. Makes editing away from home much easier.
  • Translate - A very powerful translation tool, used to great effect on many Wikimedia wikis.
  • Cargo - Adds semantic metadata handling to MediaWiki, making it a very powerful semantic web database. Cargo also works as a simpler, better alternative to Semantic MediaWiki, since in practice metadata is only stored in infoboxes anyway.

Lua Modules

Lua modules are a powerful and efficient alternative to the increasingly incomprehensible MediaWiki templating language. If it's going to be programmed anyway, we might as well use a real programming language.

  • Scribunto - Provides Lua scripting for Turing-complete computation instead of increasingly complex template code. It might be a little intimidating to install, but it's well worth it.
  • Capiunto - Easy and effective infoboxes for anyone.

Mods

  • Anonymous IP Hash - Halcy developed a MediaWiki mod on tanasinn that hashes the IPs of anonymous users, much like on 4chan's /b/ or 2channel.
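The general idea behind this kind of IP hashing can be sketched in a few lines of shell. This is not Halcy's code: the salt variable ANON_SALT, the daily rotation, and the 8-character ID length are our own assumptions. The secret salt matters, because without it anyone could hash all of the roughly four billion IPv4 addresses and reverse the IDs.

```shell
#!/bin/bash
# Sketch of salted IP hashing for anonymous-user IDs (not Halcy's implementation).
# ANON_SALT is a hypothetical server-side secret; keep it out of the web root.

anon_id() {
    local ip="$1"
    # Abort with a message if the salt has not been set.
    local salt="${ANON_SALT:?ANON_SALT must be set}"
    # Mixing in today's date makes each user's ID change once per day.
    printf '%s|%s|%s' "$salt" "$(date +%F)" "$ip" | sha256sum | cut -c1-8
}

# Example: ANON_SALT='long random string' anon_id 203.0.113.5
```

The same IP always maps to the same ID within a day, so readers can follow one anonymous editor through a discussion without ever seeing the address itself.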

Spam

  • SpamBlacklist - Bundled with MediaWiki by default, and we've enabled it. However, it blocks a lot of good 4chan sources (naturally), so we've set up a whitelist as well.

Media

  • EImage - Embed external images as if they were normal MediaWiki images.
  • EmbedVideo - Embeds uploaded videos using the browser's own HTML5 <video> tag (requires MP4 or WebM). It can even embed from YouTube or NicoNico.
  • SimpleBatchUpload - The easiest way to upload multiple files: just select a folder or multiple files in the system file picker. There's no need to fill in licensing information for each file.

Security

  • OATHAuth - Uses TOTP one-time codes alongside your password for two-factor authentication, in case one of them is compromised. You can generate TOTP codes with Authy or Google Authenticator on any smartphone (or even a dumbphone, if it runs Java ME apps). It is well maintained, since the Wikimedia Foundation uses it for admin accounts. (Not to be confused with OAuth.)
    • Wikimedia Gerrit: 135618 - Wikimedia Phabricator - T67658 - In stable releases, OATHAuth only supports MySQL at the moment. However, Reedy has added PostgreSQL tables, so you need to grab the latest version straight from Git.
    • Then, go to the page Special:Two-factor_authentication to activate TOTP. You can use an app such as Authy, Google Authenticator, Authomator (BB10), or any other TOTP app: perhaps even the hardware OnlyKey.
  • OAuth - OAuth lets your own wiki accounts serve as a single login system for other services (rather than many separate logins), just as you would log in elsewhere with a Google or Facebook account. A particular advantage is that those wiki accounts can be protected with two-factor authentication via the OATHAuth extension above. Requires Memcached.
    • This extension implements OAuth 1.0, which requires cryptographic signing on both ends. OAuth 2.0 doesn't, but it has tradeoffs as a result (though these can be overcome with cryptographic add-ons). So it's not a question of which is better, but of which works for you. More details here.
    • While the extension currently supports SQLite, it doesn't support PostgreSQL yet. Adding it is a matter of translating the SQL schema files in this directory into PostgreSQL syntax: simple, if not easy. The SQLite-to-PostgreSQL conversion script might help.
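Grabbing an extension straight from Git, as the OATHAuth note above suggests, can be sketched as below. fetch_extension and enable_extension are hypothetical helpers; the clone URL follows Wikimedia Gerrit's usual https://gerrit.wikimedia.org/r/mediawiki/extensions/<Name> pattern. wfLoadExtension() only works on MediaWiki 1.25 or later with extensions that ship an extension.json (older releases used require_once), and the master branch tracks MediaWiki core's development, so it may need a recent core.

```shell
#!/bin/bash
# Sketch: install a MediaWiki extension from Wikimedia's Gerrit and enable it.
# fetch_extension and enable_extension are hypothetical helpers, not MediaWiki tools.

# Clone the extension's development branch into <mw_dir>/extensions/<name>.
fetch_extension() {
    local name="$1" mw_dir="$2"
    git clone "https://gerrit.wikimedia.org/r/mediawiki/extensions/$name" \
        "$mw_dir/extensions/$name"
}

# Append wfLoadExtension( '<name>' ); to LocalSettings.php, but only once.
enable_extension() {
    local name="$1" settings="$2"
    local line="wfLoadExtension( '$name' );"
    grep -qxF "$line" "$settings" || printf '%s\n' "$line" >> "$settings"
}

# Example (paths are assumptions):
#   fetch_extension OATHAuth /var/www/mediawiki
#   enable_extension OATHAuth /etc/mediawiki/LocalSettings.php
#   php /var/www/mediawiki/maintenance/update.php
```

Running update.php afterwards lets the extension create its database tables, which is where the PostgreSQL support mentioned above comes into play.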

Widgets

Widgets are little bits of HTML which can be used as advanced templates.

  • SoundCloud - Allows us to embed playable SoundCloud tracks.

Wiki Backup

Even the archivists must back themselves up periodically, especially on such a crucial wiki. But if we fall behind, you can also run the WikiTeam scripts to generate a full text and full image backup.

Text Backup

In the case of our wiki, database dumps are made only for internal use: they are tied to a specific MediaWiki version and to our particular extensions, and they contain sensitive data such as password hashes. They may not even help our successors, since we use PostgreSQL, and MySQL/MariaDB may be easier to set up with MediaWiki.

Instead, we provide XML dumps which are version independent and free to all, and are periodically uploaded to the Internet Archive. These can also be made by the general public via Special:Export, which is what the WikiTeam scripts do.

Use dumpBackup.php to create XML dumps on the server itself. Then 7zip them up.
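A minimal sketch of that step, assuming the php and 7z command-line tools are installed. The helper xml_dump and the wiki-<date>-history.xml naming are our own. In dumpBackup.php, --full exports every revision of every page (--current would export only the latest revisions), and --quiet suppresses progress reports.

```shell
#!/bin/bash
# Sketch: full-history XML dump compressed with 7-Zip.
# xml_dump is a hypothetical helper; SEVENZIP lets you point at another 7z binary.
SEVENZIP="${SEVENZIP:-7z}"

xml_dump() {
    local mw_dir="$1"   # MediaWiki root that contains maintenance/
    local out_dir="$2"  # where the archive is written
    local stamp xml
    stamp=$(date +%Y%m%d)
    xml="$out_dir/wiki-$stamp-history.xml"
    mkdir -p "$out_dir"
    # --full exports every revision; --quiet suppresses progress reports on stderr.
    php "$mw_dir/maintenance/dumpBackup.php" --full --quiet > "$xml" || return 1
    # Compress, then remove the uncompressed XML.
    "$SEVENZIP" a "$xml.7z" "$xml" > /dev/null || return 1
    rm -f "$xml"
    printf '%s\n' "$xml.7z"
}

# Example: xml_dump /var/www/mediawiki /var/backup/mediawiki
```

The function prints the archive path, so a follow-up job can pick it up for uploading to the Internet Archive.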

These XML dumps can then be imported through these procedures.

Automysqlbackup

This script makes it much easier to set up cron jobs that back up all MySQL databases. You'll still need a separate cron script to upload the backups, though.

Note that you should exclude the performance_schema database from the backup.

https://www.linux.com/learn/how-do-painless-mysql-server-backups-automysqlbackup
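If you'd rather not install automysqlbackup, the same job, including the performance_schema exclusion, can be sketched with the stock mysql and mysqldump clients. This is a swapped-in alternative, not automysqlbackup itself: backup_all_dbs and filter_backup_dbs are hypothetical names, credentials are assumed to come from ~/.my.cnf, and uploading the results is still left to a separate job.

```shell
#!/bin/bash
# Sketch: dump every MySQL/MariaDB database except the server's internal schemas.
# Hypothetical helpers; the mysql clients read credentials from ~/.my.cnf.

# Drop internal schemas that should not be backed up.
filter_backup_dbs() {
    grep -vxE 'performance_schema|information_schema|sys'
}

backup_all_dbs() {
    local out_dir="$1" stamp db
    stamp=$(date +%F)
    mkdir -p "$out_dir"
    # -N prints database names without the column header.
    mysql -N -e 'SHOW DATABASES' | filter_backup_dbs |
    while IFS= read -r db; do
        # --single-transaction takes a consistent InnoDB snapshot without table locks.
        mysqldump --single-transaction "$db" | gzip > "$out_dir/$db-$stamp.sql.gz"
    done
}

# Example: backup_all_dbs /var/backup/mysql
```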

Image Backup

Image backups are easy to make on our end, so we commit to making them; that way, you don't have to.

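Use dumpUploads.php to list the uploaded files and copy them to a folder (importImages.php does the reverse, importing a folder of files into a wiki). Then 7zip them up into the WikiTeam format along with the XML.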
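A sketch of the image step, assuming MediaWiki's maintenance/dumpUploads.php, which prints one path per uploaded file. On installs using the default local file backend these paths start with mwstore://local-backend/local-public, which the sed call maps back to the images/ directory; check the output on your own wiki and adjust the prefix if it differs. uploads_archive is a hypothetical name, it writes a gzipped tar rather than a 7z file, and the archive path should be absolute because the function changes directory.

```shell
#!/bin/bash
# Sketch: archive every uploaded file listed by dumpUploads.php.
# uploads_archive is a hypothetical helper; the mwstore:// prefix is an assumption
# that holds for the default local file backend.

uploads_archive() {
    local mw_dir="$1" archive="$2"   # archive should be an absolute path
    (
        cd "$mw_dir" || exit 1
        # Map backend paths to ./images/..., then let tar read the list from stdin.
        php maintenance/dumpUploads.php |
            sed 's~^mwstore://local-backend/local-public~./images~' |
            tar -czf "$archive" -T -
    )
}

# Example: uploads_archive /var/www/mediawiki /var/backup/mediawiki/images.tgz
```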

Automated Site Backup

Since we have a unique configuration, it can be difficult to reconstruct if it is lost. This script backs up the site config and all images.

Monthly full backup, run from cron at 11:55 on the 1st of each month:

55 11 1 * *  /usr/local/bin/fullmwbackup.sh

Contents of /usr/local/bin/fullmwbackup.sh:
#!/bin/bash
#
# fullsitebackup.sh V1.2
#
# Full backup of website files and database content.
#
# A number of variables defining file location and database connection
# information must be set before this script will run.
# Files are tar'ed from the root directory of the website. All files are
# saved. The MySQL database tables are dumped without a database name and
# with the option to drop and recreate the tables. (In this MediaWiki
# adaptation, the database dump steps are commented out.)
#
# ----------------------
# 05-Jul-2007 - Quick adaptation for MediaWiki (currently testing)
# ----------------------
# March 2007 Updates - Version for Drupal
# - Updated script to resolve minor path bug
# - Added mysql password variable (caution - this script file is now a security risk - protect it)
# - Generates temp log file
# - Updated backup and restore scripts have been tested on Ubuntu Edgy server w/Drupal 5.1
#
# - Enjoy! BristolGuy
#-----------------------
#
## Parameters:
# tar_file_name (optional)
#
#
# Configuration
#

# Database connection information
#dbname="wikidb" # (e.g.: dbname=wikidb)
#dbhost="localhost"
#dbuser="" # (e.g.: dbuser=wikiuser)
#dbpw="" # (e.g.: dbuser password)

# Website Files
webrootdir="/var/www/mediawiki" # (e.g.: webrootdir=/home/user/public_html)

#
# Variables
#

# Default TAR Output File Base Name
tarnamebase=sitebackup-
datestamp=`date +'%m-%d-%Y'`

# Execution directory (script start point)
#startdir=`pwd`
startdir=/tmp
logfile=$startdir"/fullsite.log" # file path and name of log file to use

# Where backups should be placed
enddir=/var/backup/mediawiki

# Temporary Directory
tempdir=$datestamp

#
# Input Parameter Check
#

if test "$1" = ""
then
tarname=$tarnamebase$datestamp.tgz
else
tarname=$1
fi

#
# Begin logging
#
echo "Beginning mediawiki site backup using fullsitebackup.sh ..." > "$logfile"
#
# Create temporary working directory
#
echo " Creating temp working dir ..." >> "$logfile"
cd "$startdir" || exit 1
mkdir -p "$tempdir"

#
# TAR website files and /etc/mediawiki/LocalSettings.php
#
echo " TARing website files from $webrootdir into $enddir/$tarname ..." >> "$logfile"
mkdir -p "$enddir"
cd "$webrootdir" || exit 1
tar czf "$enddir/$tarname" /etc/mediawiki/LocalSettings.php .
#tar cf $startdir/$tempdir/filecontent.tar .

#
# sqldump database information
#
#echo " Dumping mediawiki database, using ..." >> $logfile
#echo " user:$dbuser; database:$dbname host:$dbhost " >> $logfile
#cd $startdir/$tempdir
#mysqldump --user=$dbuser --password=$dbpw --add-drop-table $dbname > dbcontent.sql

#
# Create final backup file
#
#echo " Creating final compressed (tgz) TAR file: $tarname ..." >> $logfile
#tar czf $enddir/$tarname filecontent.tar
#tar czf $enddir/$tarname filecontent.tar dbcontent.sql

#
# Cleanup
#
echo " Removing temp dir $tempdir ..." >> "$logfile"
cd "$startdir" || exit 1
rm -r "$tempdir"

#
# Exit banner
#
endtime=$(date)
echo "Backup completed $endtime, TAR file at $enddir/$tarname." >> "$logfile"