MediaWiki: Difference between revisions

From Bibliotheca Anonoma
(→‎PostgreSQL: moved section to subpage)
== Serving files using a specific images subdomain ==
Nginx can be tuned to serve images more efficiently and to block hotlinking. Since the settings for static images often differ greatly from those for dynamic text, we recommend creating a dedicated subdomain just for images (such as <code>img.bibanon.org</code>) and serving your image folder from there.
Here is the Nginx config we used (without SSL), with our image folder in a custom directory, <code>/storage/mw-img/</code>:
<pre>
server {
    listen 80;
    server_name img.bibanon.org;
    # images stored here
    root /storage/mw-img/;
    # let's encrypt SSL dir
    # (a ^~ prefix match, because the "^~ /" block below stops regex locations from being checked)
    location ^~ /.well-known/ {
        root /var/lib/letsencrypt;
    }
    location ^~ / {
        try_files $uri =404;
    }
    location ^~ /thumb/ {
        try_files $uri =404;   
    }
    # block unnecessary access
    location ^~ /lockdir/ { deny all; }
    location ^~ /temp/ { deny all; }
    location ^~ /archive/ { deny all; }
    # block image hotlinking, but not from search engines
    valid_referers none blocked bibanon.org *.bibanon.org ~\.google\. ~\.bing\. ~\.yahoo\.;
    if ($invalid_referer) {
        return 403; # alternatively, serve a small unsavory picture to be a douche, though that still takes a little bandwidth
    }
}
</pre>

Revision as of 22:29, 15 December 2016

The Bibliotheca Anonoma Wiki is configured specifically for our needs. It runs on an Nginx web server, keeps its PostgreSQL database on an SSD and its images on a RAID, and has a load of interesting extensions.

  • Load Balancer DNS - Cloudflare
  • Caching Front Server - Varnish
  • Web Server - Nginx
  • PHP Engine - PHP FPM
  • Database - PostgreSQL
  • Cache - memcached

To Do

  • Enable Anonymous Hash IDs like Tanasinn.info
  • Activate Varnish Caching
  • Enable Cloudflare
  • Upgrade to MediaWiki 1.28
  • Enable PostgreSQL UNIX Socket on PHP-FPM for higher performance
  • Fix Logo for multiple sizes (it displays oddly on some computers; maybe make it match the MediaWiki default dimensions?)
  • Activate img.bibanon.org for finer tuned image management and serving
    • Prevent hotlinking on images from external sites to preserve bandwidth
  • Activate the best job queue system you can
  • Activate Memcached

Network Topology of the Bibliotheca Anonoma Wiki

Ordered in layers from front to back. Details on our implementation and instructions on how to replicate them are found in pages below.

  • Cloudflare - Cloudflare is a CDN, load balancer, and DDoS mitigation service: all for free.
    • Using Cloudflare may require changes to MediaWiki, or just to the web server, so that the actual client IPs are passed through. This is crucial if you allow anonymous IP edits (so that you ban the user, not the entire Cloudflare server).
    • Apache mod_cloudflare - An extension used for transmitting the real ip using X-Forwarded-For.
    • Nginx Visitor IP Forward - You can use the X-Forwarded-For header to obtain the real IP address. Only needs the Nginx Real IP module, which is included in Debian.
  • Nginx (SSL Redirect and img.bibanon.org) - Nginx is used primarily as an SSL frontend to Varnish, since Varnish doesn't support SSL. It also serves static files from img.bibanon.org without the help of Varnish. We didn't bother to cache img.bibanon.org with Varnish: for static files, Nginx and Varnish perform about the same, so there is no need to add another layer.
  • Varnish - This caching front server is used on Wikimedia sites to significantly reduce the amount of regeneration that dynamic pages need, while preventing outdated caches by having Mediawiki directly tell Varnish what needs to be regenerated.
  • Nginx - Nginx also serves as a backend to Varnish, listening on 127.0.0.1:8080 (internal only), and passes PHP requests to PHP-FPM over a UNIX socket.
  • PHP-FPM - Unlike Apache, Nginx isn't able to run PHP natively itself, so we use PHP-FPM here. It's actually a bit faster in general.
  • MediaWiki/Installation - How we install MediaWiki itself.
  • PostgreSQL - Used as our database for a number of reasons, from stability to compatibility with other apps to support for JSONB values. However, it is clearly not the most popular choice of database for MediaWiki, so we need some workarounds to support this unusual use case.
  • Memcached - An alternative to the default APCu PHP caching system, designed to significantly lighten the load of queries on the database. Also, the OAuth extension requires memcached.
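
The Nginx layers in the list above can be sketched roughly as follows. This is a minimal illustration, not our production config: the hostname, certificate paths, install path, Varnish listen address (127.0.0.1:6081), PHP-FPM socket path, and the 203.0.113.0/24 trusted range are all assumptions made for the example. Only the 127.0.0.1:8080 backend address and the use of X-Forwarded-For with the Real IP module come from the description above.

```nginx
# Front server: terminates SSL and hands requests to Varnish.
server {
    listen 443 ssl;
    server_name wiki.example.org;                        # hypothetical hostname

    ssl_certificate     /etc/ssl/example/fullchain.pem;  # hypothetical paths
    ssl_certificate_key /etc/ssl/example/privkey.pem;

    # Restore the visitor IP from X-Forwarded-For (Real IP module).
    # 203.0.113.0/24 is a documentation range standing in for the
    # proxy ranges published by your CDN; trust only verified ranges.
    set_real_ip_from 203.0.113.0/24;
    real_ip_header   X-Forwarded-For;

    location / {
        proxy_pass http://127.0.0.1:6081;                # assumed Varnish address
        proxy_set_header Host              $host;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}

# Backend server: reachable only from localhost, sits behind Varnish.
server {
    listen 127.0.0.1:8080;
    server_name wiki.example.org;                        # hypothetical hostname
    root /var/www/mediawiki;                             # hypothetical install path
    index index.php;

    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    # Pass PHP requests to PHP-FPM over a UNIX socket.
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php7.0-fpm.sock;      # assumed socket path
    }
}
```

Binding the backend to 127.0.0.1 keeps it unreachable from outside, so all public traffic has to pass through the SSL frontend and Varnish first.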

Extensions

Infrastructure

  • Amazon AWS - The tools needed to support AWS S3 upload, if you are using it. If you use this, you should probably pair it with Amazon CloudFront, their content delivery network.
  • MobileFrontend - A mobile frontend just like the one on Wikipedia. Makes editing away from home much easier.

Mods

  • Anonymous IP Hash - Halcy developed a mod for MediaWiki on tanasinn that hashes the IPs of anonymous users, much like on 4chan's /b/ or 2channel.

Spam

  • SpamBlacklist - Comes with MediaWiki by default, and we've enabled it. However, it blocks a lot of good 4chan sources (naturally), so we've set up a whitelist as well.

Media

  • EmbedVideo - This embeds uploaded videos using the browser's own HTML5 <video> tag (requires MP4 or WebM). You can even embed from YouTube or NicoNico.

Security

  • OATHAuth - Uses TOTP one-time codes along with your password for two-factor authentication, in case one of them is compromised. You can run TOTP through Authy or Google Authenticator using any smartphone (or even a dumbphone if it has Java applets). It is well maintained, since the Wikimedia Foundation uses it for admin accounts. (Not to be confused with OAuth.)
    • Wikimedia Gerrit: 135618 - Wikimedia Phabricator - T67658 - In the stable releases, OATHAuth only supports MySQL at the moment. However, Reedy has added PostgreSQL tables, so you need to grab the latest version straight from git.
    • Then, go to the page Special:Two-factor_authentication to activate TOTP. You can use an app such as Authy, Google Authenticator, Authomator (BB10), or any other TOTP app: perhaps even the hardware OnlyKey.
  • OAuth - You can use an OAuth system so that your own wiki accounts act as a single login system (rather than many), just like you would link Google or Facebook accounts with OAuth. In particular, MediaWiki can add two-factor authentication to those accounts with the extension above. Requires Memcached.
    • This extension implements OAuth 1.0, which requires cryptography enabled on both ends. OAuth 2.0 doesn't require this, but it has tradeoffs as a result (though they can be overcome by restoring cryptographic plugins). Thus, it's not a question of which is better, but which would work for you. More details here.
    • While the extension currently has SQLite support, it doesn't have PostgreSQL support yet. Adding it is a matter of translating the SQL syntax into the correct format, in this directory. Simple, if not easy. It might be possible to use the SQLite to PostgreSQL conversion script.

Widgets

Widgets are little bits of HTML which can be used as advanced templates.

  • SoundCloud - Allows us to embed SoundCloud music for playing.

Installation Instructions

The Bibliotheca Anonoma Wiki has a unique installation process. Generally, you follow the guides here, but mix them both together.

General

Follow this guide first, but then the distribution specific ones for further guidance and dependencies.

https://www.mediawiki.org/wiki/Manual:Installing_MediaWiki

Debian

For Debian, although a MediaWiki package exists in jessie-backports, it installs Apache and MySQL, which is not what we use. But if you're fine with that, go ahead.

https://www.mediawiki.org/wiki/Manual:Running_MediaWiki_on_Debian_or_Ubuntu

Wiki Backup

Even the archivists must back themselves up periodically, especially on such a crucial wiki. But if we fall behind, you can also run the WikiTeam scripts to generate a full text and full image backup.

Text Backup

In the case of our wiki, database dumps are only made for internal use, because they are specific to a certain version of MediaWiki and to our unique extensions, and they contain sensitive data such as password hashes. They may not even be helpful to our successors, since we use PostgreSQL, and MySQL/MariaDB may be easier to set up with MediaWiki.

Instead, we provide XML dumps which are version independent and free to all, and are periodically uploaded to the Internet Archive. These can also be made by the general public via Special:Export, which is what the WikiTeam scripts do.

Use dumpBackup.php to create XML dumps on the server itself. Then 7zip them up.

These XML dumps can then be imported through these procedures.

Image Backup

Image backup can be easily done from our end, so we commit to doing it so that you don't have to.

Use dumpUploads.php to list the uploaded files and copy them to a folder (importImages.php goes the other way, loading a folder of files into a wiki). Then 7zip them up into the WikiTeam format along with the XML.

Activating Memcached

Memcached is an alternative to the default APCu PHP caching system, designed to significantly lighten the load of queries on the database. Also, the OAuth extension requires memcached.

https://www.mediawiki.org/wiki/Memcached#Setup

https://www.howtoforge.com/install-memcached-and-php5-memcached-module-on-debian-6.0-squeeze