Make MediaWiki run on Google's App Engine (tracking)
Open, Low, Public

Description

From the looks of it Google App Engine now supports PHP.
https://developers.google.com/appengine/

An interesting project now might be to get MediaWiki running optimally on it. (Ok well, I at least find it interesting even if I never have time to actually dedicate for this kind of stuff.)

I expect there will be three components involved in this:

  • An extension (maybe Extension:GAE to avoid name issues) implementing most of the big stuff needed.
  • Perhaps an mw-config/overrides.php file to make the installer more intuitive on GAE. Likely bundled with the GAE extension and then required from mw-config/overrides.php by whoever is installing.
  • And a tutorial/manual on how to set it all up and do the App Engine project configuration that can't be done directly in the extension.

Some specifics about the App Engine's environment:

  • PHP 5.4 (yay!)
  • Some HTTP headers in the request and response are modified/removed, but it shouldn't create any problems. (e.g. gzip is handled by the App Engine itself, so Accept-Encoding is stripped)
  • No filesystem write
  • No direct socket opening
  • Requests are not allowed to run for longer than 60s
  • Various system calls are not allowed
  • Some notable things come installed:
    • Various client libraries for services running on Google's App Engine and relevant Google services.
    • Basic stuff: dom, hash, json, libxml, mcrypt, mbstring, openssl, session, SPL, xml*, zlib
    • APC
    • memcache, memcached; both of these are configured "under the hood" to automatically use the App Engine's memcache service and ignore configuration.
    • mysql, mysqli, mysqlnd
  • The default php session handler uses Memcache
  • A MySQL based database can be provided by the Google Cloud SQL service. Once permitted an app can access it over a unix socket.

This means a few things:

  • ;) the GAE extension gets to use [ 'arrays', 'like', 'this' ]
  • Ext: We'll need a file repo for Google's Cloud Storage to do uploads there.
    • Google Cloud Storage can do image handling natively so we can skip doing any thumbnail handling on our own besides making ...\CloudStorageTools::getImageServingUrl calls to get thumbnail urls.
  • Ext: We'll likely need an email handler to send emails through the App Engine's mail system.
  • We're still mostly fine for fopen stuff; http:// and https:// have stream wrappers that make them automatically use the App Engine's URL fetch service which permits requests to ports 80-90, 440-450, 1024-65535.
  • We can't log to file. However using PHP's native syslog will trigger the Log API that'll log messages that'll be available on the App Engine's console.
  • It's optional but there is a built-in handling for Google users that could easily be used to create an optional mode where logins are done with Google accounts and App admins are automatically made admin on the wiki.
  • Ext: We'll probably want to kill the normal job queue and find a way to implement it using the App Engine's Task Queues api.
    • It is possible to define a cron job that would run runJobs, but Task Queues are much more intelligent. The service works along with the App Engine's scaling: if there are no web requests and no tasks, the app can theoretically be scaled down to the point that there are no instances running and costs go down, whereas cron would regularly warm up a new instance.
    • We'll want to be wary of extensions whose jobs endlessly add themselves back to the job queue to do cron-like things.
  • Ext or overrides: Caches should be auto-configured to use Memcache but config should be ignored/omitted since it's not used.
  • We should see which works better; the App Engine's native Memcache sessions or our Memcache sessions.
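
As a rough illustration of the URL fetch port restriction mentioned in the list above, here is a minimal sketch of a pre-flight check an extension could run before attempting an outgoing request. The function names are hypothetical (they are not part of MediaWiki or the App Engine SDK), and the port ranges are simply the ones quoted above.

```php
<?php
// Hypothetical helpers: not part of MediaWiki or the App Engine SDK.
// The port ranges are the ones quoted in this task's description.

/**
 * Whether the given port falls inside one of the permitted ranges.
 */
function gaeUrlFetchAllowsPort( $port ) {
    $ranges = [ [ 80, 90 ], [ 440, 450 ], [ 1024, 65535 ] ];
    foreach ( $ranges as $range ) {
        if ( $port >= $range[0] && $port <= $range[1] ) {
            return true;
        }
    }
    return false;
}

/**
 * Whether a URL uses http/https and a permitted port.
 * When the URL has no explicit port, the scheme default is assumed.
 */
function gaeUrlFetchAllowsUrl( $url ) {
    $parts = parse_url( $url );
    if ( $parts === false || !isset( $parts['scheme'] ) ) {
        return false;
    }
    $scheme = strtolower( $parts['scheme'] );
    if ( $scheme !== 'http' && $scheme !== 'https' ) {
        return false;
    }
    if ( isset( $parts['port'] ) ) {
        $port = $parts['port'];
    } else {
        $port = ( $scheme === 'https' ) ? 443 : 80;
    }
    return gaeUrlFetchAllowsPort( $port );
}
```

A caller could use gaeUrlFetchAllowsUrl() to fail early with a readable error instead of letting the fetch fail inside the stream wrapper.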

The app.yaml contains a handlers: list defining what requests go where. While it's required simply to define basic functionality, we can also use it to enable short URLs.
Much of the app.yaml configuration will probably have to be done in a tutorial. But an includes: list to include some config from other .yaml files is supported so we could use that to bundle some of the config inside the GAE extension.
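
To make the handlers: and includes: idea a little more concrete, here is a sketch of what such an app.yaml might look like. The application name, the /w and /wiki paths, and the included file name are all assumptions for illustration; nothing here is taken from an actual GAE extension. Short URLs would presumably also need the usual $wgArticlePath setting on the MediaWiki side.

```yaml
# app.yaml (sketch; the application name and all paths are placeholders)
application: example-wiki
version: 1
runtime: php
api_version: 1

# Hypothetical file bundled with the GAE extension carrying extra config.
includes:
- w/extensions/GoogleAppEngine/gae-include.yaml

handlers:
# Static assets are served directly, without starting PHP.
- url: /w/resources/(.*\.(css|js|png|gif|jpg))
  static_files: w/resources/\1
  upload: w/resources/.*\.(css|js|png|gif|jpg)

# Short URLs: /wiki/Page_title is routed to index.php.
- url: /wiki/.*
  script: w/index.php

# Regular entry points.
- url: /w/index\.php
  script: w/index.php
- url: /w/api\.php
  script: w/api.php
- url: /w/load\.php
  script: w/load.php
```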

Monitoring our use of SQL queries in simple requests would be a good idea. If we can find a way to serve simple requests entirely from things like Memcache, Google Cloud Datastore, and Google Cloud Storage without making a single SQL query (i.e. eliminating the cases where practically the entire request is SQL-less but we also make one pointless little SQL query that completely messes that up), then we can make it possible for a wiki to run on the App Engine for long periods of time, serving bunches of readers, while the Cloud SQL instances stay off nearly the entire time.
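
One way to picture the monitoring idea is a per-request query counter sitting next to a cache-first lookup. The sketch below is purely illustrative: QueryCounter and fetchPageHtml are hypothetical names, a plain array stands in for Memcache, and a callback stands in for the database layer. It is not how MediaWiki's database classes work.

```php
<?php
// Hypothetical sketch: count how many SQL queries a request needed.

class QueryCounter {
    private $count = 0;

    /** Record that one SQL query was issued. */
    public function record( $sql ) {
        $this->count++;
    }

    public function getCount() {
        return $this->count;
    }
}

/**
 * Return rendered HTML for a title, touching the database only on a
 * cache miss. $cache is a plain array standing in for Memcache and
 * $loadFromDb is a callback standing in for the real SQL lookup.
 */
function fetchPageHtml( $title, array &$cache, QueryCounter $counter, $loadFromDb ) {
    if ( isset( $cache[$title] ) ) {
        // Cache hit: the request stays SQL-less.
        return $cache[$title];
    }
    // Cache miss: this is the query we would like to see in the logs.
    $counter->record( "page lookup for $title" );
    $html = call_user_func( $loadFromDb, $title );
    $cache[$title] = $html;
    return $html;
}
```

Logging getCount() at the end of each request would show which supposedly simple requests still wake up the Cloud SQL instance.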

It's a side topic. But Google's App Engine's Mail service actually supports receiving emails. You can get HTTP POSTs for received emails and even bounces. May be useful to hook into for any "Email to wiki" extension or feature warning about user email delivery failure.


Version: unspecified
Severity: enhancement

Details

Reference
bz55475

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 2:17 AM
bzimport set Reference to bz55475.
bzimport added a subscriber: Unknown Object (MLST).

I started working on some of this.

(In reply to Daniel Friesen from comment #0)

  • Ext: We'll likely need an email handler to send emails through the App Engine's mail system.

Created a handler for the AlternateUserMailer hook (https://github.com/wikimedia/mediawiki-extensions-GoogleAppEngine/blob/master/Hooks.php), but it could definitely be made more robust.

  • Ext: We'll probably want to kill the normal job queue and find a way to implement it using the App Engine's Task Queues api.
    • It is possible to define a cron job that would run runJobs. But Task Queues are much more intelligent. The service works along with the App Engine's scaling. If there are no web requests and no tasks the app can theoretically be scaled down to the point that there are no instances running and costs go down. While cron would regularly warmup a new instance.
    • We'll want to be wary of extensions that add tasks which have job queue items endlessly add themselves back to the job queue to do cron like things.

I did a basic implementation of a JobQueueGAE class which does this: https://github.com/wikimedia/mediawiki-extensions-GoogleAppEngine/blob/master/job/JobQueueGAE.php

Not sure what can really be done about the extensions that use the job queue as a cron-like system.

  • Ext or overrides: Caches should be auto-configured to use Memcache but config should be ignored/omitted since it's not used.

MediaWiki's PHP memcache class doesn't work; you have to use the PECL version, but that was extremely simple:

$wgObjectCaches[CACHE_MEMCACHED] = array( 'class' => 'MemcachedPeclBagOStuff' );

GAE automatically discards any configuration you try to give it, so nothing needs to be done for that.

Note that when using Memcached though, you'll get hit with https://code.google.com/p/googleappengine/issues/detail?id=10775.

fbstj awarded a token. Dec 18 2014, 9:45 PM
Addshore added a subscriber: Addshore.
Restricted Application added a subscriber: Aklapper. Jul 24 2015, 11:24 AM
Qgil removed a subscriber: Qgil. Jul 29 2016, 7:47 AM

[not a Tracking-Neverending bug per definition and no subtasks at all; maybe Epic or such was meant?]