
Import live wiki pages into MediaWiki-Vagrant
Closed, Resolved, Public

Description

For testing various features in MediaWiki-Vagrant, realistic wiki pages are required. The obvious solution is to make it possible to copy pages from a production wiki. That raises a number of challenges:

  • large projects like Wikipedia tend to have super-complicated templates with complex dependency chains, requiring ParserFunctions, Scribunto and who knows what else to be installed on the local wiki if it needs to reparse the page. Even if all dependencies are installed, reparsing is extremely slow, times out, runs out of memory, etc.
  • wiki pages change all the time; we probably don't want to waste time updating them on every puppet run or vagrant up, but we don't want to leave them years out of date either. (Yes, vagrant boxes should be cattle, not pets, and just rebuilt periodically, but a lot of people keep them around for a long time anyway.)
  • wiki pages can be rather large, not to mention the images, which makes it awkward to include them in the mediawiki/vagrant repo even if it is decided that local copies should not be updated frequently (e.g. for reproducibility reasons).
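The freshness trade-off in the second point (don't update on every run, but don't let copies go stale for years) could be handled with a simple age check. The following is a minimal sketch, assuming a hypothetical stamp file that is touched after each successful import and a hypothetical threshold of roughly one month; none of these names come from MediaWiki-Vagrant itself.

```python
import os
import time

# Hypothetical threshold: roughly one month, expressed in seconds.
MAX_AGE_SECONDS = 30 * 24 * 60 * 60


def last_import_time(stamp_path):
    """Return the mtime of the (hypothetical) stamp file, or None if it does not exist."""
    try:
        return os.path.getmtime(stamp_path)
    except OSError:
        return None


def needs_update(last_import, now=None, max_age=MAX_AGE_SECONDS):
    """True if the page was never imported or the last import is older than max_age seconds."""
    if last_import is None:
        return True
    if now is None:
        now = time.time()
    return now - last_import > max_age


def mark_imported(stamp_path):
    """Create the stamp file if needed and set its mtime to the current time."""
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, None)
```

In a puppet setting, a script wrapping `needs_update` that exits with status 0 only when an update is due could serve as the `onlyif` guard of an `exec` resource, so the import runs at most about once a month rather than on every provisioning run.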

With those restrictions in mind, the best approach seems to be to create a command for copying a remote wiki page to a local wiki, optionally at a given revision of the page (if we want to control update times) and optionally leaving preparsing to the remote wiki (i.e. fetching /wiki/ArticleName?action=raw&templates=expand or something equivalent).

The command could be either a puppet resource or a vagrant command, depending on the use case. A puppet resource can be packaged together with roles and can auto-update (although some trickery will be needed if we want intelligent behavior like "update if it's older than a month" instead of just updating on every puppet run), while a vagrant command is more flexible, as it does not need importable pages to be defined in the repo. The puppet version seems more useful, but there is probably room for both.
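The core of such a command could look roughly like the sketch below. It is a hypothetical illustration, not an existing MediaWiki-Vagrant tool: it builds an `action=raw` URL against the remote wiki's `index.php` (optionally pinned with `oldid` and using the `templates=expand` parameter suggested above), downloads the wikitext, and pipes it into MediaWiki's `maintenance/edit.php`, which reads page content from stdin. The function names, the User-Agent string and the assumption that the script runs from the local MediaWiki root are all placeholders. Whether `templates=expand` is honored depends on the remote wiki's MediaWiki version; the API's `action=expandtemplates` module is one possible substitute.

```python
import subprocess
import urllib.parse
import urllib.request


def raw_url(index_php, title, oldid=None, expand_templates=False):
    """Build an action=raw URL for a page on the remote wiki."""
    params = {"title": title, "action": "raw"}
    if oldid is not None:
        params["oldid"] = str(oldid)  # pin a specific revision for reproducibility
    if expand_templates:
        params["templates"] = "expand"  # let the remote wiki do the template expansion
    return index_php + "?" + urllib.parse.urlencode(params)


def fetch_wikitext(url, timeout=30):
    """Download the raw wikitext; the User-Agent value is a placeholder."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "mw-vagrant-page-import-sketch/0.1"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def import_command(title, summary, php="php", edit_script="maintenance/edit.php"):
    """Command line for edit.php, assumed to be run from the local MediaWiki root."""
    return [php, edit_script, "--summary", summary, title]


def copy_page(index_php, title, oldid=None, expand_templates=False):
    """Fetch a remote page and save it under the same title on the local wiki."""
    url = raw_url(index_php, title, oldid, expand_templates)
    text = fetch_wikitext(url)
    # edit.php reads the new page content from stdin.
    subprocess.run(
        import_command(title, "Imported from " + url),
        input=text,
        text=True,
        check=True,
    )
```

For example, `copy_page("https://en.wikipedia.org/w/index.php", "Main Page", expand_templates=True)` would copy the current English Wikipedia main page. Images are not handled here; they would need a separate download and upload step.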