We need a large sample of articles that is present on every test/staging instance and every developer machine, so that everyone tests against the same cases with the same test data.
This test data should live in a single place that we can refer to as our canonical #reading-web test data repository.
Suggested place: https://www.mediawiki.org/wiki/Reading/QA/Sample_articles
---
[ ] We have a varied set of articles, files and templates stashed.
[ ] We have documented where this lives and how to import it.
[ ] Such sample data is available in staging/test environments.
[ ] Such sample data is available in beta-cluster and test-wiki.
[ ] Developers have been notified about such sample data.
Implementation notes:
* Vagrant user interacts with it like so:
https://www.irccloud.com/pastebin/MhHjoG7a/
* For an example of a role that imports pages, look at multimedia.
* A role has an associated list of wiki codes and titles. When the role is provisioned, it will pull down these articles using the given project's API.
```
mediawiki::export { 'Barack_Obama': wiki_api => '...', file => 'foo' }
-> mediawiki::import_text { 'Barack_Obama': sources => 'foo' }
```
* To avoid having to import templates, we'll use the template expansion option of the raw action: https://en.wikipedia.org/wiki/Barack_Obama?action=raw&templates=expand. The imported articles will look right, but their wikitext will not make use of templates.
* @dduvall is open to pairing on this task with anyone not familiar with Vagrant.
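As a rough illustration of the fetch step described in the notes above, here is a minimal Python sketch that builds the `action=raw&templates=expand` URL for each (wiki, title) pair and downloads the expanded wikitext. This is not the proposed Puppet implementation: the names `SAMPLE_ARTICLES`, `raw_expanded_url` and `fetch_expanded_wikitext`, the `/wiki/` path layout, and the User-Agent string are all assumptions made for the example.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Hypothetical sample list: (wiki base URL, page title) pairs.
SAMPLE_ARTICLES = [
    ("https://en.wikipedia.org", "Barack_Obama"),
]


def raw_expanded_url(wiki_base, title):
    """Build the raw-wikitext URL that asks the wiki to expand templates."""
    query = urlencode({"action": "raw", "templates": "expand"})
    # Assumes the wiki serves pages under /wiki/<Title>, as en.wikipedia.org does.
    return f"{wiki_base}/wiki/{quote(title.replace(' ', '_'))}?{query}"


def fetch_expanded_wikitext(wiki_base, title, timeout=30):
    """Download the template-expanded wikitext of one page (needs network access)."""
    request = Request(
        raw_expanded_url(wiki_base, title),
        # Hypothetical User-Agent; identify your own tool here.
        headers={"User-Agent": "sample-articles-sketch/0.1"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    for base, title in SAMPLE_ARTICLES:
        text = fetch_expanded_wikitext(base, title)
        print(f"{title}: {len(text)} characters of expanded wikitext")
```

The text returned this way could then be written to a file and handed to whatever import step the role uses. Only the URL construction is deterministic; the fetched content changes as the live article changes.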
Relevant code:
https://github.com/wikimedia/mediawiki-vagrant/blob/master/puppet/modules/role/manifests/gwtoolset.pp#L36-L120