Bring in content for test instance
Closed, Resolved | Public | 5 Estimated Story Points

Description

We need a foundation of real content for technical investigations and usability testing. This includes:

  • Template and article content
  • Wikidata used by templates

To bring in both article and template content, we will pull in a few long, well-maintained articles from the en, de, fa, and tr Wikipedias. We will then bring in all templates used in these articles, along with all nested templates necessary to make those templates functional. Two of the articles cover the same topics on every wiki (Barack Obama and Dog); the third is a featured article chosen from each wiki.

EN
https://en.wikipedia.org/wiki/Barack_Obama
https://en.wikipedia.org/wiki/Dog
https://en.wikipedia.org/wiki/Apollo_11

DE
https://de.wikipedia.org/wiki/Barack_Obama
https://de.wikipedia.org/wiki/Haushund
https://de.wikipedia.org/wiki/SSC_Karlsruhe

FA
https://fa.wikipedia.org/wiki/%D8%A8%D8%A7%D8%B1%D8%A7%DA%A9_%D8%A7%D9%88%D8%A8%D8%A7%D9%85%D8%A7
https://fa.wikipedia.org/wiki/%D8%B3%DA%AF
https://fa.wikipedia.org/wiki/%D8%B1%D8%A3%DB%8C_%D8%A8%D8%AF%DB%8C%D9%84

TR
https://tr.wikipedia.org/wiki/Barack_Obama
https://tr.wikipedia.org/wiki/K%C3%B6pek
https://tr.wikipedia.org/wiki/Dolmabah%C3%A7e_Saray%C4%B1

Note: we may need to pull in other articles, templates, or languages in the future; this set is our starting point.

Event Timeline

ECohen_WMDE set the point value for this task to 5. (Jul 22 2020, 8:14 AM)

We can't import articles from these different wikis all into the same test wiki, because templates with the same name would overwrite one another. Instead, we should have one test wiki corresponding to each of the source wikis, which means test wikis for de, fa, and tr must be created.

There's a hiera variable which allows us to easily add languages: role::langwikis::langwiki_list

These language sites have been created via hieradata/local.yaml in this patch.

Hey @awight, this seems like a big issue? Maybe we need to re-examine what content is necessary. We can definitely just start with English or German and then go from there. @thiemowmde and I had previously discussed importing all four as possible, which is why all four languages are listed as requirements.

If we break it up into multiple test instances, I feel like it increases the maintenance work and complicates the implementation of what were intended to be quick tests? (Or am I misunderstanding your proposal?)

I'm also unsure how common it is for templates to have the same name across wikis. At least from my experience so far, template names are written in the language of that wiki, so it may only be a few edge cases. If this is the case, I wouldn't worry about it, because we would still have a broad range of templates in each language working well.

I should have given more context: the test instance already hosts several subdomains, and changing these was no big deal. The work is already completed, and the new languages are available, in fact! This relies on a multi-wiki setup similar to production, where the code base and server framework are shared but configuration, database content, and uploads are separate.

Template names very much do conflict across languages. For example, ' (single quote) is a template pulled in by three of the pilot languages:

# grep -rl "{{'" xml_imports/
xml_imports/fawiki-20200729132524.xml
xml_imports/trwiki-20200729132621.xml
xml_imports/enwiki-20200729132234.xml

Content dumps are uploaded to the filesystem for future reference, mediawiki1004.wmde-templates-alpha.eqiad.wmflabs:/srv/mediawiki-vagrant/xml_imports/
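
The grep above checks for a single template. As a rough sketch of how the check might be generalized, the script below lists every template name that occurs in more than one dump, after stripping the localized namespace prefix (Template:, Şablon:, and so on). The function names are hypothetical, and it assumes each <title> and <ns> element sits on its own line with <title> first, as in Special:Export output.

```shell
#!/bin/sh
# Sketch only: template_names and shared_template_names are illustrative helpers.

# Print the names of all namespace-10 (template) pages in one dump,
# with the localized namespace prefix removed.
template_names() {
    awk '
        /<title>/ { t = $0; sub(/.*<title>/, "", t); sub(/<\/title>.*/, "", t) }
        /<ns>10<\/ns>/ { sub(/^[^:]*:/, "", t); print t }
    ' "$1" | LC_ALL=C sort -u
}

# Print every template name that appears in two or more of the given dumps.
shared_template_names() {
    for dump in "$@"; do
        template_names "$dump"
    done | LC_ALL=C sort | uniq -d
}
```

It could be run as `shared_template_names xml_imports/*.xml`. Names that differ only by translation are not detected, so this only finds literal collisions.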

Thanks for the extra information, it totally makes sense now. Sorry I misunderstood, and thanks also for catching this conflict! Should the same log-in work across all of them as well?

In T258563#6344797, @ecohen wrote:

Should the same log-in work across all of them as well?

Interesting that you ask ;-), it should work across all of them (functionality provided by CentralAuth, fwiw), but for some reason this broke when I switched around the languages. I'm trying to restore that now...

Change 617161 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/vagrant@master] [DNM] Custom role for WMDE Template test server

https://gerrit.wikimedia.org/r/617161

Change 617161 abandoned by Awight:
[mediawiki/vagrant@master] [DNM] Custom role for WMDE Template test server

Reason:
squashed into I6754a3d671a4c9ceba7f03b2657098deb97f672d

https://gerrit.wikimedia.org/r/617161

Login should be fixed now.

The listed files have all been imported into their respective wikis, for example:
https://tr-wmde-templates-alpha.wmcloud.org/wiki/Barack_Obama

Looks like styles are mangled, but the templates are present.

How to dump template titles:

mysql enwiki -e 'select page_title from page where page_namespace=10' > en-templates.txt

Unfortunately, any template not included while rendering will be missing from the list, for example Template:Infobox/doc, because these are guarded by <noinclude> tags. These will have to be imported in separate runs.
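
One possible way to prepare that separate run is to derive the documentation subpage titles from the title list produced by the query above. This is a sketch under assumptions: `doc_titles` is a hypothetical helper, the first line of the file is taken to be the `page_title` column header that mysql prints, and the namespace prefix and subpage name must be supplied per wiki (for example Template and doc on en, Vorlage and Doku on de).

```shell
#!/bin/sh
# Sketch only: doc_titles is an illustrative helper, not an existing tool.
# $1: title list from the mysql query
# $2: namespace prefix of the wiki (e.g. Template)
# $3: name of the documentation subpage (e.g. doc)
doc_titles() {
    awk -v ns="$2" -v doc="$3" '
        BEGIN { suffix = "/" doc }
        NR == 1 || $0 == "" { next }    # skip the header row and blank lines
        substr($0, length($0) - length(suffix) + 1) == suffix { next }  # already a doc page
        { print ns ":" $0 suffix }
    ' "$1"
}
```

For example, `doc_titles en-templates.txt Template doc > en-doc-titles.txt` would produce a list that can be pasted into Special:Export. Titles for documentation pages that do not exist are simply absent from the resulting export.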

Import seems to hit a brick wall at ~ 100 pages per XML file :-/

Much better results when importing from the commandline:

mwscript importDump.php --wiki=enwiki --report=10 /vagrant/xml_imports/enwiki-tl-doc-1.xml
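
With several dump files per wiki, the command above has to be repeated many times. A minimal sketch of one way to batch it follows; `import_commands` is a hypothetical helper that only prints the commands, and it assumes every file name starts with the wiki ID followed by a hyphen (as in enwiki-tl-doc-1.xml) and that paths contain no spaces.

```shell
#!/bin/sh
# Sketch only: print one importDump.php command per dump file so the list
# can be reviewed before anything is executed.
import_commands() {
    for dump in "$@"; do
        name=${dump##*/}    # strip the directory part
        wiki=${name%%-*}    # text before the first "-" is taken as the wiki ID
        printf 'mwscript importDump.php --wiki=%s --report=10 %s\n' "$wiki" "$dump"
    done
}
```

After checking the output of `import_commands /vagrant/xml_imports/*.xml`, the same output could be piped to `sh` to run the imports.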

All four languages are populated with template docs, now. Great catch, @ECohen_WMDE!

Example templates including TemplateData:
https://tr-wmde-templates-alpha.wmcloud.org/w/index.php?title=%C5%9Eablon:Makam_sahibi_bilgi_kutusu&action=edit
https://fa-wmde-templates-alpha.wmcloud.org/w/index.php?title=%D8%A7%D9%84%DA%AF%D9%88:Nowrap/%D8%AA%D9%88%D8%B6%DB%8C%D8%AD%D8%A7%D8%AA&action=edit
https://en-wmde-templates-alpha.wmcloud.org/w/index.php?title=Template:Nobold/doc&action=edit

Haha, but wait: these include semi-templatised TemplateData, which breaks the editor!
https://de-wmde-templates-alpha.wmcloud.org/w/index.php?title=Vorlage:Info/Doku&action=edit

Some bits are still missing, due to the partial subtrees found by Special:Export! I will continue to peck away at the holes as they appear. For example, https://en-wmde-templates-alpha.wmcloud.org/wiki/Template:S-break. This debt is worth paying down in follow-up work, making the exporter spider down to <noinclude> blocks.

I think this means we can move on to using the data. Future adjustments to the content should be documented in the respective tasks.

I've just imported the Common.css files for all 4 wikis, which brings us mostly in line with production.

Demo: https://de-wmde-templates-alpha.wmcloud.org/wiki/Haushund

Lena_WMDE claimed this task.
Lena_WMDE moved this task from Demo to Done on the WMDE-QWERTY-Sprint-2020-07-22 board.

Breadcrumbs for making yourself an interface admin. Here's an example (must be repeated once for each wiki):

mwscript createAndPromote.php --wiki=enwiki --force --interface-admin Adamw
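
Since the command must be repeated per wiki, a small loop may save some typing. The sketch below only prints the commands; `promote_commands` is a hypothetical helper, and the wiki ID dewiki is an assumption (enwiki, fawiki, and trwiki appear in the dump file names above).

```shell
#!/bin/sh
# Sketch only: print one createAndPromote.php command per wiki.
# $1: user name to promote; remaining arguments: wiki IDs.
promote_commands() {
    user=$1
    shift
    for wiki in "$@"; do
        printf 'mwscript createAndPromote.php --wiki=%s --force --interface-admin %s\n' "$wiki" "$user"
    done
}
```

For instance, `promote_commands Adamw enwiki dewiki fawiki trwiki` prints four commands, which can be piped to `sh` once they look right.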