[MCR] Script(s) for populating new tables (slots, content, content_models, slot_roles)
Closed, Resolved (Public)

Description

See https://www.mediawiki.org/wiki/Multi-Content_Revisions/Content_Meta-Data

  • Set MediaWiki to write content meta-data to the old AND the new columns (via config[**]). Don't forget to also do this for new entries in the archive table.
  • Wait a bit and watch for performance issues caused by writing to the new table.
  • Run maintenance/populateContentTable.php to populate the content table. The script needs to support chunking (and maybe also sharding, for parallel operation).
  • Keep watching for performance issues while the new table grows.
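
The chunking and sharding requirement above could be sketched as follows. This is only an illustration under stated assumptions: `shard_ranges` and `chunks` are hypothetical helper names, not part of MediaWiki, and the sketch assumes each parallel worker is handed a contiguous range of revision IDs.

```python
def shard_ranges(min_id, max_id, shards):
    """Split the inclusive ID range [min_id, max_id] into contiguous shards.

    Hypothetical helper: each returned (start, end) pair could be given to
    one parallel run of the population script.
    """
    total = max_id - min_id + 1
    base, extra = divmod(total, shards)
    ranges = []
    start = min_id
    for i in range(shards):
        # The first `extra` shards take one additional ID each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more shards than IDs: skip the empty ones
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def chunks(start, end, size):
    """Yield inclusive (start, end) chunks of at most `size` IDs."""
    while start <= end:
        yield (start, min(start + size - 1, end))
        start += size
```

For example, `shard_ranges(1, 10, 3)` yields three ranges, and each worker would then walk its own range with `chunks(...)`, pausing between chunks to let replication catch up.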

Operation of populateContentTable.php:

  • Select n rows from the revision table that do not have a corresponding entry in the slots table (a WHERE NOT EXISTS subquery is probably better than a LEFT JOIN for this, because of LIMIT).
  • For each such row, construct a corresponding row for the content and slots table[*][**]. The rows can either be collected in an array for later mass-insert, or inserted individually, possibly buffered in a transaction.
  • The content_models, content_formats, and slot_roles tables will be populated as a side effect of calling the assignId() function to get a numeric ID for each content model, format, and role.
  • When all rows in one chunk have been processed, insert and commit the new rows in the content and slots tables, then wait for the replicas to catch up.
  • Repeat until every row in revision has a corresponding row in slots. This will eventually be the case, since web requests are already populating the new tables when creating new rows in revision.

The same procedure applies to the archive table.
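
The loop described above might look roughly like the following sketch. It is not the actual maintenance script: it uses Python's sqlite3 as a stand-in database, a heavily simplified schema, a hypothetical `assign_id` helper in place of assignId(), a `wait_for_replicas` callback in place of real replication waiting, and it assumes every revision has a single "main" slot whose content address is derived from the text ID.

```python
import sqlite3

# Simplified, assumed schema; the real tables have more columns.
SCHEMA = """
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_text_id INTEGER, rev_model TEXT);
CREATE TABLE content_models (model_id INTEGER PRIMARY KEY, model_name TEXT UNIQUE);
CREATE TABLE slot_roles (role_id INTEGER PRIMARY KEY, role_name TEXT UNIQUE);
CREATE TABLE content (content_id INTEGER PRIMARY KEY, content_address TEXT,
                      content_model INTEGER);
CREATE TABLE slots (slot_revision_id INTEGER, slot_role_id INTEGER,
                    slot_content_id INTEGER,
                    PRIMARY KEY (slot_revision_id, slot_role_id));
"""


def assign_id(db, table, id_col, name_col, name):
    """Return the numeric ID for `name`, inserting a row if none exists yet."""
    row = db.execute(
        f"SELECT {id_col} FROM {table} WHERE {name_col} = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    return db.execute(
        f"INSERT INTO {table} ({name_col}) VALUES (?)", (name,)
    ).lastrowid


def populate(db, chunk_size=2, wait_for_replicas=lambda: None):
    """Backfill content and slots in chunks; return the number of rows migrated."""
    migrated = 0
    while True:
        # Select up to chunk_size revisions that have no slots row yet.
        rows = db.execute(
            "SELECT rev_id, rev_text_id, rev_model FROM revision r "
            "WHERE NOT EXISTS (SELECT 1 FROM slots s "
            "                  WHERE s.slot_revision_id = r.rev_id) "
            "ORDER BY rev_id LIMIT ?",
            (chunk_size,),
        ).fetchall()
        if not rows:
            return migrated
        with db:  # one transaction per chunk, committed on success
            role_id = assign_id(db, "slot_roles", "role_id", "role_name", "main")
            for rev_id, text_id, model in rows:
                model_id = assign_id(
                    db, "content_models", "model_id", "model_name", model
                )
                content_id = db.execute(
                    "INSERT INTO content (content_address, content_model) "
                    "VALUES (?, ?)",
                    (f"tt:{text_id}", model_id),  # assumed address format
                ).lastrowid
                db.execute(
                    "INSERT INTO slots VALUES (?, ?, ?)",
                    (rev_id, role_id, content_id),
                )
        migrated += len(rows)
        wait_for_replicas()
```

Because each pass only selects revisions without a slots row, the sketch can be interrupted and restarted, and it skips rows that web requests have already populated.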

Event Timeline

thiemowmde triaged this task as Medium priority. Dec 14 2017, 3:44 PM
thiemowmde subscribed.

Please make sure this ticket is linked to a parent task.

Select n rows from the revision table that do not have a corresponding entry in the content table (a WHERE NOT EXISTS subquery is probably better than a LEFT JOIN for this, because of LIMIT).

slots table, not content table.

I don't think WHERE NOT EXISTS versus LEFT JOIN makes any difference for MariaDB, unless you're envisioning a particularly unusual query.
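
For reference, the two query shapes under discussion can be compared in a small sketch. It uses Python's sqlite3 and a hypothetical two-table schema, so it only shows that both forms select the same rows under LIMIT; it says nothing about how MariaDB plans either query.

```python
import sqlite3

# Anti-join written as a correlated NOT EXISTS subquery.
NOT_EXISTS = """
SELECT r.rev_id FROM revision r
WHERE NOT EXISTS (SELECT 1 FROM slots s WHERE s.slot_revision_id = r.rev_id)
ORDER BY r.rev_id LIMIT ?
"""

# The same anti-join written as LEFT JOIN ... IS NULL.
LEFT_JOIN = """
SELECT r.rev_id FROM revision r
LEFT JOIN slots s ON s.slot_revision_id = r.rev_id
WHERE s.slot_revision_id IS NULL
ORDER BY r.rev_id LIMIT ?
"""


def unmigrated(db, query, limit):
    """Return up to `limit` revision IDs that have no slots row."""
    return [row[0] for row in db.execute(query, (limit,))]


def demo():
    """Build a tiny in-memory database and run both query shapes."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY);
        CREATE TABLE slots (slot_revision_id INTEGER, slot_role_id INTEGER);
    """)
    db.executemany("INSERT INTO revision VALUES (?)", [(i,) for i in range(1, 7)])
    # Revisions 2 and 5 already have slots rows.
    db.executemany("INSERT INTO slots VALUES (?, 1)", [(2,), (5,)])
    return unmigrated(db, NOT_EXISTS, 3), unmigrated(db, LEFT_JOIN, 3)
```

Calling `demo()` returns the same list of revision IDs for both queries; any performance difference would have to be checked with EXPLAIN on the real database.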

Change 403879 had a related patch set uploaded (by Cicalese; owner: Aude):
[mediawiki/core@master] [MCR] [WIP] populateContentTables maintenance script

https://gerrit.wikimedia.org/r/403879

Change 403879 had a related patch set uploaded (by Aude; owner: Aude):
[mediawiki/core@master] [MCR] [WIP] populateContentTables maintenance script

https://gerrit.wikimedia.org/r/403879

daniel subscribed.

Oops, Legoktm intercepted this in the process of being merged. Still needs some formalities resolved. With Brad on vacation, I'll take that on.

Change 403879 merged by Gergő Tisza:
[mediawiki/core@master] [MCR] populateContentTables maintenance script

https://gerrit.wikimedia.org/r/403879