Populating full entity column in wb_terms table when running MW update.php should be fast and run only once
Closed, Resolved
Public

Description

As reported in T168036, there is an issue with the way the term_full_entity_id column is currently populated during a run of update.php.

The rebuildTermSqlIndex maintenance script is used, which currently does a full rebuild of the index, i.e. it removes the terms of each entity, re-generates those terms and puts them back into wb_terms, including the full entity ID data. This all takes a good while.
Furthermore, the script does not record in updatelog that it has been run. That is fine for a regular maintenance script that is meant to be run whenever needed and that can also start from a given point (e.g. to continue after a previous run). It is, however, very bad when the script is run as part of a normal MW update.php run, because the full rebuild is then repeated on every update.

It seems the maintenance script should be kept as it is (possibly with some improvements to make it more efficient), but update.php should use something more lightweight and appropriate.

The desired behaviour would be:

  • check whether the full entity ID column needs to be populated (i.e. whether readFullEntityIdColumn is set to true). If readFullEntityIdColumn is false (as, for instance, for the Wikibase instance running for Wikidata), this is interpreted as meaning that its maintainers are going to populate the column outside of the schema update (e.g. because the wb_terms table is so big that it does not make sense to do it during the update run).
  • fill the term_full_entity_id column using a simple SQL query. Or actually two queries: one for items, one for properties.
  • once done, log the update to updatelog, so that it is not run again in the future.
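
The three steps above can be sketched as follows. This is only an illustration in Python against an in-memory SQLite database, not the code from the actual patch (which lives in Wikibase's PHP schema updater). The update key, the function names, the reduced table layout and the "Q"/"P" prefixes for items and properties are assumptions made for the example.

```python
import sqlite3

# Hypothetical key: the task does not say which key the real patch writes.
UPDATE_KEY = "populate-term-full-entity-id"


def create_demo_schema(conn):
    """Create a reduced stand-in for wb_terms and updatelog (assumed layout)."""
    conn.executescript(
        """
        CREATE TABLE wb_terms (
            term_entity_id INTEGER NOT NULL,
            term_entity_type TEXT NOT NULL,
            term_full_entity_id TEXT
        );
        CREATE TABLE updatelog (
            ul_key TEXT PRIMARY KEY,
            ul_value BLOB
        );
        """
    )


def populate_full_entity_id(conn, read_full_entity_id_column):
    """Fill term_full_entity_id once; return True if this call did the work."""
    # Step 1: skip when the setting is off (maintainers populate the column
    # themselves) or when an earlier run has already been logged.
    if not read_full_entity_id_column:
        return False
    logged = conn.execute(
        "SELECT 1 FROM updatelog WHERE ul_key = ?", (UPDATE_KEY,)
    ).fetchone()
    if logged:
        return False

    with conn:  # one transaction: both UPDATEs plus the log entry
        # Step 2: two plain UPDATE queries, one per entity type.
        for entity_type, prefix in (("item", "Q"), ("property", "P")):
            conn.execute(
                "UPDATE wb_terms "
                "SET term_full_entity_id = ? || term_entity_id "
                "WHERE term_entity_type = ? AND term_full_entity_id IS NULL",
                (prefix, entity_type),
            )
        # Step 3: record the run so that later updates skip it.
        conn.execute(
            "INSERT INTO updatelog (ul_key, ul_value) VALUES (?, NULL)",
            (UPDATE_KEY,),
        )
    return True
```

Calling populate_full_entity_id a second time on the same database returns False without touching wb_terms, which is the "run only once" property asked for in the title.
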
WMDE-leszek triaged this task as High priority.
WMDE-leszek moved this task from Proposed to Doing on the Wikidata-Former-Sprint-Board board.

Change 361052 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[mediawiki/extensions/Wikibase@master] Populate full entity ID column and only once on wb_terms schema update

https://gerrit.wikimedia.org/r/361052

Ladsgroup moved this task from incoming to in progress on the Wikidata board. Jun 28 2017, 12:07 AM

Change 361052 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Populate full entity ID column and only once on wb_terms schema update

https://gerrit.wikimedia.org/r/361052

Aleksey_WMDE closed this task as Resolved. Jul 25 2017, 2:51 PM
Restricted Application added a subscriber: PokestarFan. Jul 25 2017, 2:51 PM