Populating full entity column in wb_terms table when running MW update.php should be fast and run only once
Closed, Resolved (Public)

Description

As reported in T168036, there is an issue with the way the term_full_entity_id column is currently populated during a run of update.php.

The rebuildTermSqlIndex maintenance script is used, which currently does a full rebuild of the index: it removes the terms of each entity, regenerates them, and writes them back to wb_terms, including the full entity ID data. This all takes a good while.
Furthermore, the script does not log to updatelog that it was run. That is fine for a regular maintenance script that is meant to be run when needed and that can also start from a given point (for example, to continue after a previous run). It is, however, very bad when the script runs as part of a normal MW update.php, because nothing stops it from running again on later updates.

It seems the maintenance script should be kept as it is (possibly with some improvements to make it more efficient), but update.php should use something more lightweight and appropriate.

Desired behaviour would be:

  • Check whether the full entity ID column needs to be populated (i.e. whether readFullEntityIdColumn is set to true). If readFullEntityIdColumn is false (as, for instance, on the Wikibase instance running Wikidata), this is taken to mean that the maintainers will populate the column outside of the schema update (e.g. because the wb_terms table is so big that it does not make sense to do it during the update run).
  • Fill the term_full_entity_id column using a simple SQL query. Or actually two queries: one for items, one for properties.
  • Once done, log the update to updatelog, so that it is not run again in the future.
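
The three steps above could be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite database, not the actual Wikibase patch: the simplified table layout, the `UPDATE_KEY` value, the `Q`/`P` prefixes, and the function names `make_demo_db` and `populate_full_entity_id` are all assumptions made for the example.

```python
import sqlite3

# Hypothetical updatelog key; the real key used by Wikibase may differ.
UPDATE_KEY = "populate-term-full-entity-id"


def make_demo_db():
    """Create a simplified, assumed version of wb_terms and updatelog."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE wb_terms (
            term_row_id INTEGER PRIMARY KEY,
            term_entity_id INTEGER NOT NULL,
            term_entity_type TEXT NOT NULL,
            term_full_entity_id TEXT
        );
        CREATE TABLE updatelog (ul_key TEXT PRIMARY KEY);
        INSERT INTO wb_terms (term_entity_id, term_entity_type)
        VALUES (42, 'item'), (31, 'property');
        """
    )
    return conn


def populate_full_entity_id(conn, read_full_entity_id_column):
    """Fill term_full_entity_id once; return True if the queries ran."""
    # Step 1: skip when the setting says the column is populated elsewhere.
    if not read_full_entity_id_column:
        return False
    cur = conn.cursor()
    # Skip when an earlier run has already been logged.
    cur.execute("SELECT 1 FROM updatelog WHERE ul_key = ?", (UPDATE_KEY,))
    if cur.fetchone() is not None:
        return False
    # Step 2: one plain UPDATE per entity type (assumed prefixes Q and P).
    for entity_type, prefix in (("item", "Q"), ("property", "P")):
        cur.execute(
            "UPDATE wb_terms SET term_full_entity_id = ? || term_entity_id "
            "WHERE term_entity_type = ? AND term_full_entity_id IS NULL",
            (prefix, entity_type),
        )
    # Step 3: record the run so later updates skip it.
    cur.execute("INSERT INTO updatelog (ul_key) VALUES (?)", (UPDATE_KEY,))
    conn.commit()
    return True
```

Calling `populate_full_entity_id(conn, True)` a second time on the same database does nothing, because the first call left a row in updatelog. On MySQL the string concatenation would be written with `CONCAT` rather than `||`.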

Event Timeline

WMDE-leszek created this task.
WMDE-leszek moved this task from Proposed to Doing on the Wikidata-Former-Sprint-Board board.

Change 361052 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[mediawiki/extensions/Wikibase@master] Populate full entity ID column and only once on wb_terms schema update

https://gerrit.wikimedia.org/r/361052

Change 361052 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Populate full entity ID column and only once on wb_terms schema update

https://gerrit.wikimedia.org/r/361052