
Really large holes in the new term store (again)
Closed, ResolvedPublic

Description

Another analysis based on the sqoop from T243763 (Run hadoop analysis on wb_terms migration for entities below 29 million to check state) shows that we have even more holes than before: 48M now.

Here's an example from the database:
T243763#5838920

MariaDB [wikidatawiki_p]> SELECT
    ->   wbit_item_id as id,
    ->   wby_name as type,
    ->   wbxl_language as language,
    ->   wbx_text as text
    -> FROM wbt_item_terms
    -> LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
    -> LEFT JOIN wbt_type ON wbtl_type_id = wby_id
    -> LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
    -> LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
    -> WHERE wbit_item_id = 452581;
+--------+-------+----------+--------------------------------+
| id     | type  | language | text                           |
+--------+-------+----------+--------------------------------+
| 452581 | label | de       | Asantehene                     |
| 452581 | label | pl       | Asantehene                     |
| 452581 | label | en       | list of rulers of Asante       |
| 452581 | label | lt       | Asantehene                     |
| 452581 | label | nl       | Asantehene                     |
| 452581 | label | ja       | 君主                           |
| 452581 | label | ru       | Ашантихене                     |
| 452581 | alias | ja       | アシャンティ王の一覧           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
| 452581 | NULL  | NULL     | NULL                           |
+--------+-------+----------+--------------------------------+
83 rows in set (0.01 sec)
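
The all-NULL rows above are what a LEFT JOIN returns when a wbt_item_terms row points at a wbt_term_in_lang id that no longer exists. As a rough sketch of that effect, the snippet below rebuilds a tiny version of the join chain in an in-memory SQLite database and counts such holes per item. The table and column names are taken from the query above; the column types, the sample ids (10, 20, 30, 31, 32) and the helper names build_demo_db and holes_per_item are assumptions made only for this illustration, and SQLite stands in for MariaDB.

```python
import sqlite3

# Simplified stand-in for the normalized term store. Column names come from
# the query in this task; the types and the sample ids are assumptions.
SCHEMA = """
CREATE TABLE wbt_type (wby_id INTEGER, wby_name TEXT);
CREATE TABLE wbt_text (wbx_id INTEGER, wbx_text TEXT);
CREATE TABLE wbt_text_in_lang (wbxl_id INTEGER, wbxl_language TEXT, wbxl_text_id INTEGER);
CREATE TABLE wbt_term_in_lang (wbtl_id INTEGER, wbtl_type_id INTEGER, wbtl_text_in_lang_id INTEGER);
CREATE TABLE wbt_item_terms (wbit_item_id INTEGER, wbit_term_in_lang_id INTEGER);
"""

# Same join chain as the query above, reduced to a per-item count of rows
# whose text cannot be resolved.
HOLES_PER_ITEM = """
SELECT wbit_item_id, COUNT(*) AS holes
FROM wbt_item_terms
LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
WHERE wbx_text IS NULL
GROUP BY wbit_item_id
ORDER BY wbit_item_id
"""


def build_demo_db():
    """Create an in-memory database with one intact term and two holes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO wbt_type VALUES (1, 'label')")
    conn.execute("INSERT INTO wbt_text VALUES (10, 'Asantehene')")
    conn.execute("INSERT INTO wbt_text_in_lang VALUES (20, 'de', 10)")
    conn.execute("INSERT INTO wbt_term_in_lang VALUES (30, 1, 20)")
    conn.executemany(
        "INSERT INTO wbt_item_terms VALUES (?, ?)",
        [
            (452581, 30),  # resolves to the de label
            (452581, 31),  # dangling: no wbt_term_in_lang row with id 31
            (452581, 32),  # dangling: no wbt_term_in_lang row with id 32
            (1, 30),       # a second item sharing the intact term
        ],
    )
    return conn


def holes_per_item(conn):
    """Return (item id, number of unresolved term rows) for affected items."""
    return conn.execute(HOLES_PER_ITEM).fetchall()


if __name__ == "__main__":
    print(holes_per_item(build_demo_db()))
```

Only item 452581 is reported here, with two holes; item 1 is untouched because the row it references still exists.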

Event Timeline

And this is probably also related to T243705

Change 568955 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] wbterms: Write only to the new term store in rebuildItemTerms

https://gerrit.wikimedia.org/r/568955

Investigation led to these tests and conclusions:

EVIL: Removing the "English" en label, setting a new English en label, and removing other terms (de label, de description) (restores / undos)
https://test.wikidata.org/w/index.php?title=Q562&diff=525291&oldid=525290

EVIL: Removing the "English" en label and removing other terms (en-gb alias) (restores / undos)
https://test.wikidata.org/w/index.php?title=Q562&diff=525301&oldid=525300

HAPPY: Removing "English" label ONLY (restores / undos)
https://test.wikidata.org/w/index.php?title=Q562&diff=525304&oldid=525302

EVIL: Removing "English" en label and an en alias (restores / undos)
https://test.wikidata.org/w/index.php?title=Q562&diff=525307&oldid=525306

EVIL: Removing "English" en label and an en alias (single edit in the API wbeditentity)
https://test.wikidata.org/w/index.php?title=Q562&diff=525311&oldid=525309

So the bug has something to do with changing multiple terms in a single edit; removing a single label on its own works as expected.

Change 568968 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] Fix incorrect deletion of rows in findActuallyUnusedTermIds

https://gerrit.wikimedia.org/r/568968
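
For readers unfamiliar with this kind of cleanup step, here is a minimal sketch of the general idea behind a "find actually unused ids" check: after an edit drops several terms from one item, a wbt_term_in_lang id may only be deleted if no other row still references it. This is not the Wikibase implementation, and it does not reproduce the specific defect fixed in the change above, which this task does not describe. The function names find_unused_term_in_lang_ids and demo, the sample ids and the use of SQLite are all assumptions for illustration.

```python
import sqlite3


def find_unused_term_in_lang_ids(conn, candidate_ids):
    """Return the candidate ids that no wbt_item_terms row references."""
    candidates = set(candidate_ids)
    if not candidates:
        return set()
    placeholders = ",".join("?" for _ in candidates)
    rows = conn.execute(
        "SELECT DISTINCT wbit_term_in_lang_id FROM wbt_item_terms "
        f"WHERE wbit_term_in_lang_id IN ({placeholders})",
        list(candidates),
    )
    still_used = {row[0] for row in rows}
    # Only ids that nothing references any more are safe to delete.
    return candidates - still_used


def demo():
    """Drop two terms from item 1 in one edit; item 2 still shares one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wbt_item_terms "
        "(wbit_item_id INTEGER, wbit_term_in_lang_id INTEGER)"
    )
    # Hypothetical data: items 1 and 2 share id 30, only item 1 uses id 31.
    conn.executemany(
        "INSERT INTO wbt_item_terms VALUES (?, ?)",
        [(1, 30), (1, 31), (2, 30)],
    )
    # A single edit on item 1 removes both of its terms at once.
    conn.execute("DELETE FROM wbt_item_terms WHERE wbit_item_id = 1")
    return find_unused_term_in_lang_ids(conn, [30, 31])


if __name__ == "__main__":
    print(demo())
```

In this sketch only id 31 comes back as unused, because item 2 still references id 30. A real check would presumably also have to consider wbt_property_terms, which references the same wbt_term_in_lang ids through wbpt_term_in_lang_id.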

Change 568975 had a related patch set uploaded (by Ladsgroup; owner: Addshore):
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.16] wbterms: Fix incorrect deletion of rows in findActuallyUnusedTermIds

https://gerrit.wikimedia.org/r/568975

Change 568976 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.16] wbterms: Write only to the new term store in rebuildItemTerms

https://gerrit.wikimedia.org/r/568976

Change 568955 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] wbterms: Write only to the new term store in rebuildItemTerms

https://gerrit.wikimedia.org/r/568955

Change 568968 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] wbterms: Fix incorrect deletion of rows in findActuallyUnusedTermIds

https://gerrit.wikimedia.org/r/568968

Change 569023 had a related patch set uploaded (by Tarrow; owner: Tarrow):
[mediawiki/extensions/Wikibase@master] wbterms: tests for not deleting used terms rows

https://gerrit.wikimedia.org/r/569023

Change 568976 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.16] wbterms: Write only to the new term store in rebuildItemTerms

https://gerrit.wikimedia.org/r/568976

Change 568975 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.16] wbterms: Fix incorrect deletion of rows in findActuallyUnusedTermIds

https://gerrit.wikimedia.org/r/568975

Mentioned in SAL (#wikimedia-operations) [2020-01-30T17:49:44Z] <ladsgroup@deploy1001> Synchronized php-1.35.0-wmf.16/extensions/Wikibase/repo/maintenance/rebuildItemTerms.php: wbterms: Write only to the new term store in rebuildItemTerms (T243944) (duration: 01m 09s)

Mentioned in SAL (#wikimedia-operations) [2020-01-30T17:51:23Z] <ladsgroup@deploy1001> Synchronized php-1.35.0-wmf.16/extensions/Wikibase/lib/includes/Store/Sql/Terms/FingerprintableEntityTermStoreTrait.php: wbterms: Fix incorrect deletion of rows in findActuallyUnusedTermIds (T243944) (duration: 01m 06s)

I manually poked the few items on test that needed it, so there is no need to rebuild everything there.

mysql:research@dbstore1004.eqiad.wmnet [testwikidatawiki]> SELECT
    ->   wbpt_property_id as id,
    ->   wby_name as type,
    ->   wbxl_language as language,
    ->   wbx_text as text
    -> FROM wbt_property_terms
    -> LEFT JOIN wbt_term_in_lang ON wbpt_term_in_lang_id = wbtl_id
    -> LEFT JOIN wbt_type ON wbtl_type_id = wby_id
    -> LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
    -> LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
    -> WHERE wbx_text IS NULL;
Empty set (0.82 sec)

mysql:research@dbstore1004.eqiad.wmnet [testwikidatawiki]> SELECT
    ->   wbit_item_id as id,
    ->   wby_name as type,
    ->   wbxl_language as language,
    ->   wbx_text as text
    -> FROM wbt_item_terms
    -> LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
    -> LEFT JOIN wbt_type ON wbtl_type_id = wby_id
    -> LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
    -> LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
    -> WHERE wbx_text IS NULL;

+--------+------+----------+------+
| id     | type | language | text |
+--------+------+----------+------+
|      9 | NULL | NULL     | NULL |
| 160331 | NULL | NULL     | NULL |
|   1860 | NULL | NULL     | NULL |
| 211860 | NULL | NULL     | NULL |
+--------+------+----------+------+
4 rows in set (1.45 sec)

mysql:research@dbstore1004.eqiad.wmnet [testwikidatawiki]> SELECT   wbit_item_id as id,   wby_name as type,   wbxl_language as language,   wbx_text as text FROM wbt_item_terms LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON wbtl_type_id = wby_id LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id LEFT JOIN wbt_text ON wbxl_text_id = wbx_id WHERE wbx_text IS NULL;
Empty set (1.47 sec)

Change 569023 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] wbterms: tests for not deleting used terms rows

https://gerrit.wikimedia.org/r/569023