Page MenuHomePhabricator

Some property labels are not displayed on Item pages
Closed, ResolvedPublic

Description

Some property labels are not displayed on Item pages and show the P-number instead. See for example https://www.wikidata.org/wiki/Q12418#P195
Purging the page doesn't fix it.

Related Objects

Mentioned In
T219123: Migrate to and read from new store for item terms
T243944: Really large holes in the new term store (again)
T243705: Lexemes: in English, "unknown language" appears instead of "English" in search, and "Q1860" appears instead of "English" on lexeme pages
T241212: Investigate orphan rows in the new terms tables after T237984
T241210: Full rebuild of new wikidata terms tables on beta after T237984
T241209: Full rebuild of new wikidata terms tables on test after T237984
T208425: [EPIC] Kill the wb_terms table
T241050: Investigate if FormatterCache is actually used everywhere it should be when rendering an entity
T239470: Check the success of the initial terms migration (does it have holes)
T238473: Label for unit isn't displayed correctly, just Q-number
Mentioned Here
T208425: [EPIC] Kill the wb_terms table
T225857: Document new store design and implementations
T239470: Check the success of the initial terms migration (does it have holes)
P433 greg's ssh config
P501 puppet.conf on integration puppet master
P9914 (An Untitled Masterwork)
P9909 (An Untitled Masterwork)
P9904 (An Untitled Masterwork)
P9906 (An Untitled Masterwork)
T201838: Use link formatter that uses cache instead of wb_terms for all wikidatawiki properties
P9897 (An Untitled Masterwork)
P9898 (An Untitled Masterwork)
P9903 (An Untitled Masterwork)
P84 andre queries
P98 Maximum function level exceeded when viewing large, nested page
P276 WTF HHVM?!
P195 webconsole log from viewing https://www.mediawiki.org/wiki/Talk:Sandbox

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I think I know what's causing it but given the nature of the problem (like the ones you mentioned, they all are fixed now), it's really hard to make sure that would fix it. I will make a patch and hopefully wait to see if that stops happening.

Moebeus added a comment.EditedDec 6 2019, 12:18 AM

weird stuff this, I don't envy you trying to track it down. Here's another weird detail I just discovered: When the property labels won't display, they do show up with "fallback values" in another language in the revision history:


This is what it looks like on the item page:

the alternate language in this case ("platemerke" instead of "record label") is Norwegian. I have my UI lang set to English, but I speak Norwegian, don't know if this is helpful or not...
EDIT: The example item is https://www.wikidata.org/wiki/Q235931

Change 555461 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Consider any type of empty value as uncached in SqlEntityInfoBuilder

https://gerrit.wikimedia.org/r/555461

Change 555461 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Consider any type of empty value as uncached in SqlEntityInfoBuilder

https://gerrit.wikimedia.org/r/555461

Change 556402 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.10] Consider any type of empty value as uncached in SqlEntityInfoBuilder

https://gerrit.wikimedia.org/r/556402

Change 556403 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.8] Consider any type of empty value as uncached in SqlEntityInfoBuilder

https://gerrit.wikimedia.org/r/556403

Change 556403 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.8] Consider any type of empty value as uncached in SqlEntityInfoBuilder

https://gerrit.wikimedia.org/r/556403

Change 556402 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.10] Consider any type of empty value as uncached in SqlEntityInfoBuilder

https://gerrit.wikimedia.org/r/556402

Mentioned in SAL (#wikimedia-operations) [2019-12-11T16:39:08Z] <ladsgroup@deploy1001> Synchronized php-1.35.0-wmf.8/extensions/Wikibase/lib/includes/Store/Sql/SqlEntityInfoBuilder.php: Consider any type of empty value as uncached in SqlEntityInfoBuilder (T237984) (duration: 01m 03s)

Mentioned in SAL (#wikimedia-operations) [2019-12-11T16:43:35Z] <ladsgroup@deploy1001> Synchronized php-1.35.0-wmf.10/extensions/Wikibase/lib/includes/Store/Sql/SqlEntityInfoBuilder.php: Consider any type of empty value as uncached in SqlEntityInfoBuilder (T237984) (duration: 01m 03s)

abian added a subscriber: abian.Dec 14 2019, 11:16 AM

Just chiming in to say this is still going on, names still affected and today I noticed it on architect (P84) as well. https://www.wikidata.org/wiki/Q31900122

On Q78347074 we have two cases: family name and given name. Purging both the Item page and the property pages doesn't fix the issue.

PKM added a comment.Dec 15 2019, 7:02 PM

Also P195 “collection” and at least once yesterday it carried over into the Commons Wikidata Infobox as “P195”.

PKM added a comment.Dec 15 2019, 7:18 PM

Also, the label for P195 doesn't appear in the Query screen.

Addshore claimed this task.Dec 17 2019, 8:34 AM
Restricted Application added a project: User-Addshore. · View Herald TranscriptDec 17 2019, 8:34 AM
Addshore added a comment.EditedDec 17 2019, 10:25 AM

I managed to reproduce on https://www.wikidata.org/wiki/Q4115189 with P195 in https://www.wikidata.org/w/index.php?title=Q4115189&diff=1078162868&oldid=1077776085

With some debugging I found that:

FullEntityParserOutputGenerator::getEntityInfoid is correctly given Q985852 and P195 and en for the fallback chain
And the returning value includes the label "collection" which is what we would epxect to see in the rendered page.

{"P195":{"labels":{"en":{"language":"en","value":"collection"}},"descriptions":{"en":{"language":"en","value":"art, museum, archival, or bibliographic collection the subject is part of"}}},"Q985852":{"labels":{"en":{"language":"en","value":"Fu"}},"descriptions":{"en":{"language":"en","value":"Chinese surname \u5085"}}}}

However StatementGroupListView incorrectly renders the snippet of HTML for the statement group list:

Wikibase\View\StatementGroupListView::getHtml $statementsHtml (new bit) <div class="wikibase-statementgroupview" id="P195" data-property-id="P195">
<div class="wikibase-statementgroupview-property">
<div class="wikibase-statementgroupview-property-label" dir="auto"><a title="Property:P195" href="/wiki/Property:P195">P195</a></div>
</div>
<div class="wikibase-statementlistview">
<div class="wikibase-statementlistview-listview">
<div id="Q4115189$3d46867c-48a2-974a-5992-54592b37ac87" class="wikibase-statementview wikibase-statement-Q4115189$3d46867c-48a2-974a-5992-54592b37ac87 wb-normal">

This uses a propertyIdFormatter which is a DispatchingEntityIdHtmlLinkFormatter.
Which when called with P195 is what gives us the wrong inner HTML snippet.

<a title="Property:P195" href="/wiki/Property:P195">P195</a>

The inner formatter that appears to always be used here is a Wikibase\Lib\Formatters\LabelsProviderEntityIdHtmlLinkFormatter
When LabelsProviderEntityIdHtmlLinkFormatter::formatEntityId is called lookupEntityLabel returns null for P195 resulting in the HTML we have already seen above.

Wikibase\DataModel\Services\EntityId\EntityIdLabelFormatter::lookupEntityLabel lookup class Wikibase\Lib\Store\LanguageFallbackLabelDescriptionLookup for entity P195
Only Q985852 appears to get a CachingFallbackLabelDescriptionLookup here, which is the item that is the value for the statement that I am investigating..


This is worth investigating afterwards as per T201838 the properties should also be using the caching layer introduced during one of the previous wb_terms trailblazes.
My logging result was P9897 when using adhoc logging P9898...

LanguageFallbackLabelDescriptionLookup internally uses a Wikibase\Lib\Store\DispatchingTermBuffer for P195
This buffer uses > Wikibase\DataAccess\ByTypeDispatchingPrefetchingTermLookup which uses a Wikibase\Lib\Store\Sql\Terms\PrefetchingPropertyTermLookup

PrefetchingPropertyTermLookup::prefetchTerms seems to correctly query the DB returning a bunch of rows like:

{"wbpt_property_id":"195","wbpt_term_in_lang_id":"27543"}
{"wbpt_property_id":"195","wbpt_term_in_lang_id":"27542"}
{"wbpt_property_id":"195","wbpt_term_in_lang_id":"27541"}

Which are stored and then, termIdsResolver->resolveGroupedTermIds is called to resolve these wbpt_term_in_lang_id fields, using a Wikibase\Lib\Store\Sql\Terms\DatabaseTermIdsResolver (an in memory one is also available)

It was at this point that I decided to have a quick look in the DB, as I hve already gone down this code path rabbit hole for quite a while, and we are nearly at the DB..

The following query should select the en label for the property... But it returned 0 results.

SELECT
  wbpt_property_id as id,
  wby_name as type,
  wbxl_language as language,
  wbx_text as text
FROM wbt_property_terms
LEFT JOIN wbt_term_in_lang ON wbpt_term_in_lang_id = wbtl_id
LEFT JOIN wbt_type ON wbtl_type_id = wby_id
LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
WHERE wbpt_property_id = 195
AND wby_name = 'label'
AND wbxl_language = 'en';

Removing the language and selecting some fields again shows us that there is some odd stuff in the DB..
Full paste at P9903
The weirdest rows are extracted below:

+---------+------------------+----------------------+-----------+--------------+----------------------+--------+----------+-----------+---------------+--------------+--------+----------+
| wbpt_id | wbpt_property_id | wbpt_term_in_lang_id | wbtl_id   | wbtl_type_id | wbtl_text_in_lang_id | wby_id | wby_name | wbxl_id   | wbxl_language | wbxl_text_id | wbx_id | wbx_text |
+---------+------------------+----------------------+-----------+--------------+----------------------+--------+----------+-----------+---------------+--------------+--------+----------+
|   27557 |              195 |                27407 |     27407 |            1 |                27238 |      1 | label    |      NULL | NULL          |         NULL |   NULL | NULL     |
|  313554 |              195 |            273046250 | 273046250 |            1 |            263833266 |      1 | label    | 263833266 | fr            |     87612649 |   NULL | NULL     |
|  313556 |              195 |            275677887 | 275677887 |            1 |            274119666 |      1 | label    | 274119666 | en-gb         |     87612649 |   NULL | NULL     |
|  313557 |              195 |            275677888 | 275677888 |            1 |            274119667 |      1 | label    | 274119667 | ia            |     87612649 |   NULL | NULL     |
|  313558 |              195 |            275677889 | 275677889 |            1 |            274119668 |      1 | label    | 274119668 | sco           |     87612649 |   NULL | NULL     |
+---------+------------------+----------------------+-----------+--------------+----------------------+--------+----------+-----------+---------------+--------------+--------+----------+

So conclusion of my first look at this is it is probably nothing to do with the reading and rendering side of the stack, but to do with the writing side of the stack.
This explains why purging doesn't really solve the problem.

Making an edit to the property https://www.wikidata.org/w/index.php?title=Property:P195&diff=1078226992&oldid=1070123486 which would write to the terms stores again correctly populates the table and then purging my test item Q4115189 that had the duff property rendered on it fixed the issue for P195.

Looking at the history of the property https://www.wikidata.org/w/index.php?title=Property:P195&action=history
This probably means that the edits on Dec 6th created this issue in the storage.

I'll continue investigating the writing side now..

Addshore added a comment.EditedDec 17 2019, 10:51 AM

Current complete list of message up data in the secondary storage for properties is P9904

Which boils down to these 34 property IDs

1mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> SELECT DISTINCT wbpt_property_id FROM wbt_property_terms LEFT JOIN wbt_term_in_lang ON wbpt_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON wbtl_ty pe_id = wby_id LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id LEFT JOIN wbt_text ON wbxl_text_id = wbx_id WHERE wby_name = 'label' AND wbx_text IS NULL ORDER BY wbpt_property_id;
2+------------------+
3| wbpt_property_id |
4+------------------+
5| 39 |
6| 84 |
7| 86 |
8| 163 |
9| 175 |
10| 225 |
11| 287 |
12| 361 |
13| 433 |
14| 503 |
15| 512 |
16| 585 |
17| 634 |
18| 642 |
19| 812 |
20| 892 |
21| 951 |
22| 1056 |
23| 1594 |
24| 1705 |
25| 1885 |
26| 1951 |
27| 2176 |
28| 2358 |
29| 2359 |
30| 2561 |
31| 2579 |
32| 2634 |
33| 2827 |
34| 3113 |
35| 4155 |
36| 4329 |
37| 4552 |
38| 4970 |
39+------------------+
4034 rows in set (0.69 sec)

It even looks like at this level we could have a mixture of issues.
For example https://www.wikidata.org/w/index.php?title=Property:P503&action=history has not been edited since early september and has "bad" data.
Others like https://www.wikidata.org/w/index.php?title=Property:P39&action=history have been edited recently (early december) and has bad data.

And the last rebuild started on Nov 28th T237984#5699966

P9909 contains a join of these properties against the dates that they were last touched.

My totally unconfirmed hunch is that the rebuild script works (as we have seen it rebuild things quite successfully.
But then when edits are happening data is removed from the table, when it should not be, and thus some tables have the data and some don't

As said above my next step is looking at the writing code, specifically a deep dive on the deletion code.

Mentioned in SAL (#wikimedia-operations) [2019-12-17T11:27:47Z] <addshore> addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=1 # T237984

I finished the above maint script run after reaching "Processed up to page 3929876 (P86)" (Full output @ P9914) and can confirm that the query used in P9906 now does not include P39 P84 and P86.

Mentioned in SAL (#wikimedia-operations) [2019-12-17T15:13:09Z] <addshore> addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=10 --from-id=3929876 # T237984

Mentioned in SAL (#wikimedia-operations) [2019-12-17T15:17:26Z] <addshore> addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=10 --from-id=3929876 # T237984 (I stopped it at Processed up to page 14546856 (P501))

Addshore added a comment.EditedDec 17 2019, 3:24 PM

It looked like we also had a case of this on test...

mysql:research@dbstore1004.eqiad.wmnet [testwikidatawiki]> SELECT * FROM wbt_property_terms LEFT JOIN wbt_term_in_lang ON wbpt_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON wbtl_type_id = wby_id LEFT
JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id LEFT JOIN wbt_text ON wbxl_text_id = wbx_id WHERE wby_name = 'label' AND wbx_text IS NULL ORDER BY wbpt_property_id;
+---------+------------------+----------------------+---------+--------------+----------------------+--------+----------+---------+---------------+--------------+--------+----------+
| wbpt_id | wbpt_property_id | wbpt_term_in_lang_id | wbtl_id | wbtl_type_id | wbtl_text_in_lang_id | wby_id | wby_name | wbxl_id | wbxl_language | wbxl_text_id | wbx_id | wbx_text |
+---------+------------------+----------------------+---------+--------------+----------------------+--------+----------+---------+---------------+--------------+--------+----------+
|   74419 |            39995 |                74356 |   74356 |            1 |                74252 |      1 | label    |   74252 | en            |        73110 |   NULL | NULL     |
+---------+------------------+----------------------+---------+--------------+----------------------+--------+----------+---------+---------------+--------------+--------+----------+
1 row in set (0.78 sec)

However this item hasn't been touched in August 2017...
https://test.wikidata.org/w/index.php?title=Property:P39995&action=history

I fixes this with the below command allowing it to run only for the single entity:

addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=testwikidatawiki --batch-size=1 --from-id=118551 --sleep=60

This hole may or may not be related...

I also discovered some holes in the items tables on test:

mysql:research@dbstore1004.eqiad.wmnet [testwikidatawiki]> SELECT * FROM wbt_item_terms LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON wbtl_type_id = wby_id LEFT JOIN  wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id LEFT JOIN wbt_text ON wbxl_text_id = wbx_id WHERE wby_name = 'label' AND wbx_text IS NULL ORDER BY wbit_item_id;
+---------+--------------+----------------------+---------+--------------+----------------------+--------+----------+---------+---------------+--------------+--------+----------+
| wbit_id | wbit_item_id | wbit_term_in_lang_id | wbtl_id | wbtl_type_id | wbtl_text_in_lang_id | wby_id | wby_name | wbxl_id | wbxl_language | wbxl_text_id | wbx_id | wbx_text |
+---------+--------------+----------------------+---------+--------------+----------------------+--------+----------+---------+---------------+--------------+--------+----------+
|    4784 |         1744 |                74356 |   74356 |            1 |                74252 |      1 | label    |   74252 | en            |        73110 |   NULL | NULL     |
|    4958 |         1822 |               242972 |  242972 |            1 |               243630 |      1 | label    |  243630 | it            |        73110 |   NULL | NULL     |
|  128365 |        73319 |                74356 |   74356 |            1 |                74252 |      1 | label    |   74252 | en            |        73110 |   NULL | NULL     |
+---------+--------------+----------------------+---------+--------------+----------------------+--------+----------+---------+---------------+--------------+--------+----------+
3 rows in set (0.77 sec)

I went ahead and fixed these.

addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=testwikidatawiki --from-id=1744 --to-id=1744
Rebuilding Q1744 till Q1744
Done.
addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=testwikidatawiki --from-id=1822 --to-id=1822
Rebuilding Q1822 till Q1822
Done.
addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=testwikidatawiki --from-id=73319 --to-id=73319
Rebuilding Q73319 till Q73319
Done.

So now if we find any more situations like this on test we will know that the hole is caused by whatever is causing this issue, and no some issue with the initial rebuild / import of data into these tables.

Expanding what I am looking at slightly with the following query (no longer just looking at labels):

SELECT * FROM wbt_property_terms LEFT JOIN wbt_term_in_lang ON wbpt_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON wbtl_type_id = wby_id LEFT JOIN  wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id LEFT JOIN wbt_text ON wbxl_text_id = wbx_id WHERE wbx_text IS NULL ORDER BY wbpt_property_id;

There appears to be a large number of labels and aliases that are incorrect / NULL.
There don't appear to be any NULL descriptions, but there are some rows where wby_name (the term type) is NULL, maybe they were descriptions?

The results of this query show data is wrong at multiple different levels.
So wbt_property_terms refers to multiple rows in these following tables, for rows that do not exist.

  • wbt_term_in_lang
  • wbt_text_in_lang
  • wbt_text

I can't help but stare at DatabaseTermIdCleaner::cleanTermIds and read the doc that says..

It is the caller’s responsibility ensure that the term_in_lang rows are no longer referenced anywhere;

Addshore added a comment.EditedDec 17 2019, 4:34 PM

Looking at https://www.wikidata.org/wiki/Property:P642
The current NL label is "van"
And this is one of the terms that is missing:

| wbpt_id | wbpt_property_id | wbpt_term_in_lang_id | wbtl_id   | wbtl_type_id | wbtl_text_in_lang_id | wby_id | wby_name | wbxl_id   | wbxl_language | wbxl_text_id | wbx_id | wbx_text |
|  320487 |              642 |            361461407 | 361461407 |            1 |            359474163 |      1 | label    | 359474163 | nl            |    110010309 |   NULL | NULL     |

various edits recently happened https://www.wikidata.org/w/index.php?title=Property:P642&action=history

The text "van" does exist in the text table, and is used by 49 different descriptions.
But with wbx_id = 121798608, compared to 110010309 that P642 is trying to refer to.

Confirming this row does not exist currently in the text table:

mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> select * from wbt_text where wbx_id = 110010309;
Empty set (0.00 sec)

But it is referred to:

mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> select * from wbt_text_in_lang where wbxl_text_id = 110010309;
+-----------+---------------+--------------+
| wbxl_id   | wbxl_language | wbxl_text_id |
+-----------+---------------+--------------+
| 359474163 | nl            |    110010309 |
+-----------+---------------+--------------+
1 row in set (0.00 sec)

After speaking to the DBA team, we might be able to figure out when this row was removed from the wbt_text table and thus find the set of queries that happened in the transaction that lead to the decision..

  • Wiki: wikidatawiki
  • Method: DatabaseTermIdsCleaner::cleanTextIds
  • Table: wbt_text
  • wbx_id = 110010309
  • query type: DELETE

Change 558595 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] Add loggers to new terms related services

https://gerrit.wikimedia.org/r/558595

I have provided @Addshore with the transaction that deleted the wbx_id = 110010309 row. It happened on 2019-12-05 11:09:27 (although the transaction, and its data context, started in the previous second).

I have provided @Addshore with the transaction that deleted the wbx_id = 110010309 row. It happened on 2019-12-05 11:09:27 (although the transaction, and its data context, started in the previous second).

Thanks!

After a tiny bit of looking this morning, the call to delete wbx_id happened after this edit:

This can be seen in the same transaction during in the deletions of wbx_id.

addshore@mwdebug1001:~$ cat /home/jynus/wbt_text_transaction.sql | grep van
DELETE /* Wikibase\Lib\Store\Sql\TermSqlIndex::deleteTerms  */ FROM `wb_terms` WHERE term_language = 'af' AND term_type = 'description' AND term_text = 'van' AND term_full_entity_id = 'Q55087692'

The revision of P642 https://www.wikidata.org/w/index.php?title=Property:P642&oldid=1069672182 at the time still had a label 'van' and wbx_id = 110010309 was correctly refered to in other tables.

So deletion is definitely the problem.

Change 558924 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] Fix property name in DatabaseItemTermStoreTest

https://gerrit.wikimedia.org/r/558924

Change 558924 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Fix property name in DatabaseItemTermStoreTest

https://gerrit.wikimedia.org/r/558924

Mentioned in SAL (#wikimedia-operations) [2019-12-18T15:55:00Z] <addshore> addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=1 --from-id=14546856 # T237984 (will run to completion)

I just took another look at the holes in the property table and some new ones developed in the first section of it (properties before P500).
I'll run the rebuild script for all properties now (well, for now starting from P500 so that item renderings will slowly start fixing themselves. (see T237984#5751461)
Once this is done a purge of the page will fix the rendering, until a new hole / property has this issue.
This will keep happening until the underlying issue is solved.

Mentioned in SAL (#wikimedia-operations) [2019-12-19T08:27:36Z] <addshore> addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=5 --from-id=30398836 # T237984

Change 559434 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] WIP tests for T237984 - Unexpected missing term text row

https://gerrit.wikimedia.org/r/559434

Addshore added a comment.EditedDec 19 2019, 10:13 AM

Further investigating the case in discussed in T237984#5748184, T237984#5748567, T237984#5750348 looking at the transaction that happened.

TLDR; Scroll down the the "Got it" section for the location of the issue...

  • 539 calls to Wikibase\Lib\Store\Sql\TermSqlIndex::deleteTerms were made (deleting all terms)
  • 539 ids selected for deletion in Wikibase\Lib\Store\Sql\Terms\DatabaseItemTermStore::acquireAndInsertTerms
  • 539 ids (537 UNIQUE) selected for deletion in Wikibase\Lib\Store\Sql\Terms\DatabaseTermIdsCleaner::cleanTermInLangIds
  • 539 ids (107 UNIQUE) selected for deletion in Wikibase\Lib\Store\Sql\Terms\DatabaseTermIdsCleaner::cleanTextInLangIds (at least this should be less, one of these deleted was 110010309 'van' (this id only occurred once in the list of ids to be deleted.))

TODO Perhaps the list of ids sent in these queries should be made unique before sending.....

A full walkthrough of this bit of the code, highlighting some interesting bits, but still not finding the issue.

  • DatabaseItemTermStore::storeTerms - The method all of this "magic" is happening in
    • DatabaseItemTermStore::acquireAndInsertTerms - Nothing evil here...
      • Selects wbit_term_in_lang_id values from wbt_item_terms for the item ID to altered
      • Acquirers term ids (for things being inserted). Also generates $termIdsToClean which is a diff of what is in the table and what is desired, resulting in what to be removed.
      • Inserts terms that are needed but missing from the store
      • DELETE from wbt_item_terms all $termIdsToClean which is a collection of wbit_term_in_lang_id for the single wbit_item_id
      • Return the $termIdsToClean which is a collection of wbit_term_in_lang_id
    • DatabaseItemTermStore::cleanTermsIfUnused - Called with $termIdsToClean
      • Iterate through the $termIdsToClean checking both the wbt_property_terms and wbt_item_terms tables for usage <- This will not detect the property usage of the term text, as the term is in a different language
      • $termIdsUnused is the list of ids that are unused according to the last checks
      • Select these once more FOR UPDATE
      • DatabaseTermIdsCleaner::cleanTermIds - Called with all term ids to clean that have not been detected as used in any properties or items
        • DatabaseTermIdsCleaner::cleanTermInLangIds - Called with the same as the above method
          • Select text in lang ids for the term in lang rows we want to delete, set as $potentiallyUnusedTextInLangIds
          • DELETE the wbt_term_in_lang rows that we are cleaning up
          • For each $potentiallyUnusedTextInLangIds check if after the above delete there are still entries in wbt_term_in_lang indicating usage <- Still will not detect the property usage of the term text, as the term text is in a different lang
          • SELECT for update
          • DatabaseTermIdsCleaner::cleanTextInLangIds - Called with the 'unused' text in lang ids
            • Check the wbt_text_in_lang table looking at the same $textInLangIds for find texts we can also delete < - Still won't find the usage of text by the property as the text in lang for the property is not in the list being checked
            • DELETE the text in lang rows
            • foreach $potentiallyUnusedTextIds select from the wbt_text_in_lang table where wbxl_text_id matches the potentially unused id. If select === false, add to $unusedTextIds <- This should find and not delete the text row used by the property
            • SELECT $unusedTextIds for update
            • DatabaseTermIdsCleaner::cleanTextIds - called with $unusedTextIds
              • DELETE from wbt_text

And some tests for the rough situation investigated above https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/559434/ but they all pass..

Got it...

Following exactly where this textId came from before being listed for deletion:

  • cleanTextInLangIds selects some $potentiallyUnusedTextIds which will list 110010309, as this text was used in a term in the entity being deleted
  • Each ID is then checked for existence (after some deletes) in wbt_text_in_lang, which should have flagged the ID as still being used (by the property), and thus the ID should not be added to $unusedTextIds, as it is used
  • Everything in $unusedTextIds is then selected FOR UPDATE from wbt_text_in_lang, and $stillUsedTextIds is set, this WILL NOT include 110010309, as 110010309 is not in $unusedTextIds
  • A diff then happens between $potentiallyUnusedTextIds and $stillUsedTextIds to create the final list of text ids to be cleaned.
    • At this point $potentiallyUnusedTextIds has the ID, $stillUsedTextIds does not, thus 110010309 will be marked for deletion..
  • cleanTextIds is called, which does no further checking and the row is deleted.

This code was last touched in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/547535/ which subtly changed the behaviour.

The previous behaviour before this patch:

  • (same) cleanTextInLangIds selects some $potentiallyUnusedTextIds which will list 110010309, as this text was used in a term in the entity being deleted
  • Everything in $potentiallyUnusedTextIds is then selected FOR UPDATE from wbt_text_in_lang, this will include 110010309 as it is in $potentiallyUnusedTextIds, and $stillUsedTextIds is set, this WILL include 110010309, as 110010309 is not in $potentiallyUnusedTextIds and is still in the table
  • A diff then happens between $potentiallyUnusedTextIds and $stillUsedTextIds
    • At this point both of these arrays have the ID, thus the ID is not returned for deletion.
  • cleanTextIds is called, which does no further checking and the row is not deleted.
Addshore added a comment.EditedDec 19 2019, 11:37 AM

The current pattern of odd rows in the DB that we are currently seeing doesn't only remove the text row, but also the text in lang row.

mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> SELECT * FROM wbt_property_terms LEFT JOIN wbt_term_in_lang ON wbpt_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON wbtl_type_id = wby_id LEFT JOIN
  wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id LEFT JOIN wbt_text ON wbxl_text_id = wbx_id WHERE wby_name = 'label' AND wbx_text IS NULL ORDER BY wbpt_property_id;
+---------+------------------+----------------------+-----------+--------------+----------------------+--------+----------+---------+---------------+--------------+--------+----------+
| wbpt_id | wbpt_property_id | wbpt_term_in_lang_id | wbtl_id   | wbtl_type_id | wbtl_text_in_lang_id | wby_id | wby_name | wbxl_id | wbxl_language | wbxl_text_id | wbx_id | wbx_text |
+---------+------------------+----------------------+-----------+--------------+----------------------+--------+----------+---------+---------------+--------------+--------+----------+
|  325236 |              225 |            388713206 | 388713206 |            1 |            383127030 |      1 | label    |    NULL | NULL          |         NULL |   NULL | NULL     |
|  325246 |              433 |            388715670 | 388715670 |            1 |            379975720 |      1 | label    |    NULL | NULL          |         NULL |   NULL | NULL     |
+---------+------------------+----------------------+-----------+--------------+----------------------+--------+----------+---------+---------------+--------------+--------+----------+
2 rows in set (0.95 sec)

The same patch touches the same sort of code in cleanTermInLangIds that probably causes the same issue.

  • cleanTermInLangIds is called with $termInLangIds which contains the termInlangIds that are not used in the property or items tables (this is correct)
    • Example: ID 999 (some ID from the edit that triggered the deletion in the above seen case)
  • text in lang Ids are then selected from wbt_term_in_lang where the text in lang id is in $potentiallyUnusedTextInLangIds
    • $potentiallyUnusedTextInLangIds would contain all of the text in lang IDS for the term in lang 999, lets say 11,12,383127030
  • All of the $termInLangIds are then deleted (this is correct)
    • term id 999 has been deleted
  • each $potentiallyUnusedTextInLangIds which currently still looks fine is then select from wbt_term_in_lang a final time and $unusedTextInLangIds is built up when no rows are found for the text in lang id in the term in lang table. (correct)
    • $unusedTextInLangIds now contains all of the text in lang ids that are not in use, so this should be 11 and 12
  • $unusedTextInLangIds are then selected from wbt_term_in_lang FOR UPDATE, setting $stillUsedTextInLangIds to the resulting IDs
    • stillUsedTextInLangIds would then still include 11 and 12
  • A diff then occurs including things that are in $potentiallyUnusedTextInLangIds and not in $stillUsedTextInLangIds
    • so things that are in 11,12,383127030 and not in, 11 and 12, thus 383127030 is passed down for deletion when it should not be.

Essentially:

$potentiallyUnusedTextInLangIds = [11, 12, 383127030];
$unusedTextInLangIds = [11, 12];
$stillUsedTextInLangIds = [11,12];
$forDeletion = array_diff(
			$potentiallyUnusedTextInLangIds,
			$stillUsedTextInLangIds
		);
array(1) {
  [2]=>
  int(383127030)
}

Mentioned in SAL (#wikimedia-operations) [2019-12-19T11:43:49Z] <addshore> addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=5 --from-id=4089887 # T237984 - Will stop after P433

Addshore reassigned this task from Addshore to Ladsgroup.Dec 19 2019, 12:38 PM

Swapping over to Amir for the fix as decided in the daily

Addshore claimed this task.Dec 19 2019, 2:25 PM

Change 559516 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] Fix incorrect deletion of rows in DatabaseTermIdsCleaner

https://gerrit.wikimedia.org/r/559516

Change 559529 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.11] Fix incorrect deletion of rows in DatabaseTermIdsCleaner

https://gerrit.wikimedia.org/r/559529

Mentioned in SAL (#wikimedia-operations) [2019-12-19T16:22:01Z] <addshore> addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=2 --from-id=36185524 # T237984 (For P4155, then will stop)

The fix will be backported today, and then new occurrences of the issue should stop happening.
I have monitoring the tables for label property holes and periodically running the rebuild script to fill them in (like in T237984#5754507)

Once backported we probably need 1 more run to rebuild the property tables, and also a run to rebuild the item tables.
These runs could be targeted or not (properties can probably run over everything again)
For items we will want to generate a new list as was done in T239470 to be filled in after a fresh sqoop to hadoop.

Change 559516 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Fix incorrect deletion of rows in DatabaseTermIdsCleaner

https://gerrit.wikimedia.org/r/559516

Change 559434 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Tests for T237984 - Unexpected deletion of term rows

https://gerrit.wikimedia.org/r/559434

Change 559529 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.11] Fix incorrect deletion of rows in DatabaseTermIdsCleaner

https://gerrit.wikimedia.org/r/559529

Mentioned in SAL (#wikimedia-operations) [2019-12-19T16:53:59Z] <addshore@deploy1001> Synchronized php-1.35.0-wmf.11/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseTermIdsCleaner.php: T237984 [[gerrit:559529]] Fix incorrect deletion of rows in DatabaseTermIdsCleaner (duration: 00m 56s)

Mentioned in SAL (#wikimedia-operations) [2019-12-19T16:57:35Z] <addshore> addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=10 # T237984, Full pass (33 rows missing currently)

Addshore added a comment.EditedDec 19 2019, 5:03 PM

For new page renders, and after a page purge we now expect to no longer see the issue as described in this ticket.
Tomorrow after the new rebuild is complete that has been started in T237984#5754602 I will make sure that no new rows are missing (between the 18th and 19th new missing rows appeared)
If no new holes appear we can close this task, but there will be some follow-ups:

  • Continue improving the documentation of the new term system T225857
  • Identify holes in the items terms tables as of 17:00 on the 19th December for rebuilding (TASK TBA)
  • Full rebuild on beta wikidata (TASK TBA)
  • Full rebuild on test wikidata (TASK TBA)
  • And rebuild them in prod (TASK TBA)
  • Refactor DatabaseTermIdsCleaner to make it easier to reason with (TASK TBA)
  • Investigate orphan rows in the new terms tables (beta, test, prod) (TASK TBA)

I have updated T208425: [EPIC] Kill the wb_terms table to include all of the proposed followups and steps from the comment above.

Checking the state of the properties table this morning shows that there are no more wbt_property_terms rows with non existing rows in the other tables.

mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> SELECT * FROM wbt_property_terms LEFT JOIN wbt_term_in_lang ON wbpt_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON wbtl_type_id = wby_id LEFT JOIN
  wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id LEFT JOIN wbt_text ON wbxl_text_id = wbx_id WHERE wbx_text IS NULL ORDER BY wbpt_property_id;
Empty set (5.93 sec)

It appears we found the bug and :

For new page renders, and after a page purge we now expect to no longer see the issue as described in this ticket.

Change 558595 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add loggers to new terms related services

https://gerrit.wikimedia.org/r/558595

Macro such-data:

Awesome job Adam, thank you so much.