
[EPIC] Kill the wb_terms table
Open, High, Public

Description

As decided in 2019, the wb_terms table is to be replaced by a set of normalized tables.

Production migration to the set of normalized tables, and dropping of the old table

Tickets that can be closed once the table is dead

Related Objects

Status | Subtype | Assigned | Task
(~125 related subtasks; task titles and links were not captured in this export. Most are Resolved; the rest are Open, Declined, Invalid, Duplicate or Stalled. Assignees include Addshore, alaa_wmde, Ladsgroup, Lucas_Werkmeister_WMDE, JeroenDeDauw, Marostegui, JAllemandou, Bstorm and Johan.)

Event Timeline

There are a very large number of changes, so older changes are hidden.
Addshore updated the task description. Jan 15 2020, 4:54 PM
Addshore updated the task description. Jan 15 2020, 4:59 PM
Addshore updated the task description. Jan 15 2020, 5:07 PM
Addshore updated the task description. Jan 15 2020, 5:12 PM
Addshore updated the task description. Jan 15 2020, 5:15 PM
Addshore updated the task description. Jan 16 2020, 4:07 PM
Ladsgroup updated the task description. Jan 28 2020, 11:41 AM

Thanks a lot for the work on this. May I suggest a step before the next one (after rebuilding): checking that all data, old and new, is consistent. This is a lot of data, and even in well-thought-out processes missing rows have been discovered after I requested a comparison on another carefully planned migration; they happened due to mistakes, existing inconsistencies, or aborts. May I request such a step? It could be as fast as a simple join query (<5m to run) between old and new to check that no rows are missing or extra, and that the data is equivalent.

> Thanks a lot for the work on this. May I suggest a step before the next one (after rebuilding): checking that all data, old and new, is consistent. This is a lot of data, and even in well-thought-out processes missing rows have been discovered after I requested a comparison on another carefully planned migration; they happened due to mistakes, existing inconsistencies, or aborts. May I request such a step? It could be as fast as a simple join query (<5m to run) between old and new to check that no rows are missing or extra, and that the data is equivalent.

We already do this with Sqoop and the Analytics team (thanks to @JAllemandou); here's an example. That's how we discovered things like T243944: Really large holes in the new term store (again).

We will also do this in SQL as we progress through reading from more of the new schemas.
See T219123#5844088 where this has already been done.
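The kind of old-vs-new comparison requested above can be sketched with an anti-join: rows of the old store that have no exact counterpart in the new one. This is a minimal illustration using SQLite and invented stand-in tables (`old_terms`/`new_terms` with simplified columns), not the real wb_terms or wbt_* schemas:

```python
import sqlite3

# Hypothetical, simplified stand-ins for the old and new term stores.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE old_terms (entity_id INTEGER, language TEXT, term_text TEXT)")
cur.execute("CREATE TABLE new_terms (entity_id INTEGER, language TEXT, term_text TEXT)")

cur.executemany("INSERT INTO old_terms VALUES (?, ?, ?)", [
    (1, "en", "universe"), (2, "en", "Earth"), (3, "de", "Erde"),
])
cur.executemany("INSERT INTO new_terms VALUES (?, ?, ?)", [
    (1, "en", "universe"), (2, "en", "Earth"),  # the row for entity 3 was lost in migration
])

# Anti-join: rows present in the old store but absent from the new one.
missing = cur.execute("""
    SELECT o.entity_id, o.language, o.term_text
    FROM old_terms o
    LEFT JOIN new_terms n
      ON o.entity_id = n.entity_id
     AND o.language  = n.language
     AND o.term_text = n.term_text
    WHERE n.entity_id IS NULL
""").fetchall()
print(missing)  # → [(3, 'de', 'Erde')]
```

Running the same query with the tables swapped would flag extra rows in the new store; together the two directions give the "no rows missing or extra" check.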

Addshore updated the task description. Mar 6 2020, 2:02 PM
Ladsgroup updated the task description. Mar 18 2020, 3:18 AM

Mentioned in SAL (#wikimedia-releng) [2020-03-18T03:16:34Z] <Amir1> dropping wb_terms table from wikidatawiki in beta cluster (T219123 T219175 T208425)

<3!!!!

Change 580876 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Stop writing to old term store in testwikidatawiki

https://gerrit.wikimedia.org/r/580876

Change 580876 merged by jenkins-bot:
[operations/mediawiki-config@master] Stop writing to old term store in testwikidatawiki

https://gerrit.wikimedia.org/r/580876

Mentioned in SAL (#wikimedia-operations) [2020-03-18T10:45:44Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Stop writing to old term store in testwikidatawiki (T208425)]] (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2020-03-18T10:48:31Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Stop writing to old term store in testwikidatawiki (T208425)]], take II (duration: 01m 07s)

Change 580881 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Stop writing to old term store (wb_terms table) in wikidata

https://gerrit.wikimedia.org/r/580881

Change 580881 merged by jenkins-bot:
[operations/mediawiki-config@master] Stop writing to old term store (wb_terms table) in wikidata

https://gerrit.wikimedia.org/r/580881

Mentioned in SAL (#wikimedia-operations) [2020-03-18T10:55:47Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Stop writing to old term store (wb_terms table) in wikidata (T208425)]] (duration: 01m 08s)

Mentioned in SAL (#wikimedia-operations) [2020-03-18T10:58:04Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Stop writing to old term store (wb_terms table) in wikidata (T208425)]], take II (duration: 01m 06s)

Ladsgroup updated the task description. Mar 18 2020, 11:01 AM

> Mentioned in SAL (#wikimedia-operations) [2020-03-18T10:58:04Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Stop writing to old term store (wb_terms table) in wikidata (T208425)]], take II (duration: 01m 06s)

@Ladsgroup: This is a breaking change. Please announce it at Project chat.

Ladsgroup added a comment (edited). Mar 18 2020, 1:43 PM

> Mentioned in SAL (#wikimedia-operations) [2020-03-18T10:58:04Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Stop writing to old term store (wb_terms table) in wikidata (T208425)]], take II (duration: 01m 06s)
>
> @Ladsgroup: This is a breaking change. Please announce it at Project chat.

This was announced a year ago. We will announce it again soon.

Mentioned in SAL (#wikimedia-operations) [2020-03-18T14:17:58Z] <marostegui> Rename wb_terms on codfw hosts: s8 (wikidatawiki - db2081), s3 (testwikidatawiki - db2109), s4 (commonswiki, testcommonswiki - db2106) T208425

Following up the conversation on IRC with @Ladsgroup I have renamed wb_terms to T208425_wb_terms on the following hosts and wikis (all in codfw):

root@cumin1001:/home/marostegui# for i in db2106 db2109 db2081; do echo $i; mysql.py -h$i information_schema -e "select table_schema,table_name from tables where table_name like '%wb_terms';";done
db2106
+-----------------+------------------+
| table_schema    | table_name       |
+-----------------+------------------+
| commonswiki     | T208425_wb_terms |
| testcommonswiki | T208425_wb_terms |
+-----------------+------------------+
db2109
+------------------+------------------+
| table_schema     | table_name       |
+------------------+------------------+
| testwikidatawiki | T208425_wb_terms |
+------------------+------------------+
db2081
+--------------+------------------+
| table_schema | table_name       |
+--------------+------------------+
| wikidatawiki | T208425_wb_terms |
+--------------+------------------+

None of those hosts have slaves.
If there are any writes, replication will break and we'll notice.

If that happens, we just need to revert and restart replication with:

set session sql_log_bin=0; rename table T208425_wb_terms to wb_terms; start slave;

Let's give it 24h or so. Once we are fully sure nothing is writing, maybe we can move forward and rename the table on s3 (testwikidatawiki) eqiad host and monitor for errors.

Marostegui moved this task from Triage to In progress on the DBA board. Mar 19 2020, 6:59 AM
Addshore updated the task description. Mar 23 2020, 3:30 PM
Addshore updated the task description.
Addshore updated the task description. Mar 23 2020, 3:31 PM
Jony added a subscriber: Jony. Mar 23 2020, 4:14 PM

I have a query, https://quarry.wmflabs.org/query/18763, that returns label/description/alias statistics for a set of languages (also mentioned in T197161#4300061). This was derived from a query used by @Pasleim to update this table (which worked until the end of May) and its South Asian counterpart (which worked up until wb_terms updates got turned off). I tried to rewrite the Quarry query to use the new databases (https://quarry.wmflabs.org/query/41692), but running this has not yet succeeded, either using Quarry or directly on tools-login. I am not sure whether this rewritten query can be simplified beyond what's written at present, so I was hoping there might be a better way of obtaining these statistics via SQL that does not presently time out.

To find the number of labels in a specific language, you can use ElasticSearch: https://www.wikidata.org/w/index.php?search=haslabel%3Abn though this will not work for aliases.

> To find the number of labels in a specific language, you can use ElasticSearch: https://www.wikidata.org/w/index.php?search=haslabel%3Abn though this will not work for aliases.

This does not yield a number that I would expect: for a language like Khowar (khw), which is not updated frequently (or at all?) on Wikidata, I obtained a number close to 270k when searching "haslabel:khw" today, which doesn't make much sense compared to the numbers in this table for that language.

Beyond this, however, there was an element of convenience in being able to obtain such a number for a small set of languages in one go (maybe not all languages at once, but perhaps in segments?), and I was hoping to replicate this element by rewriting this query to use the new databases.

> I have a query, https://quarry.wmflabs.org/query/18763, that returns label/description/alias statistics for a set of languages (also mentioned in T197161#4300061). This was derived from a query used by @Pasleim to update this table (which worked until the end of May) and its South Asian counterpart (which worked up until wb_terms updates got turned off). I tried to rewrite the Quarry query to use the new databases (https://quarry.wmflabs.org/query/41692), but running this has not yet succeeded, either using Quarry or directly on tools-login. I am not sure whether this rewritten query can be simplified beyond what's written at present, so I was hoping there might be a better way of obtaining these statistics via SQL that does not presently time out.

The best I can suggest with the new tables for quarry is https://quarry.wmflabs.org/query/43212
You'll have to run a query per language in order to not be killed.

Ideally we will calculate these numbers on the Analytics cluster and publish them somewhere they can just be queried as we used to do.
Probably needs T154601: Grafana: "wikidata-datamodel-terms" doesn't update anymore to be fixed; then each day you could grab these numbers in under a second!
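To see why the rewritten queries get expensive, here is a sketch of a per-language label count against the normalized term store, using SQLite and heavily simplified versions of the wbt_* tables (the real schema has more tables and columns, and the type id for "label" comes from the wbt_type table rather than being a fixed constant, so treat `label_type_id=1` as an assumption):

```python
import sqlite3

# Simplified stand-ins for the normalized term-store tables. Terms are
# split across three joins, which is why running one query per language
# (rather than grouping all languages at once) is the advice above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE wbt_item_terms   (wbit_id INTEGER, wbit_item_id INTEGER, wbit_term_in_lang_id INTEGER);
CREATE TABLE wbt_term_in_lang (wbtl_id INTEGER, wbtl_type_id INTEGER, wbtl_text_in_lang_id INTEGER);
CREATE TABLE wbt_text_in_lang (wbxl_id INTEGER, wbxl_language TEXT, wbxl_text_id INTEGER);
""")
# One English label and one German label on item Q2 (type 1 = label, assumed):
cur.executescript("""
INSERT INTO wbt_text_in_lang VALUES (10, 'en', 100), (11, 'de', 101);
INSERT INTO wbt_term_in_lang VALUES (20, 1, 10), (21, 1, 11);
INSERT INTO wbt_item_terms   VALUES (30, 2, 20), (31, 2, 21);
""")

def count_labels(language, label_type_id=1):
    """Count item labels in one language; one query per language."""
    row = cur.execute("""
        SELECT COUNT(*)
        FROM wbt_item_terms
        JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
        JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
        WHERE wbtl_type_id = ? AND wbxl_language = ?
    """, (label_type_id, language)).fetchone()
    return row[0]

print(count_labels("en"))  # → 1
```

Looping `count_labels` over a small set of languages reproduces the one-number-per-language convenience of the old wb_terms query, at the cost of one round trip per language.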

WMDE-leszek updated the task description. Apr 3 2020, 7:47 AM
WMDE-leszek updated the task description. Apr 3 2020, 9:27 AM
WMDE-leszek updated the task description.

Change 586467 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Do not try to drop things when there's no wb_terms table

https://gerrit.wikimedia.org/r/586467

Change 586469 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.26] Do not try to drop things when there's no wb_terms table

https://gerrit.wikimedia.org/r/586469

Change 586469 merged by Addshore:
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.26] Do not try to drop things when there's no wb_terms table

https://gerrit.wikimedia.org/r/586469

Mentioned in SAL (#wikimedia-operations) [2020-04-06T23:59:23Z] <addshore@deploy1001> Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: Do not try to drop things when theres no wb_terms table T208425 T249565 (duration: 00m 59s)

Mentioned in SAL (#wikimedia-operations) [2020-04-07T00:01:28Z] <addshore@deploy1001> Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: Do not try to drop things when theres no wb_terms table T208425 T249565 cache bust (duration: 01m 01s)

Change 586467 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Do not try to drop wb_items_per_site when there's no wb_terms table

https://gerrit.wikimedia.org/r/586467

Change 586480 had a related patch set uploaded (by Jforrester; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.26] Follow-up a956c655: Only avoid dropping wb_items_per_site so prod can be merged

https://gerrit.wikimedia.org/r/586480

Change 586480 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.26] Follow-up a956c655: Only avoid dropping wb_items_per_site so prod can be merged

https://gerrit.wikimedia.org/r/586480

Mentioned in SAL (#wikimedia-operations) [2020-04-07T01:05:04Z] <jforrester@deploy1001> Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: T208425 T249565 Follow-up a956c655: Only avoid dropping wb_items_per_site so prod can be merged (duration: 00m 58s)

Change 596015 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] Add some deprecated annotations to some old terms classes

https://gerrit.wikimedia.org/r/596015

Change 596015 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add some deprecated annotations to some old terms classes

https://gerrit.wikimedia.org/r/596015

Addshore updated the task description. May 14 2020, 10:11 AM