Initially, in order to do T239470.
After that is done, these tables in Hadoop will be used to generate some Wikidata metrics that we currently generate from dumps and/or from SQL directly.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Addshore | T208425 [EPIC] Kill the wb_terms table
Resolved | | ArielGlenn | T226167 audit public tables and make sure we dump them all
Resolved | | Addshore | T219175 [Mega] - Migrate data from wb_terms to new schema
Resolved | | Ladsgroup | T219123 Migrate to and read from new store for item terms
Resolved | | Addshore | T239470 Check the success of the initial terms migration (does it have holes)
Resolved | | Addshore | T239471 Sqoop wikidata terms tables into hadoop
Change 553698 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/refinery@master] sqoop, add wikidata terms related tables
Change 553727 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/refinery@master] hive tables for wikibase term secondary storage
@JAllemandou I'll move this to waiting on our board for now.
I guess we should probably merge it all?
Change 553698 merged by Joal:
[analytics/refinery@master] sqoop, add wikidata terms related tables
Change 553727 merged by Joal:
[analytics/refinery@master] hive tables for wikibase term secondary storage
Ping for @JAllemandou
4:09 PM <addshore> hiya joal, just checking regarding https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/553698/6..7/python/refinery/sqoop.py
4:09 PM <addshore> should that change have been included in the 2019-10 snapshot? / did it get re run?
4:10 PM <addshore> I see lots of nulls in that field in hadoop which is unexpected
These tables now exist in Hadoop, for example wmf_raw.wikibase_wbt_item_terms =]
And the issue mentioned in T239471#5705470 is resolved.
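For anyone wanting to repeat the null check from the IRC log above, here is a minimal sketch that builds a HiveQL query counting NULLs in one column of a sqooped table. It is not taken from refinery: the helper name `null_count_query` is hypothetical, the `snapshot` and `wiki_db` partition columns are an assumption about how the wmf_raw tables are laid out, and `wbit_item_id` is only an illustrative column (the thread does not say which field had the nulls).

```python
import re

# Only plain identifiers (optionally db.table) are accepted, since the
# names are interpolated straight into the query text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def null_count_query(table, column, snapshot, wiki_db="wikidatawiki"):
    """Build a HiveQL query counting total rows and NULLs in `column`.

    Assumes (not confirmed in this task) that the table is partitioned
    by `snapshot` and `wiki_db`.
    """
    for name in (table, column, wiki_db):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    if not re.match(r"^\d{4}-\d{2}$", snapshot):
        raise ValueError(f"unexpected snapshot: {snapshot!r}")
    return (
        "SELECT COUNT(*) AS total_rows, "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS null_rows "
        f"FROM {table} "
        f"WHERE snapshot = '{snapshot}' AND wiki_db = '{wiki_db}'"
    )


# Illustrative call; the column name is a placeholder for whichever
# field is being checked.
query = null_count_query("wmf_raw.wikibase_wbt_item_terms", "wbit_item_id", "2019-10")
print(query)
```

The resulting string can be pasted into whichever Hive or Spark SQL client is in use; a non-zero `null_rows` for a column that should always be populated would point to the kind of sqoop problem discussed above.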
Change 554329 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/refinery@master] sqoop, wb_terms, use term_full_entity_id not term_entity_id
Change 554329 merged by Joal:
[analytics/refinery@master] sqoop, wb_terms, use term_full_entity_id not term_entity_id