
Sqoop wikidata terms tables into hadoop
Closed, Resolved (Public)

Description

Initially, this is needed in order to do T239470.

After that is done, these tables in Hadoop will be used to generate some metrics for Wikidata that we currently generate from dumps and/or from SQL directly.

Event Timeline

Restricted Application added a subscriber: Aklapper. Nov 29 2019, 12:47 PM

Change 553698 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/refinery@master] sqoop, add wikidata terms related tables

https://gerrit.wikimedia.org/r/553698

Change 553727 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/refinery@master] hive tables for wikibase term secondary storage

https://gerrit.wikimedia.org/r/553727

This should all be done, and the tables are created in the joal db.

Addshore added a subscriber: JAllemandou.

@JAllemandou I'll move this to waiting on our board for now.
I guess we should probably merge it all?

Change 553698 merged by Joal:
[analytics/refinery@master] sqoop, add wikidata terms related tables

https://gerrit.wikimedia.org/r/553698

Change 553727 merged by Joal:
[analytics/refinery@master] hive tables for wikibase term secondary storage

https://gerrit.wikimedia.org/r/553727

Ping for @JAllemandou

4:09 PM <addshore> hiya joal, just checking regarding https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/553698/6..7/python/refinery/sqoop.py
4:09 PM <addshore> should that change have been included in the 2019-10 snapshot? / did it get re run?
4:10 PM <addshore> I see lots of nulls in that field in hadoop which is unexpected

These things now exist in wmf_raw.wikibase_wbt_item_terms for example =]
And the issue mentioned in T239471#5705470 is resolved.
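
For anyone wanting to repeat the kind of null check mentioned in the chat above, here is a minimal sketch that only builds the query text. It is an illustration, not part of the refinery code: the helper name `null_count_query`, the column name `wbit_item_id`, and the `snapshot` partition column are assumptions, so check them against the actual Hive table definition before running anything.

```python
import re

# Hypothetical helper, not part of analytics/refinery.
# Accept only plain identifiers so values cannot inject SQL.
_TABLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")
_COLUMN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_SNAPSHOT = re.compile(r"\d{4}-\d{2}")


def null_count_query(table, column, snapshot):
    """Build a query counting total rows and NULL values of one column.

    Assumes the table is partitioned by a `snapshot` column with values
    such as '2019-10'; that partition name is an assumption.
    """
    if not _TABLE.fullmatch(table):
        raise ValueError("unexpected table name: %r" % table)
    if not _COLUMN.fullmatch(column):
        raise ValueError("unexpected column name: %r" % column)
    if not _SNAPSHOT.fullmatch(snapshot):
        raise ValueError("snapshot must look like YYYY-MM: %r" % snapshot)
    return (
        "SELECT COUNT(*) AS total_rows, "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS null_rows "
        f"FROM {table} WHERE snapshot = '{snapshot}'"
    )


if __name__ == "__main__":
    # The column name here is assumed for illustration only.
    print(null_count_query("wmf_raw.wikibase_wbt_item_terms", "wbit_item_id", "2019-10"))
```

The resulting string can be pasted into whichever Hive or Spark SQL client is in use; a large `null_rows` value relative to `total_rows` would point at the same kind of problem described in the chat.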

Change 554329 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/refinery@master] sqoop, wb_terms, use term_full_entity_id not term_entity_id

https://gerrit.wikimedia.org/r/554329

Change 554329 merged by Joal:
[analytics/refinery@master] sqoop, wb_terms, use term_full_entity_id not term_entity_id

https://gerrit.wikimedia.org/r/554329