The item_page_link table is built by hand from Wikidata site-links (JSON dumps), the internationalized namespaces reference (wmf_raw.project_namespace_map), and page history (wmf.mediawiki_page_history).
Let's oozify that job, as the data has proven useful for the Research team and others.
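For readers unfamiliar with the three inputs, here is a minimal sketch of the kind of join the description implies. It is plain Python over in-memory records, not the actual Spark job. All record types, field names, and the helper functions `split_title` and `link_items_to_pages` are hypothetical, and the real job may handle details (for example title normalization between spaces and underscores, or page renames over time) that are ignored here.

```python
from typing import NamedTuple


class SiteLink(NamedTuple):
    # Hypothetical shape of a site-link extracted from the Wikidata JSON dump.
    item_id: str   # Wikidata item, e.g. "Q42"
    wiki_db: str   # site key, e.g. "enwiki"
    title: str     # full title, possibly with a localized namespace prefix


class Page(NamedTuple):
    # Hypothetical shape of a page-history row reduced to its current state.
    wiki_db: str
    page_id: int
    namespace_id: int
    title: str     # title without the namespace prefix


def split_title(wiki_db, full_title, namespace_map):
    """Return (namespace_id, bare_title) using the per-wiki namespace names."""
    prefix, sep, rest = full_title.partition(":")
    ns_id = namespace_map.get((wiki_db, prefix)) if sep else None
    if ns_id is None:
        # No known namespace prefix: assume the main namespace (0).
        return 0, full_title
    return ns_id, rest


def link_items_to_pages(site_links, namespace_map, pages):
    """Join site-links to page ids on (wiki_db, namespace_id, title)."""
    page_index = {(p.wiki_db, p.namespace_id, p.title): p.page_id for p in pages}
    links = []
    for sl in site_links:
        ns_id, bare_title = split_title(sl.wiki_db, sl.title, namespace_map)
        page_id = page_index.get((sl.wiki_db, ns_id, bare_title))
        if page_id is not None:
            links.append((sl.item_id, sl.wiki_db, page_id))
    return links


# Made-up sample data, for illustration only.
namespace_map = {("enwiki", "Talk"): 1, ("frwiki", "Discussion"): 1}
pages = [
    Page("enwiki", 10, 0, "Douglas Adams"),
    Page("frwiki", 20, 1, "Douglas Adams"),
    Page("enwiki", 30, 0, "Star Trek: Voyager"),
]
site_links = [
    SiteLink("Q1", "enwiki", "Douglas Adams"),
    SiteLink("Q2", "frwiki", "Discussion:Douglas Adams"),
    SiteLink("Q3", "enwiki", "Star Trek: Voyager"),
    SiteLink("Q4", "dewiki", "No such page"),
]
result = link_items_to_pages(site_links, namespace_map, pages)
```

The namespace map matters because a site-link title carries the localized namespace name as a prefix, while page records are keyed by namespace id. A colon alone is not enough to detect a namespace, which is why the sketch falls back to the main namespace when the prefix is not a known namespace name for that wiki.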
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| Add wikidata item_page_link spark job | analytics/refinery/source | master | +166 -0 |
| Add wikidata item_page_link oozie job | analytics/refinery | master | +544 -0 |
Event Timeline
Change 572834 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Add wikidata item_page_link oozie job
Change 572746 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Add wikidata item_page_link spark job
Change 572834 merged by Milimetric:
[analytics/refinery@master] Add wikidata item_page_link oozie job
Change 572746 merged by jenkins-bot:
[analytics/refinery/source@master] Add wikidata item_page_link spark job
I think we need docs that point to all the Wikidata info available on the cluster. Let's at least create the docs for this table. cc @JAllemandou
Already done (not properly linked): https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Wikidata_item_page_link