Import commons mediainfo RDF dumps to hive
Closed, Resolved · Public · 8 Estimated Story Points

Description

As a search engineer, I want access to the mediainfo RDF dumps in HDFS so that I can manipulate this data with Hadoop and join it with the Wikidata RDF dumps.

There will be 3 steps to this task

AC:

  • mediainfo TTL dumps are imported and munged weekly in HDFS

Event Timeline

CBogen triaged this task as High priority. Dec 14 2020, 4:15 PM
CBogen moved this task from needs triage to ML & Data Pipeline on the Discovery-Search board.
CBogen set the point value for this task to 8. Jan 11 2021, 4:56 PM

Change 658465 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[wikimedia/discovery/analytics@master] airflow dag for commons dump

https://gerrit.wikimedia.org/r/658465

The rdf spark tools changes have been merged. The airflow work is in progress and waiting on @elukey to merge the puppet patch.

Change 642411 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/puppet@production] Add import_commons_mediainfo_dumps to role::analytics_cluster::launcher

https://gerrit.wikimedia.org/r/642411

Change 642411 merged by Elukey:
[operations/puppet@production] Add import_commons_mediainfo_dumps to role::analytics_cluster::launcher

https://gerrit.wikimedia.org/r/642411

Change 664707 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[wikimedia/discovery/analytics@master] create init dag for import ttl

https://gerrit.wikimedia.org/r/664707

Change 664707 abandoned by Mstyles:
[wikimedia/discovery/analytics@master] create init dag for import ttl

Reason:
Changes are now part of https://gerrit.wikimedia.org/r/c/wikimedia/discovery/analytics/+/658465

https://gerrit.wikimedia.org/r/664707

Change 658465 merged by jenkins-bot:
[wikimedia/discovery/analytics@master] airflow dag for commons dump

https://gerrit.wikimedia.org/r/658465

Mentioned in SAL (#wikimedia-operations) [2021-02-24T20:40:25Z] <mstyles@deploy1001> Started deploy [wikimedia/discovery/analytics@44fba51]: add import ttl dags - T270103

Mentioned in SAL (#wikimedia-operations) [2021-02-24T20:42:59Z] <mstyles@deploy1001> Finished deploy [wikimedia/discovery/analytics@44fba51]: add import ttl dags - T270103 (duration: 02m 33s)

The last step is to confirm that the Airflow DAGs run successfully next week; after that, this ticket can be closed.

Change 667683 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[wikimedia/discovery/analytics@master] fix import commons ttl dag

https://gerrit.wikimedia.org/r/667683

Change 667683 merged by jenkins-bot:
[wikimedia/discovery/analytics@master] fix import commons ttl dag

https://gerrit.wikimedia.org/r/667683

Mentioned in SAL (#wikimedia-operations) [2021-03-01T21:52:02Z] <mstyles@deploy1002> Started deploy [wikimedia/discovery/analytics@ca2c5b5]: import commons ttl dag fix (T270103)

Mentioned in SAL (#wikimedia-operations) [2021-03-01T21:54:36Z] <mstyles@deploy1002> Finished deploy [wikimedia/discovery/analytics@ca2c5b5]: import commons ttl dag fix (T270103) (duration: 02m 34s)

Mentioned in SAL (#wikimedia-operations) [2021-03-01T22:39:10Z] <mstyles@deploy1002> Started deploy [wikimedia/discovery/analytics@ca2c5b5]: import commons ttl dag fix (T270103)

Mentioned in SAL (#wikimedia-operations) [2021-03-01T22:41:14Z] <mstyles@deploy1002> Finished deploy [wikimedia/discovery/analytics@ca2c5b5]: import commons ttl dag fix (T270103) (duration: 02m 04s)