As a search engineer I want to have access to the mediainfo RDF dumps in HDFS so that I can manipulate this data with Hadoop and join it with the Wikidata RDF dumps.
This task has three steps:
- make the TTL dumps available in HDFS: https://gerrit.wikimedia.org/r/c/operations/puppet/+/642411
- adapt rdf-spark-tools to support the Commons mediainfo dump: https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/649323
- write an Airflow DAG
AC:
- mediainfo TTL dumps are imported and munged weekly in HDFS
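The AC implies a simple dependency chain for the weekly DAG: the TTL dump must be available, then imported, then munged. The sketch below models only that ordering using Python's standard-library `graphlib`, as a stand-in for Airflow's own dependency mechanism. The task names are hypothetical and the real DAG may split or name its tasks differently.

```python
from graphlib import TopologicalSorter

# Hypothetical task names for the weekly mediainfo DAG.
# Each key maps to the set of tasks that must finish before it can run.
WEEKLY_MEDIAINFO_DAG = {
    "wait_for_mediainfo_ttl": set(),                     # TTL dump is available in HDFS
    "import_mediainfo_ttl": {"wait_for_mediainfo_ttl"},  # import the TTL dump
    "munge_mediainfo": {"import_mediainfo_ttl"},         # munge the imported data
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for task in execution_order(WEEKLY_MEDIAINFO_DAG):
        print(task)
```

In an actual Airflow DAG the same chain would be expressed with the `>>` operator between tasks, on a weekly schedule such as the `@weekly` preset.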