Import wikidata RDF dump to hadoop
Closed, Resolved · Public

Description

We currently have no easy way to run large-scale analysis on the Wikidata graph. WDQS and Blazegraph are not suited to this scenario; Hadoop seems to be a better fit. After discussing with @JAllemandou, we believe that a simple Parquet file of quads might be sufficient for now.
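To make the "file of quads" idea concrete, here is a minimal sketch of what one row per quad could look like and the kind of bulk aggregation such a table would allow. The column names (`context`, `subject`, `predicate`, `object`), the use of the entity URI as the context, and the helper names are assumptions for illustration only; they are not taken from this task or from the merged patch. The sketch uses plain Python structures in place of Parquet and Spark.

```python
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Quad:
    """One row of the hypothetical quads table (column names are assumed)."""
    context: str    # assumed here: the entity the triple was emitted for
    subject: str
    predicate: str
    object: str


def to_rows(quads):
    """Flatten quads into plain dicts, i.e. one record per table row."""
    return [asdict(q) for q in quads]


def predicate_counts(quads):
    """Count how many quads use each predicate: a typical whole-graph scan."""
    return Counter(q.predicate for q in quads)


# Two sample statements about Q42, with the entity URI used as the context.
SAMPLE_QUADS = [
    Quad(
        context="http://www.wikidata.org/entity/Q42",
        subject="http://www.wikidata.org/entity/Q42",
        predicate="http://www.wikidata.org/prop/direct/P31",
        object="http://www.wikidata.org/entity/Q5",
    ),
    Quad(
        context="http://www.wikidata.org/entity/Q42",
        subject="http://www.wikidata.org/entity/Q42",
        predicate="http://www.w3.org/2000/01/rdf-schema#label",
        object='"Douglas Adams"@en',
    ),
]

if __name__ == "__main__":
    for row in to_rows(SAMPLE_QUADS):
        print(row)
    print(predicate_counts(SAMPLE_QUADS))
```

A flat four-column layout like this keeps every statement independently scannable, which is what makes whole-graph aggregations practical on a batch system.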

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 570324 had a related patch set uploaded (by DCausse; owner: Joal):
[wikidata/query/rdf@master] Add WikidataTurtleDumpConverter to rdf-spark-tools

https://gerrit.wikimedia.org/r/570324

Change 570324 merged by jenkins-bot:
[wikidata/query/rdf@master] Add WikidataTurtleDumpConverter to rdf-spark-tools

https://gerrit.wikimedia.org/r/570324

dcausse assigned this task to JAllemandou.
dcausse moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board.