
Import wikidata RDF dump to hadoop
Closed, Resolved · Public


We currently have no easy way to run large-scale analysis on the Wikidata graph. WDQS and Blazegraph are not suited to this scenario; Hadoop seems to be a better fit. After discussing with @JAllemandou, we believe that a simple Parquet file of quads might be sufficient for now.
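
To make the "file of quads" idea concrete, here is a minimal sketch of the row shape such a table might have. It is not the converter from the patch below: the column names (`context`, `subject`, `predicate`, `object`), the `to_quads` helper, and the example context IRI are all assumptions made for illustration, and the input is assumed to be already flattened to one N-Triples-style statement per line rather than full Turtle.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Quad(NamedTuple):
    """One row of the hypothetical quads table (column names are assumptions)."""
    context: str
    subject: str
    predicate: str
    object: str


# Simplified statement pattern: subject and predicate contain no whitespace,
# the object is everything up to the terminating dot.
_STATEMENT = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')


def to_quads(lines: Iterable[str], context: str) -> Iterator[Quad]:
    """Yield one Quad per statement line, skipping blank lines and comments."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        match = _STATEMENT.match(line)
        if match is None:
            raise ValueError(f"not a simple one-line statement: {line!r}")
        subject, predicate, obj = match.groups()
        yield Quad(context, subject, predicate, obj)


if __name__ == "__main__":
    sample = [
        '# hypothetical input, one statement per line',
        '<http://www.wikidata.org/entity/Q42> '
        '<http://www.w3.org/2000/01/rdf-schema#label> "Douglas Adams"@en .',
    ]
    for quad in to_quads(sample, '<http://example.org/context>'):
        print(quad)
```

Rows of this shape map directly onto a four-column string table, which is what would make the data easy to query from Spark or Hive once it is stored as Parquet.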

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 570324 had a related patch set uploaded (by DCausse; owner: Joal):
[wikidata/query/rdf@master] Add WikidataTurtleDumpConverter to rdf-spark-tools

Change 570324 merged by jenkins-bot:
[wikidata/query/rdf@master] Add WikidataTurtleDumpConverter to rdf-spark-tools

dcausse assigned this task to JAllemandou.
dcausse moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board.