
Generate RDF from JSON
Open, Stalled, Normal · Public

Description

Instead of generating RDF dumps from the database, have a maintenance script that reads a JSON dump and generates RDF output from it. This would allow us to generate RDF dumps for various scopes, flavors and formats, all based on the same consistent data. It is also likely to be faster than loading entities from the external storage database (depending on FS access details).
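The idea can be illustrated with a minimal sketch, which is not the proposed maintenance script. It assumes the dump is laid out as a JSON array with one entity per line, and it emits only `rdfs:label` triples in N-Triples form. The function names (`iter_entities`, `label_triples`, `convert`) are hypothetical, and a real script would need to cover the full RDF mapping (statements, sitelinks, and so on).

```python
import json
from typing import Iterable, Iterator

# Assumed IRIs for this sketch; only labels are handled.
ENTITY = "http://www.wikidata.org/entity/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entities from a dump that is a JSON array, one entity per line."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        yield json.loads(line)


def label_triples(entity: dict) -> Iterator[str]:
    """Yield one N-Triples line per label of the entity."""
    subject = f"<{ENTITY}{entity['id']}>"
    for label in entity.get("labels", {}).values():
        value = escape_literal(label["value"])
        lang = label["language"]
        yield f'{subject} <{RDFS_LABEL}> "{value}"@{lang} .'


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Stream a JSON dump into N-Triples without touching the database."""
    for entity in iter_entities(lines):
        yield from label_triples(entity)
```

Because `convert` only streams over lines, it could be pointed at an open (possibly decompressed) dump file and its output written directly to the RDF dump.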

Event Timeline

daniel created this task. · Mar 26 2015, 12:04 PM
daniel raised the priority of this task from to Needs Triage.
daniel updated the task description.
daniel added a project: Wikidata.
daniel added a subscriber: daniel.
Restricted Application added a subscriber: Aklapper. · Mar 26 2015, 12:04 PM
Lydia_Pintscher triaged this task as High priority. · Mar 30 2015, 9:52 AM
Lydia_Pintscher set Security to None.
hoo added a subscriber: hoo. · Apr 7 2015, 5:33 PM
JanZerebecki lowered the priority of this task from High to Normal. · Jul 23 2015, 3:07 PM
Smalyshev changed the task status from Open to Stalled. · Apr 4 2018, 8:31 PM
hoo added a subscriber: Smalyshev. · Apr 4 2018, 10:13 PM

@Smalyshev Why did you mark this Stalled?

As far as I can see, nothing has happened here for a year. Moreover, the subtasks have also been dormant for a year. So it seems to be stalled. But if I'm wrong and something is happening here, please reclassify.

Pintoch added a subscriber: Pintoch. · Apr 2 2019, 2:28 PM

I think Wikidata-Toolkit could be used for that:
https://github.com/Wikidata/Wikidata-Toolkit/blob/master/wdtk-rdf/src/main/java/org/wikidata/wdtk/rdf/RdfSerializer.java
Obviously, that would mean making sure the RDF serialization it produces is consistent with what is currently being fed into WDQS.

The analytics Hadoop cluster could also be of use here: the task can easily take advantage of parallelization.
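As a rough illustration of why this parallelizes well, here is a sketch that splits the dump lines into chunks and converts each chunk independently. It assumes a dump with one entity per line. A local thread pool merely stands in for cluster workers, so this is not a Hadoop job, and `convert_chunk` is a hypothetical placeholder that emits only the entity IRI rather than real RDF.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(lines, size):
    """Split an iterable of dump lines into lists of at most `size` lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def convert_chunk(chunk):
    """Placeholder conversion: emit the entity IRI for each entity line."""
    out = []
    for line in chunk:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # array brackets carry no entity
        entity_id = json.loads(line)["id"]
        out.append(f"<http://www.wikidata.org/entity/{entity_id}>")
    return out


def convert_parallel(lines, chunk_size=1000, workers=4):
    """Convert chunks concurrently; the thread pool is a stand-in for a cluster."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so the output
        # matches what a sequential run would produce.
        for result in pool.map(convert_chunk, chunks(lines, chunk_size)):
            yield from result
```

Each chunk needs no state from any other chunk, which is the property a cluster job would rely on as well.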