
Create list of criteria for graph backend candidates for WDQS
Closed, Resolved · Public


As a WDQS maintainer, I want to be able to evaluate graph backend candidates for migrating WDQS off of Blazegraph, so that I can create a ranking/survey of alternatives, and ultimately choose the optimal one.

Prior candidate list and survey from when Blazegraph was chosen:

We will likely not use the same list as before, and will need to create a new list of criteria (and a weighting of those criteria). The final list will aim to combine technical scaling considerations with a relatively small, finite list of community-sourced criteria (in case the two do not fully overlap). While there will eventually be a better process for consolidating a final list of community-sourced criteria, (comments in) this ticket can be used to start collecting ideas for criteria.


  • a list of criteria to evaluate graph backends for the purpose of scaling WDQS

Event Timeline

MPhamWMF moved this task from Incoming to Scaling on the Wikidata-Query-Service board.

QLever - -

The paper reports benchmarks favorable for QLever. I cannot get path queries working on the public endpoint.

(QLever could be a candidate; it is not a criterion.)

Hernández, D., Hogan, A., & Krötzsch, M. (2015). Reifying RDF: What works well with Wikidata? 1457, 32–47.

Abstract: In this paper, we compare various options for reifying RDF triples. We are motivated by the goal of representing Wikidata as RDF, which would allow legacy Semantic Web languages, techniques and tools - for example, SPARQL engines - to be used for Wikidata. However, Wikidata annotates statements with qualifiers and references, which require some notion of reification to model in RDF. We thus investigate four such options: (1) standard reification, (2) n-ary relations, (3) singleton properties, and (4) named graphs. Taking a recent dump of Wikidata, we generate the four RDF datasets pertaining to each model and discuss high-level aspects relating to data sizes, etc. To empirically compare the effect of the different models on query times, we collect a set of benchmark queries with four model-specific versions of each query. We present the results of running these queries against five popular SPARQL implementations: 4store, BlazeGraph, GraphDB, Jena TDB and Virtuoso.
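For a concrete sense of the four reification options the paper compares, here is how a single qualified Wikidata-style statement might be written under each model. This is an illustrative sketch: the prefixes, property names, and the `:singletonPropertyOf` predicate are placeholders, not Wikidata's actual vocabulary.

```turtle
# (1) Standard reification
_:st a rdf:Statement ;
     rdf:subject   :douglasAdams ;
     rdf:predicate :educatedAt ;
     rdf:object    :StJohnsCollege ;
     :endDate      "1974" .

# (2) n-ary relation (closest to Wikidata's actual RDF model)
:douglasAdams :educatedAt_stmt _:n .
_:n :value   :StJohnsCollege ;
    :endDate "1974" .

# (3) Singleton property (one fresh predicate per statement)
:douglasAdams :educatedAt_1 :StJohnsCollege .
:educatedAt_1 :singletonPropertyOf :educatedAt ;
              :endDate "1974" .

# (4) Named graph (TriG syntax; the graph name carries the annotations)
:g1 { :douglasAdams :educatedAt :StJohnsCollege . }
:g1 :endDate "1974" .
```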

Consider graph databases that support RDF-star and SPARQL-star, such as RDF4J, AnzoGraph, and GraphDB. These are proposed extensions to the RDF and SPARQL standards that provide a more convenient way to annotate RDF statements and to query such annotations (Wikidata qualifiers and references), bridging the gap between the RDF world and the Property Graph world.
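As a sketch of what this annotation style could look like, using the quoted-triple `<< ... >>` syntax from the W3C Community Group report (prefixes and property names here are illustrative, not Wikidata's actual vocabulary):

```sparql
# Turtle-star: annotate a statement directly, no reification node needed
<< :douglasAdams :educatedAt :StJohnsCollege >> :endDate "1974" .

# SPARQL-star: query the annotation on the quoted triple
SELECT ?school ?end WHERE {
  << :douglasAdams :educatedAt ?school >> :endDate ?end .
}
```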

See the W3C RDF-star and SPARQL-star Draft Community Group Report, 01 July 2021

Re the prior candidate list:

Prior candidate list and survey from when Blazegraph was chosen:

It would be great to have an annotated scale to help figure out which software looks like the best candidate, and to avoid gut judgment.

Given a "multi operation ACID" criterion, it might look like:

  • 0: No ACID guarantees
  • 1: ACID guarantees for primary representation, but async secondary representations (indices)
  • 2: ACID both primary and secondary representations
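A graded scale like this one fits naturally into a weighted scoring sheet. The sketch below is purely illustrative: the criteria names and weights are invented placeholders, not the task's final list, and the 0-2 scale follows the ACID example above.

```python
# Hypothetical weighted-scoring sketch for ranking graph backend candidates.
# Criteria and weights are invented for illustration; each criterion is
# graded on a small, documented 0-2 scale rather than an opaque 0-10 one.

CRITERIA_WEIGHTS = {
    "multi_operation_acid": 3,  # 0: none, 1: primary only, 2: primary + indices
    "sparql_compliance": 5,
    "horizontal_scaling": 4,
}

def weighted_score(scores: dict[str, int]) -> int:
    """Sum of grade * weight over all criteria (grades on a 0-2 scale)."""
    return sum(CRITERIA_WEIGHTS[name] * grade for name, grade in scores.items())

# Example candidate graded against the three placeholder criteria.
candidate = {"multi_operation_acid": 1, "sparql_compliance": 2, "horizontal_scaling": 1}
print(weighted_score(candidate))  # 3*1 + 5*2 + 4*1 = 17
```

Such a sheet only ranks candidates relative to the chosen weights; as noted below, it is an indicator rather than the final decision procedure.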

Regarding ACID in particular, there is another lever, namely the "isolation level"; see

Also, the current 0-10 scale is far too large: it is too much work to document, for every row, what each number between zero and ten means.

It seems clear that that sheet is just an indicator: it only gives clues about what might work best, and grading well on it cannot be the primary motivation for picking a solution.

Design for ~10X growth, but plan to rewrite before ~100X

Jeff Dean, “Challenges in Building Large-Scale Information Retrieval Systems,” Google,

The criteria are defined in the paper "WDQS Backend Alternatives", published on the page,