Problem statement:
We are experiencing severe performance issues in the process that keeps Wikidata and the triple store behind WDQS in sync. These performance issues cause edits on Wikidata to be throttled. While reviewing how we update the store, we decided to move most of the synchronization/reconciliation process out of the triple store, with the goal of sending only the minimal amount of information needed to mutate the graph through a set of trivial operations (ADD/REMOVE triples). This is where blank nodes become problematic: a blank node has no identifier that is stable outside the document or store it lives in, so a patch cannot refer to an existing one directly. For more detail on why this is a problem, I suggest reading the TurtlePatch proposal, an attempt to formalize a patching format for RDF backends.
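To make the idea concrete, here is a minimal sketch in Python, assuming triples are modelled as plain string tuples. The helper name `diff_triples` and the sample data are illustrative and not part of the actual updater. It computes the ADD/REMOVE sets between two revisions, then shows how blank node labels, which are only meaningful inside one serialization, produce a spurious patch.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def diff_triples(old: Set[Triple], new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (to_remove, to_add) so that (old - to_remove) | to_add == new."""
    return old - new, new - old


# Two revisions of a hypothetical entity that differ in one IRI-valued triple.
rev1 = {("wd:Q3", "rdf:type", "wikibase:Item"), ("wd:Q3", "wdt:P2", "wd:Q5")}
rev2 = {("wd:Q3", "rdf:type", "wikibase:Item"), ("wd:Q3", "wdt:P2", "wd:Q7")}
to_remove, to_add = diff_triples(rev1, rev2)

# The same unchanged statement exported twice, with different blank node labels.
# The labels are arbitrary, so the diff reports a change that never happened,
# and the REMOVE side cannot be matched against the node held in the store.
export_a = {("wd:Q3", "wdt:P2", "_:b0")}
export_b = {("wd:Q3", "wdt:P2", "_:b1")}
spurious_remove, spurious_add = diff_triples(export_a, export_b)
```

With IRIs the patch contains exactly the one triple that changed; with blank nodes the diff is both noisy and impossible to apply without first querying the graph.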
Where blank nodes are currently used
In Wikibase we use blank nodes for two purposes:
- to denote the existence of a value (ambiguously named "unknown value" in the UI; originally discussed in T95441)
- in the OWL constraints of the wdno property
For the SomeValue use case, we seem to use the blank node only as a way to filter for such values.
For the OWL constraints, it is unclear whether they are actually used or useful.
Suggested solution
One option is to skolemize blank nodes, as explained in RDF 1.1, section 3.5, "Replacing Blank Nodes with IRIs".
@prefix genid: <http://www.wikidata.org/.well-known/genid/> .

wd:Q3 a wikibase:Item ;
    wdt:P2 genid:a8d14fa93486370345412093add8f50c .

wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c a wikibase:Statement ;
    ps:P2 genid:a49fd4307e7deef3b569568be8019566 ;
    wikibase:rank wikibase:NormalRank .
This way such triples remain referenceable, so the WDQS backend can be patched with simple INSERT DATA/DELETE DATA statements, without querying the graph first.
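As a rough illustration of that patching step, the Python sketch below turns blank node labels into skolem IRIs under the genid base shown above and renders a list of triples as a SPARQL update. The function names `skolemize` and `to_update` are hypothetical, and the sketch assumes that prefixes such as `wd:` and `wdt:` are already declared by the endpoint. Skolemizing matters here because SPARQL 1.1 Update does not permit blank nodes in DELETE DATA, and treats blank nodes in INSERT DATA as fresh nodes.

```python
from typing import Iterable, Tuple

GENID_BASE = "http://www.wikidata.org/.well-known/genid/"


def skolemize(term: str) -> str:
    """Replace a blank node label ('_:label') with a skolem IRI; keep other terms as-is."""
    if term.startswith("_:"):
        return "<" + GENID_BASE + term[2:] + ">"
    return term


def to_update(operation: str, triples: Iterable[Tuple[str, str, str]]) -> str:
    """Render triples as an 'INSERT DATA' or 'DELETE DATA' request."""
    if operation not in ("INSERT", "DELETE"):
        raise ValueError("operation must be INSERT or DELETE")
    lines = ["  " + " ".join(skolemize(t) for t in triple) + " ." for triple in triples]
    return operation + " DATA {\n" + "\n".join(lines) + "\n}"


patch = to_update(
    "DELETE",
    [("wd:Q3", "wdt:P2", "_:a8d14fa93486370345412093add8f50c")],
)
print(patch)
```

Because the skolem IRI names the node explicitly, the statement can be removed without first looking up which blank node the store happens to hold.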
Problems induced with the approach in WDQS
- Queries using isBlank() will break.
  - Mitigation: introduce a new function wikibase:isSomeValue() so that queries relying on isBlank() can be rewritten.
- Classic IRIs may be conflated with SomeValue IRIs (use of isURI/isIRI).
  - Queries using isIRI/isURI risk treating SomeValue IRIs as ordinary IRIs, and will therefore have to be verified.
- Consumers of WDQS results that expect blank nodes in results
  - will have to change to understand the skolem IRIs.
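For consumers of query results, the change could look something like the Python sketch below. It assumes results arrive in the standard SPARQL 1.1 JSON results format, where each bound value is an object with `type` and `value` keys, and that skolem IRIs use the genid base proposed above. The helper name `is_some_value` is hypothetical.

```python
GENID_PREFIX = "http://www.wikidata.org/.well-known/genid/"


def is_some_value(term: dict) -> bool:
    """Tell whether one bound value from SPARQL JSON results denotes SomeValue.

    Accepts both representations, so the same client code works before and
    after the switch to skolem IRIs.
    """
    if term.get("type") == "bnode":
        return True  # current behaviour: SomeValue is a blank node
    # Assumed future behaviour: SomeValue is an IRI under the genid base.
    return term.get("type") == "uri" and term.get("value", "").startswith(GENID_PREFIX)


old_style = {"type": "bnode", "value": "t123"}
new_style = {"type": "uri", "value": GENID_PREFIX + "a49fd4307e7deef3b569568be8019566"}
regular = {"type": "uri", "value": "http://www.wikidata.org/entity/Q3"}
```

Checking for both forms lets a client be updated ahead of the breaking change rather than on the day it lands.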
Migration plan
1. Introduce a new wikibase:isSomeValue() function to ease the transition
2. Start using stable and unique labels for blank nodes in Wikibase RDF dumps
3. Do blank node skolemization in the WDQS update process [BREAKING CHANGE]
4. Skolemize blank nodes in the RDF dump [BREAKING CHANGE]
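Step 2 requires labels that stay the same from one dump to the next. The plan does not say how those labels are derived. The Python sketch below shows one possible scheme, purely as an assumption: hash identifying parts of the statement (here a hypothetical statement ID and property) so that re-running the export yields the same label.

```python
import hashlib


def stable_bnode_label(*parts: str) -> str:
    """Derive a deterministic label from identifying strings (hypothetical scheme)."""
    # A NUL separator keeps ("a", "bc") and ("ab", "c") from hashing to the same label.
    joined = "\x00".join(parts)
    # MD5 serves only as a compact fingerprint here, not as a security measure.
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


label = stable_bnode_label("Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c", "P2")
bnode = "_:" + label  # usable as a blank node label in a dump
```

A label built this way can later be reused unchanged as the local part of the skolem IRI in steps 3 and 4, which keeps the dump and the live updater consistent with each other.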
There are more detailed discussions around this topic here as well.