Blank nodes always make the update process a challenging operation (http://www.aidanhogan.com/docs/blank_nodes_jws.pdf). Wikibase uses blank nodes only in a very limited way, so I propose removing them to simplify the WDQS update strategy.
In Wikibase we use blank nodes for two purposes:
- to denote an //unknown value// (originally discussed in T95441)
- in the OWL constraints of the `wdno:` properties
For the unknown-value use case, the blank node seems to be used only as a way to //filter// such unknown values.
For the OWL constraints, it is unclear whether they are actually used or useful.
For unknown values we have two options:
The first option is a single constant:
```lang=Turtle
wd:Q3 a wikibase:Item ;
    wdt:P2 wikibase:UnknownValue .
wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c a wikibase:Statement ;
    ps:P2 wikibase:UnknownValue ;
    wikibase:rank wikibase:NormalRank .
```
A query like
```lang=sparql
SELECT ?human
WHERE {
  ?human wdt:P106 ?o .
  FILTER isBLANK(?o)
}
```
would become:
```lang=sparql
SELECT ?human
WHERE { ?human wdt:P106 wikibase:UnknownValue }
```
This changes the semantics:
- all unknown values are now equal
- it is impossible to tell whether a particular property has multiple unknown values by following the direct (`wdt:`) graph
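To illustrate the second point: with the constant approach, several unknown values on the same property collapse into a single `wdt:` triple, so counting them would have to go through the statement nodes instead. A minimal sketch, assuming the proposed `wikibase:UnknownValue` constant (which is only the suggestion made here, not an existing term) and the usual `p:`/`ps:` statement prefixes:

```lang=sparql
# Items with more than one unknown-value statement for P106.
# wikibase:UnknownValue is the constant proposed in this task (an assumption).
SELECT ?human (COUNT(?statement) AS ?unknownCount)
WHERE {
  ?human p:P106 ?statement .
  ?statement ps:P106 wikibase:UnknownValue .
}
GROUP BY ?human
HAVING (COUNT(?statement) > 1)
```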
The other option is to encode the statement ID as the unknown value:
```lang=Turtle
@prefix wdunk: <http://www.wikidata.org/prop/unknown/> .
wd:Q3 a wikibase:Item ;
    wdt:P2 wdunk:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c .
wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c a wikibase:Statement ;
    ps:P2 wdunk:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c ;
    wikibase:rank wikibase:NormalRank .
```
A query like
```lang=sparql
SELECT ?human
WHERE {
  ?human wdt:P106 ?o .
  FILTER isBLANK(?o)
}
```
would become:
```lang=sparql
PREFIX wdunk: <http://www.wikidata.org/prop/unknown/>
SELECT ?human
WHERE {
  ?human wdt:P106 ?o .
  FILTER STRSTARTS( STR(?o), 'http://www.wikidata.org/prop/unknown/' )
}
```
The ugly STRSTARTS filter can be hidden behind a custom function such as `wikibase:isUnknownValue(?o)`.
This option stays very close to the previous semantics, but it incurs a non-negligible performance overhead and is very likely to cause more timeouts.
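With such a custom function, the query might read as follows. This is only a sketch: the function name (spelled `wikibase:isUnknownValue` here) is the one suggested in this task and is an assumption, not a confirmed WDQS function.

```lang=sparql
SELECT ?human
WHERE {
  ?human wdt:P106 ?o .
  # Hypothetical function proposed in this task; it would wrap the
  # STRSTARTS test on the http://www.wikidata.org/prop/unknown/ namespace.
  FILTER( wikibase:isUnknownValue(?o) )
}
```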
For the OWL constraints, I simply suggest either removing them or materializing the blank node as a named node:
```lang=turtle
wdno:P109 a owl:Class ;
    owl:complementOf wdowl:P109 .
wdowl:P109 a owl:Restriction ;
    owl:onProperty wdt:P109 ;
    owl:someValuesFrom owl:Thing .
```
This is a breaking change to the RDF dump format (https://www.mediawiki.org/w/index.php?title=Wikibase/Indexing/RDF_Dump_Format). If it is accepted, I suggest a transition period during which blank nodes are kept and the use of //isBlank// in the query service starts emitting a deprecation warning.
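During such a transition period, queries could be written to accept both representations. A minimal sketch, assuming the statement-ID option and its `http://www.wikidata.org/prop/unknown/` namespace (the constant option would instead compare against `wikibase:UnknownValue`):

```lang=sparql
SELECT ?human
WHERE {
  ?human wdt:P106 ?o .
  # isBlank matches the current output; the STRSTARTS test matches the
  # proposed IRIs. STR() raises an error on a blank node, but the ||
  # still evaluates to true because isBlank(?o) is already true.
  FILTER( isBlank(?o) || STRSTARTS( STR(?o), 'http://www.wikidata.org/prop/unknown/' ) )
}
```

A query written this way should keep returning the same items before and after the switch, so users could migrate ahead of the removal.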