To update the data in the graph, we use a query that deletes the old data. The query that deletes reference values not used by any other statement looks like this (shown here with DELETE replaced by SELECT):
```
SELECT ?s ?p ?o
WHERE {
<http://www.wikidata.org/entity/Q30> ?statementPred ?statement .
FILTER( STRSTARTS(STR(?statement), "http://www.wikidata.org/entity/statement/") ) .
?statement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
# Since references are shared we can only clear the values on them when they are no longer used
# anywhere else.
FILTER NOT EXISTS {
?otherStatement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
?otherEntity ?otherStatementPred ?otherStatement .
FILTER ( ?otherEntity != <http://www.wikidata.org/entity/Q23> ) .
}
?ref ?expandedValuePred ?s .
# Without this filter we'd try to delete stuff from entities. For example that pattern above matches
# ref:_ v:P143 entity:Q328
# so we'd try to clear everything from Q328 (enwiki). So we filter where ?s is in the value prefix.
FILTER( STRSTARTS(STR(?s), "http://www.wikidata.org/entity/value/") ) .
?s ?p ?o .
}
```
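For reference, switching back to DELETE is not a literal keyword swap in SPARQL 1.1 Update: the triples to remove go into a template in braces. A minimal sketch of the update form, assuming the same WHERE clause as above, and assuming the inner `!=` filter is meant to name the entity being updated (Q30 is used in both places here, which is my assumption, not necessarily what the updater sends):
```
DELETE { ?s ?p ?o }
WHERE {
  <http://www.wikidata.org/entity/Q30> ?statementPred ?statement .
  FILTER( STRSTARTS(STR(?statement), "http://www.wikidata.org/entity/statement/") ) .
  ?statement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
  # Keep references that are still used by a statement of some other entity.
  FILTER NOT EXISTS {
    ?otherStatement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
    ?otherEntity ?otherStatementPred ?otherStatement .
    # Assumption: the excluded entity is the one being updated.
    FILTER ( ?otherEntity != <http://www.wikidata.org/entity/Q30> ) .
  }
  ?ref ?expandedValuePred ?s .
  # Only touch value nodes, never entities.
  FILTER( STRSTARTS(STR(?s), "http://www.wikidata.org/entity/value/") ) .
  ?s ?p ?o .
}
```
The timings below were taken with the SELECT form, which is safe to run repeatedly because it does not modify the graph.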
The SELECT query is very slow (I had to kill it after more than a minute). However, the same query without FILTER NOT EXISTS runs in under a second:
```
SELECT ?s ?p ?o
WHERE {
<http://www.wikidata.org/entity/Q30> ?statementPred ?statement .
FILTER( STRSTARTS(STR(?statement), "http://www.wikidata.org/entity/statement/") ) .
?statement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
?ref ?expandedValuePred ?s .
# Without this filter we'd try to delete stuff from entities. For example that pattern above matches
# ref:_ v:P143 entity:Q328
# so we'd try to clear everything from Q328 (enwiki). So we filter where ?s is in the value prefix.
FILTER( STRSTARTS(STR(?s), "http://www.wikidata.org/entity/value/") ) .
?s ?p ?o .
}
```
This query produces 8 triples belonging to 2 separate subjects, so the cause of the slowdown is FILTER NOT EXISTS. Interestingly enough, this query:
```
SELECT ?s ?p ?o
WHERE {
<http://www.wikidata.org/entity/Q30> ?statementPred ?statement .
FILTER( STRSTARTS(STR(?statement), "http://www.wikidata.org/entity/statement/") ) .
?statement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
# Since references are shared we can only clear the values on them when they are no longer used
# anywhere else.
FILTER NOT EXISTS {
?otherStatement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
?otherEntity ?otherStatementPred ?otherStatement .
}
?ref ?expandedValuePred ?s .
# Without this filter we'd try to delete stuff from entities. For example that pattern above matches
# ref:_ v:P143 entity:Q328
# so we'd try to clear everything from Q328 (enwiki). So we filter where ?s is in the value prefix.
FILTER( STRSTARTS(STR(?s), "http://www.wikidata.org/entity/value/") ) .
?s ?p ?o .
}
```
which has the inner != filter removed, is also slow, though in theory it should fail very fast: without that filter, FILTER NOT EXISTS clearly contradicts the preceding conditions, because at least one set of bindings satisfying the inner pattern always exists, namely the one already produced by the first three lines of the query.
So it looks like there is some issue in how FILTER NOT EXISTS is processed here; somehow it is not optimal.
The data can be seen on the db01 labs machine, in namespace `wdq`.
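The contradiction can be checked directly. The following is only a sketch of a probe query, not something that was run as part of this report: it joins the outer pattern with the same pattern that appears inside FILTER NOT EXISTS (minus the != filter), sharing `?ref`.
```
ASK {
  <http://www.wikidata.org/entity/Q30> ?statementPred ?statement .
  FILTER( STRSTARTS(STR(?statement), "http://www.wikidata.org/entity/statement/") ) .
  ?statement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
  # Same pattern as inside FILTER NOT EXISTS, with ?ref shared with the outer pattern.
  ?otherStatement <http://www.w3.org/ns/prov#wasDerivedFrom> ?ref .
  ?otherEntity ?otherStatementPred ?otherStatement .
}
```
Every solution of the first three lines also satisfies the last two, by taking `?otherStatement = ?statement` and `?otherEntity = Q30`. So this should answer true whenever Q30 has at least one statement with a reference, and in that case the filterless FILTER NOT EXISTS variant should reject every solution and return an empty result.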