Problem
The Wikibase RDF export on Wikidata uses s:, ref: and v: as the namespace prefixes for statement, reference and value nodes respectively:
$ curl -s 'https://www.wikidata.org/wiki/Special:EntityData/Q28726133.ttl?flavor=dump&revision=2163285707' | grep -E '@prefix (s|ref|v):'
@prefix s: <http://www.wikidata.org/entity/statement/> .
@prefix ref: <http://www.wikidata.org/reference/> .
@prefix v: <http://www.wikidata.org/value/> .
However, the Wikidata Query Service instead knows these same prefixes as wds:, wdref: and wdv:, as can be seen in this query:
| wds | wdref | wdv |
|---|---|---|
| http://www.wikidata.org/entity/statement/ | http://www.wikidata.org/reference/ | http://www.wikidata.org/value/ |
(P72202 demonstrates that WDQS doesn’t know the prefixes without the wd part and will raise a syntax error if you try to use them.)
This is a problem for Wikibase-Quality-Constraints, because we’d like to use the WikibaseRepo RdfVocabulary service to generate queries that we can send to WDQS without having to define the prefixes in every query we send – but that doesn’t work if the two disagree about what the prefixes are. (And we want to start using v: / wdv: for the new SPARQL queries in T369079.) It would be great if we could reconcile this somehow, to let WBQC keep sending shorter queries, and also just generally to reduce confusion.
Prior history
I think this difference is due to a bug in Wikibase. The prefix generated by Wikibase used to be wdref, as can be seen in e.g. Wikibase/Indexing/RDF Dump Format/Proposal (linked from change I8651435e8c, which changed NS_REFERENCE from ref: to wdref: way back in April 2015, around the time of the initial query service deployment). Later, in mid-2019, the prefixes were made more configurable for T211799 / T214557: Wikidata was meant to keep its existing prefixes (I assume; I didn’t find a mention of intentional changes for Wikidata, but I confess I haven’t read the whole discussion), while Structured Data on Commons needed different ones.
The configuration mainly consists of two prefix strings (prefix-prefixes?) – the rdfNodeNamespacePrefix, and the rdfPredicateNamespacePrefix. On Wikidata, rdfNodeNamespacePrefix is wd (yielding wd: for entities, wdt: for direct claims, wdno: for novalue etc.), while rdfPredicateNamespacePrefix is the empty string (yielding p:, ps:, psv: etc.); whereas on Commons, rdfNodeNamespacePrefix is sdc (yielding sdc: for MediaInfo entities, sdct:, sdcno: etc.), while rdfPredicateNamespacePrefix is also sdc (yielding sdcp:, sdcps:, sdcpsv: etc.). And I think as part of that massive Wikibase change, we accidentally used the predicate prefix rather than the node prefix for the three prefixes which this task is about, even though all of them are nodes and not predicates (statement nodes, reference nodes, and value nodes). If the node prefix is used, they match WDQS again: wds:, wdref:, and wdv:.
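The naming scheme described above can be sketched as follows. This is only an illustration, not Wikibase's actual RdfVocabulary code: the function buildPrefixes(), its array keys and the $useNodePrefixForNodes parameter are hypothetical names, and the suffixes (t, no, p, ps, psv, s, ref, v) are inferred from the prefixes listed in this task.

```php
<?php
// Hypothetical helper, NOT Wikibase's RdfVocabulary: it only mirrors the
// naming scheme described in this task (two configured strings plus suffixes).
function buildPrefixes( string $nodePrefix, string $predicatePrefix, bool $useNodePrefixForNodes ): array {
	// Statement, reference and value namespaces are nodes, so the node prefix
	// is the intended one; passing false reproduces the suspected bug.
	$statementNodePrefix = $useNodePrefixForNodes ? $nodePrefix : $predicatePrefix;
	return [
		'entity' => $nodePrefix,
		'direct claim' => $nodePrefix . 't',
		'novalue' => $nodePrefix . 'no',
		'claim' => $predicatePrefix . 'p',
		'claim statement' => $predicatePrefix . 'ps',
		'claim statement value' => $predicatePrefix . 'psv',
		'statement' => $statementNodePrefix . 's',
		'reference' => $statementNodePrefix . 'ref',
		'value' => $statementNodePrefix . 'v',
	];
}

// Configuration values as given in this task.
$wikidataCurrent = buildPrefixes( 'wd', '', false );
$wikidataFixed = buildPrefixes( 'wd', '', true );
$commonsCurrent = buildPrefixes( 'sdc', 'sdc', false );
$commonsFixed = buildPrefixes( 'sdc', 'sdc', true );

foreach ( [ 'statement', 'reference', 'value' ] as $kind ) {
	echo $kind, ': ', $wikidataCurrent[$kind], ': becomes ', $wikidataFixed[$kind], ":\n";
}
```

Under this model, Commons would be unaffected by switching to the node prefix, because both of its configured strings are sdc; only wikis where the two strings differ, such as Wikidata, would see different output.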
The only previous mention of this issue that I’ve found is T297096, reported by @VladimirAlexiev and partially dismissed by yours truly. That task was about the WBQC RDF export (which is currently unused: T274982); apparently neither of us noticed that the discrepancy also existed in Wikibase itself.
Stability concerns
In T297096#7549331, I wrote that “Prefixes are local to a single RDF document, there’s no requirement to use the same prefix names between different documents as far as I’m aware”. This is half true – in the Wikibase RDF export, I think we’re theoretically free to change the prefixes as we please. I’m sure there are some folks out there who parse the Wikidata RDF dumps with various ill-advised regexes, rather than a proper RDF parser, and who’ve hard-coded the current prefixes (ignoring the @prefix declarations in the output) and who would be broken if we changed the prefixes – but we can follow the usual notification policy to alert them.
But the situation in WDQS is different. Because WDQS allows users to write their SPARQL queries without specifying the standard prefixes (an excellent usability feature – and one that’s implemented in the backend, not in the Wikidata Query UI), changing or removing a prefix runs the risk of breaking existing queries that were relying on that prefix. Adding a new prefix should be possible without breaking any queries, but having two prefixes for the same URI (e.g. s: and wds:) seems unnecessarily confusing. (For the three particular prefixes that this task is concerned with, the number of affected queries is probably relatively low, as these nodes use opaque hashes in the URI and generally wouldn’t be named in a query. But I’m sure there are a few people who query for a hard-coded statement, reference or value node.)
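To illustrate why adding a prefix is safe while renaming one is not, here is a toy model of resolving a prefixed name against a set of built-in prefixes. It is a sketch under assumptions, not WDQS code: expandPrefixedName() is a hypothetical function, and the statement ID Q1-example is made up.

```php
<?php
// Hypothetical model of server-side prefix handling; this is NOT WDQS code.
function expandPrefixedName( string $name, array $builtInPrefixes ): string {
	[ $prefix, $localName ] = explode( ':', $name, 2 );
	if ( !array_key_exists( $prefix, $builtInPrefixes ) ) {
		// A query using an unknown, undeclared prefix cannot be parsed.
		throw new InvalidArgumentException( "Undeclared prefix: $prefix:" );
	}
	return $builtInPrefixes[$prefix] . $localName;
}

// Namespace URI taken from this task; the three prefix maps are scenarios.
$statementNs = 'http://www.wikidata.org/entity/statement/';
$current = [ 'wds' => $statementNs ];
$withAlias = [ 'wds' => $statementNs, 's' => $statementNs ];
$renamed = [ 's' => $statementNs ];

// An existing query fragment that relies on the built-in wds: prefix.
$name = 'wds:Q1-example';

$resolvedNow = expandPrefixedName( $name, $current );
$resolvedWithAlias = expandPrefixedName( $name, $withAlias );

$renameBroke = false;
try {
	expandPrefixedName( $name, $renamed );
} catch ( InvalidArgumentException $e ) {
	$renameBroke = true;
}

echo $resolvedNow, "\n";
echo $renameBroke ? "renaming wds: to s: breaks the query\n" : "still works\n";
```

In this model the alias scenario resolves to the same URI as today, so existing queries keep working, but there are then two names for one namespace, which is the confusion mentioned above.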
Suggested change
For these reasons, I suggest that Wikibase should change these three prefixes to wds:, wdref: and wdv:, to match both its own former output (prior to August 2019) and the Wikidata Query Service. We should treat this as either a significant or breaking change per the stable interface policy (to be decided), announce it in advance, and give users the opportunity to test the new behavior on Test Wikidata first, as per the usual procedure. (There’s no Test Wikidata Query Service, but as the change wouldn’t affect the query service, that should be fine.)
$buggyNodePrefix = $tmpFeatureFlag ? $nodeNamespacePrefix : $predicateNamespacePrefix;
$this->statementNamespaceNames[$repositoryOrSourceName] = [
	self::NS_STATEMENT => $buggyNodePrefix . self::NS_STATEMENT,
	self::NS_REFERENCE => $buggyNodePrefix . self::NS_REFERENCE,
	self::NS_VALUE => $buggyNodePrefix . self::NS_VALUE,
];