
Wikidata Query Service (Blazegraph) does not correctly represent canonical form of xsd:decimal
Open, Medium, Public

Description

For background, see this issue.

When a statement is written having a value that is a quantity, and that quantity is a whole number, the WD Query Service binds a datatyped literal value like "4"^^xsd:decimal . This is the canonical representation under XML Schema 1. However, under XML Schema 1.1, the canonical representation is now "4.0"^^xsd:decimal .

The practical implication of this is that if a federated query to the Wikidata Query Service is made from a service like Apache Jena that supports XML Schema 1.1, variables that are bound to the same whole decimal number locally and on the remote endpoint do not match.

I realize that Blazegraph is no longer supported by its developers, so I don't know how difficult it would be to update the handling of whole decimal numbers to XML Schema 1.1.

To reproduce, load this triple locally in Jena into the graph http://journals:

wd:Q97446840 wdt:P2896 4.0 .

Then run the following query:

PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?value
WHERE {
  GRAPH <http://journals> {
    wd:Q97446840 wdt:P2896 ?value .
  }
}

The result is "4.0"^^xsd:decimal.

Now in Jena perform the following federated query:

PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?value
WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    wd:Q97446840 wdt:P2896 ?value .
  }
}

The response is "4"^^xsd:decimal. If you combine the two queries into a single query and perform a MINUS operation, the two bindings for ?value are not treated as the same.
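
A likely explanation is that MINUS compares shared variables as RDF terms (lexical form plus datatype) rather than as numeric values. The following minimal Python sketch illustrates the difference. The tuples are a hypothetical stand-in for RDF literals, and `same_term` and `same_value` are illustrative helpers, not Jena or Blazegraph APIs.

```python
from decimal import Decimal

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Hypothetical stand-in for an RDF literal: (lexical form, datatype IRI).
local_binding = ("4.0", XSD_DECIMAL)   # binding from the local graph
remote_binding = ("4", XSD_DECIMAL)    # binding from the SERVICE call


def same_term(a, b):
    """Term equality: lexical form and datatype must both be identical."""
    return a == b


def same_value(a, b):
    """Value equality: same datatype and same number denoted."""
    return a[1] == b[1] and Decimal(a[0]) == Decimal(b[0])


print(same_term(local_binding, remote_binding))   # False
print(same_value(local_binding, remote_binding))  # True
```

Under these assumptions, a value-based test (for example, binding the two sides to differently named variables and comparing them with the SPARQL `=` operator in a FILTER) would treat the bindings as equal, whereas a comparison of the terms themselves would not.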

Event Timeline


It looks like what we export in Wikibase is neither "4.0"^^xsd:decimal nor "4"^^xsd:decimal:

$ curl -s https://www.wikidata.org/wiki/Special:EntityData/Q97446840.ttl | grep $'\twdt:P2896'
        wdt:P2896 "+4"^^xsd:decimal ;

So I assume it’s indeed Blazegraph that normalizes or canonicalizes the literal (apparently with XSD 1.0 rules), and we won’t be able to fix it in Wikibase.
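
The three lexical forms seen in this task ("+4", "4" and "4.0") all denote the same value and differ only in normalization. As a rough illustration, the sketch below uses a hypothetical helper, `canonical_decimal`, built on Python's `decimal` module to produce either normalized form (with or without a mandatory decimal point). It is not Blazegraph's or Wikibase's actual code.

```python
from decimal import Decimal


def canonical_decimal(lexical, require_point):
    """Normalize an xsd:decimal lexical form (hypothetical helper).

    require_point=False gives "4" for whole numbers;
    require_point=True gives "4.0" for whole numbers.
    """
    value = Decimal(lexical)          # accepts "+4", "4", "4.0", ".5", ...
    text = format(value, "f")         # fixed-point digits, no exponent, no "+"
    if "." in text:
        # Drop trailing fractional zeros, then a dangling decimal point.
        text = text.rstrip("0").rstrip(".")
    if text == "-0":
        text = "0"                    # negative zero normalizes to plain zero
    if require_point and "." not in text:
        text += ".0"                  # at least one digit after the point
    return text


for form in ("+4", "4", "4.0"):
    print(form, canonical_decimal(form, False), canonical_decimal(form, True))
```

Whichever form a store settles on, two systems exchanging literals agree on term equality only if they normalize the same way or compare by value.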

CBogen triaged this task as Medium priority. Dec 14 2020, 4:23 PM
CBogen moved this task from Incoming to Blazegraph on the Wikidata-Query-Service board.