
Python query scripts using https://rdflib.github.io/sparqlwrapper/ fail against wb.c instances
Open, Needs Triage | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:

Nothing. No results and no errors.

What should have happened instead?:

The script should have returned the same results that we observed when running the query on the query service page. A similarly generated script succeeds on Wikidata.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):
SPARQLWrapper==2.0.0

Event Timeline

This sounds more like a bug with the wrapper than with cloud?

I found this ticket because I encountered the same problem. I have a bit more insight into the issue now, so I am posting it here in case somebody else finds the ticket the same way I did.

TLDR: SPARQLWrapper URL-encodes the query differently from the Query Service, and in some cases the Query Service cannot process the SPARQLWrapper-encoded queries properly.
Workaround: Use Wikibase Integrator instead.
Things to fix on Wikimedia's side: either accept the SPARQLWrapper encoding as well (I am not sure whether this is passed directly to Blazegraph or whether the decoding happens before), or change the Python code example from SPARQLWrapper to WikibaseIntegrator.

The problem only appears when the query contains slashes, such as with the prefix definition from the example above. So a query without one works fine, e.g.

SELECT ?item ?value { ?item ?prop ?value } LIMIT 10

When we inspect the URLs that are sent to the endpoint, we can see the difference in how the query parameter is encoded.
From the query service:
https://pbsandbox.wikibase.cloud/query/sparql?query=PREFIX%20pbdp%3A%20%3Chttps%3A%2F%2Fpbsandbox.wikibase.cloud%2Fprop%2Fdirect%2F%3E%0ASELECT%20%3Fitem%20%3Fvalue%20%7B%20%3Fitem%20pbdp%3AP476%20%3Fvalue%20%7D

From SPARQLWrapper:
https://pbsandbox.wikibase.cloud/query/sparql?query=%0APREFIX+pbdp%3A+%3Chttps%3A//pbsandbox.wikibase.cloud/prop/direct/%3E%0ASELECT+%3Fitem+%3Fvalue+%7B+%3Fitem+pbdp%3AP476+%3Fvalue+%7D%0A&format=json&output=json&results=json

If we want to emulate the different encodings in Python, we can do this using these lines of code:

import urllib.parse
query = """
PREFIX pbdp: <https://pbsandbox.wikibase.cloud/prop/direct/>
SELECT ?item ?value { ?item pbdp:P476 ?value }
"""
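# Query Service style: percent-encode everything, including "/" (becomes %2F)
print("Query Service: " + urllib.parse.quote(query, safe=''))
# SPARQLWrapper style: spaces become "+", but "/" is left unencoded
print("SPARQLWrapper: " + urllib.parse.quote_plus(query, safe="/"))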

The problem is caused by this fragment of the SPARQLWrapper call: safe="/". Without it, the query from SPARQLWrapper would also be accepted by the SPARQL endpoint. If we look for the function call to quote_plus in the SPARQLWrapper GitHub repo, we find it here and, as expected, the safe parameter is set to the slash there.
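To make the effect of that parameter concrete, here is a minimal sketch using only the standard library. It assumes, as described above, that the only relevant difference is whether "/" is in the safe set; the variable names are mine and not taken from SPARQLWrapper.

```python
from urllib.parse import quote_plus, unquote_plus

query = """
PREFIX pbdp: <https://pbsandbox.wikibase.cloud/prop/direct/>
SELECT ?item ?value { ?item pbdp:P476 ?value }
"""

# Encoding as emulated above for SPARQLWrapper: "/" is left as-is.
with_safe_slash = quote_plus(query, safe="/")
# Same call with the default safe="": every "/" becomes %2F.
without_safe_slash = quote_plus(query)

print("//" in with_safe_slash)     # True: the literal "//" of the IRI survives
print("//" in without_safe_slash)  # False: no literal slash is left

# Both encodings decode to exactly the same query text, so the choice of
# safe characters only changes what travels over the wire.
print(unquote_plus(with_safe_slash) == unquote_plus(without_safe_slash) == query)  # True
```

So dropping safe="/" does not change the query itself, only whether literal slashes appear in the URL.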

I am not sure from which side it is best to approach this problem: either getting this parameter removed from SPARQLWrapper, or making the Wikibase SPARQL interface accept encodings with unencoded slashes. Either one would solve the problem.

In the meantime there is a simple workaround: instead of SPARQLWrapper, use Wikibase Integrator:

from wikibaseintegrator.wbi_config import config as wbi_config
from wikibaseintegrator import wbi_helpers

wbi_config['SPARQL_ENDPOINT_URL'] = 'https://pbsandbox.wikibase.cloud/query/sparql'

query = """
PREFIX pbdp: <https://pbsandbox.wikibase.cloud/prop/direct/>
SELECT ?item ?value { ?item pbdp:P476 ?value }
"""

results = wbi_helpers.execute_sparql_query(query)
results = results['results']['bindings']
print(results)
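If you would rather not add a dependency, the following is a rough standard-library sketch of the same idea: encode the query the way the Query Service UI does (no safe characters) and send a plain GET request. The helper names build_query_url and run_query and the User-Agent string are hypothetical, and I am assuming the endpoint honours format=json as in the URLs shown in this ticket.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://pbsandbox.wikibase.cloud/query/sparql"

query = """
PREFIX pbdp: <https://pbsandbox.wikibase.cloud/prop/direct/>
SELECT ?item ?value { ?item pbdp:P476 ?value }
"""


def build_query_url(endpoint, sparql):
    """Build a GET URL whose query parameter contains no literal slashes."""
    encoded = urllib.parse.quote(sparql, safe="")  # "/" becomes %2F
    return f"{endpoint}?query={encoded}&format=json"


def run_query(endpoint, sparql):
    """Send the query and return the list of result bindings (needs network)."""
    request = urllib.request.Request(
        build_query_url(endpoint, sparql),
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "sparql-encoding-example/0.1",  # hypothetical value
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


print(build_query_url(ENDPOINT, query))
```

Calling run_query(ENDPOINT, query) should then return the bindings, provided the diagnosis above is right that only the unencoded slashes are the problem.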

While this may be a little late for you, @daziff, it may be useful for others who discover the issue here, just like I did. In the meantime, I think the code example should show the code for Wikibase Integrator instead of SPARQLWrapper, even if this is primarily a problem of SPARQLWrapper. I think this problem also affects Wikidata for queries that use slashes at some point; it is just more noticeable on other Wikibase installations, as they cannot rely as much on default prefixes and have to define their own in pretty much every query. [Update (2024-01-17): No, it doesn't. Upon closer inspection this looks like a WBC problem. For details see my comment below.]

I think WDQS is wrong here. Per RFC3986, section 3.4:

query       = *( pchar / "/" / "?" )

The characters slash ("/") and question mark ("?") may represent data within the query component.
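As a quick sanity check of that reading of the RFC, here is a small sketch showing that Python's standard URL parser treats unescaped slashes in the query component as data. It uses the SPARQLWrapper URL quoted earlier in this ticket; this only illustrates how one compliant parser behaves, not how WDQS or Wikibase Cloud process the request.

```python
from urllib.parse import parse_qs, urlsplit

# The SPARQLWrapper-generated URL quoted earlier in this ticket.
url = (
    "https://pbsandbox.wikibase.cloud/query/sparql?query="
    "%0APREFIX+pbdp%3A+%3Chttps%3A//pbsandbox.wikibase.cloud/prop/direct/%3E"
    "%0ASELECT+%3Fitem+%3Fvalue+%7B+%3Fitem+pbdp%3AP476+%3Fvalue+%7D%0A"
    "&format=json&output=json&results=json"
)

parts = urlsplit(url)
params = parse_qs(parts.query)

# The path stops at the first "?"; slashes after it belong to the query.
print(parts.path)
# The decoded SPARQL text still contains the full IRI with its "//".
print(params["query"][0])
```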

I have a simple test case that seems to point to Wikibase (Cloud) as being against the RFC while Wikidata works.

A trivial query containing a double slash:

PREFIX foo: <https://example.org/>
SELECT (foo: AS ?foo) {}

The query works in Wikidata even with the slashes unquoted (response value is a valid https URL): https://query.wikidata.org/sparql?format=json&query=PREFIX%20foo%3A%20%3Chttps%3A//example.org/%3E%0ASELECT%20(foo%3A%20AS%20%3Ffoo)%20%7B%7D

However, the same request fails on Wikibase Cloud (response value has lost one of the slashes): https://pbsandbox.wikibase.cloud/query/sparql?format=json&query=PREFIX%20foo%3A%20%3Chttps%3A//example.org/%3E%0ASELECT%20(foo%3A%20AS%20%3Ffoo)%20%7B%7D

OTOH, the request works on WCQS so it's not just Wikidata: https://commons-query.wikimedia.org/sparql?format=json&query=PREFIX%20foo%3A%20%3Chttps%3A//example.org/%3E%0ASELECT%20(foo%3A%20AS%20%3Ffoo)%20%7B%7D

In summary, it looks like unescaped slashes are fine but unescaped double slashes in the query get mangled by some over-eager normalisation step (perhaps in the reverse proxy of Wikibase Cloud?).
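To illustrate the hypothesis, here is a small simulation. The collapse_slashes function is purely hypothetical: it stands in for whatever normalisation step might be merging repeated slashes, which I have not identified.

```python
import re
from urllib.parse import parse_qs, quote

# Query string of the test request above (everything after the "?").
raw = (
    "format=json&query=PREFIX%20foo%3A%20%3Chttps%3A//example.org/%3E%0A"
    "SELECT%20(foo%3A%20AS%20%3Ffoo)%20%7B%7D"
)


def collapse_slashes(text):
    """Hypothetical normalisation: merge every run of '/' into a single '/'."""
    return re.sub(r"/{2,}", "/", text)


original = parse_qs(raw)["query"][0]
mangled = parse_qs(collapse_slashes(raw))["query"][0]

print(original.splitlines()[0])  # PREFIX foo: <https://example.org/>
print(mangled.splitlines()[0])   # PREFIX foo: <https:/example.org/>

# With every "/" percent-encoded there is nothing left to collapse, which
# would explain why requests from the query service page are unaffected.
fully_encoded = quote(original, safe="")
print(collapse_slashes(fully_encoded) == fully_encoded)  # True
```

The mangled prefix has lost one slash, which matches the response observed from Wikibase Cloud.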

Upon further checking the claim I made above, I noticed that this problem not only does not occur on Wikidata, it also does not occur on two other WB instances I manage (both non-WBC and with different technical setups). This makes it look more like a WBC-specific problem than a general one. In my post above, I only assumed it would affect Wikidata as well, since it looked like a general communication problem between SPARQLWrapper and the query service. I didn't check it thoroughly. Sorry for the confusion there. I updated my comment accordingly.