
403 errors on Wikidata SPARQL queries
Closed, Resolved (Public)

Description

Hello,

I recently started getting 403 Forbidden errors on the Wikidata SPARQL endpoint. The error message does not include any details:

Request from 193.248.56.1 via cp3041 cp3041, Varnish XID 180527026

Error: 403, Forbidden at Tue, 25 Jun 2019 16:21:15 GMT

Here is an example of a failing request:

SELECT ?uri ?label ?description (sample(?image) as ?thumbnail) (group_concat(?type; separator="|") as ?types) WHERE {
  VALUES ?uri { <http://www.wikidata.org/entity/Q142794> <http://www.wikidata.org/entity/Q61901279> <http://www.wikidata.org/entity/Q53759004> <http://www.wikidata.org/entity/Q9500714> <http://www.wikidata.org/entity/Q42294058> }
    ?uri rdfs:label ?label
    FILTER(LANG(?label) = 'fr')
  OPTIONAL {
    ?uri schema:description ?description
    FILTER(LANG(?description) = 'fr')
  }
  OPTIONAL { ?uri wdt:P18 ?image }
  OPTIONAL { ?uri wdt:P31?/wdt:P279* ?type }
}
GROUP BY ?uri ?label ?description

Event Timeline

Addshore removed a project: Wikidata-Campsite.
Addshore subscribed.

Are you using the Wikidata Query UI, or some other script to run the query?
If you're hitting the endpoint from a script, what request exactly are you making, and to what path?
It looks like the 403 came from the Varnish layer rather than from the query service itself.
The query currently works fine for me.
Unfortunately, 403s don't appear in Logstash as far as I can tell.

I perform the query through a Ruby application. While examining the code of the SPARQL library I use, I managed to reproduce the error with a curl command:

curl -X POST -H 'Accept: application/sparql-results+json' \
  -H 'Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3' --compressed \
  -H 'User-Agent: Ruby' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'query=SELECT+%3Furi+%3Flabel+%3Fdescription+%28sample%28%3Fimage%29+as+%3Fthumbnail%29+%28group_concat%28%3Ftype%3B+separator%3D%22%7C%22%29+as+%3Ftypes%29+WHERE+%7B%0A++VALUES+%3Furi+%7B+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ142794%3E+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ61901279%3E+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ53759004%3E+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ9500714%3E+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ42294058%3E+%7D%0A++++%3Furi+rdfs%3Alabel+%3Flabel%0A++++FILTER%28LANG%28%3Flabel%29+%3D+%27fr%27%29%0A++OPTIONAL+%7B%0A++++%3Furi+schema%3Adescription+%3Fdescription%0A++++FILTER%28LANG%28%3Fdescription%29+%3D+%27fr%27%29%0A++%7D%0A++OPTIONAL+%7B+%3Furi+wdt%3AP18+%3Fimage+%7D%0A++OPTIONAL+%7B+%3Furi+wdt%3AP31%3F%2Fwdt%3AP279*+%3Ftype+%7D%0A%7D%0AGROUP+BY+%3Furi+%3Flabel+%3Fdescription%0A' \
  https://query.wikidata.org/sparql

Please see https://meta.wikimedia.org/wiki/User-Agent_policy and use a descriptive user agent. We do not require many details: you don't have to include a contact email if you don't want to (though for mass-querying bots it is probably a good idea, in case the bot malfunctions), but it should be something more than "Ruby".
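
For a Ruby client, a minimal sketch of sending a more descriptive user agent could look like the following. It uses only the standard library (Net::HTTP) rather than the SPARQL library mentioned in the report, since that library is not named here. The client name, version, and contact URL in USER_AGENT are hypothetical placeholders, and the helper names build_sparql_request and run_sparql are made up for illustration.

```ruby
require "net/http"
require "uri"

# Hypothetical identifiers: replace with your own tool name, version, and contact.
USER_AGENT = "ExampleSparqlClient/0.1 (https://example.org/contact) Ruby/#{RUBY_VERSION}"
ENDPOINT = URI("https://query.wikidata.org/sparql")

# Build (but do not send) a POST request carrying the SPARQL query.
def build_sparql_request(query, user_agent: USER_AGENT)
  request = Net::HTTP::Post.new(ENDPOINT)
  request["Accept"] = "application/sparql-results+json"
  # Override the generic default header with a descriptive one.
  request["User-Agent"] = user_agent
  # Sets the body and Content-Type: application/x-www-form-urlencoded.
  request.set_form_data("query" => query)
  request
end

# Send the request over HTTPS and return the Net::HTTPResponse.
def run_sparql(query)
  Net::HTTP.start(ENDPOINT.host, ENDPOINT.port, use_ssl: true) do |http|
    http.request(build_sparql_request(query))
  end
end
```

Calling run_sparql with the query text from the description would then send the same POST as the curl command, but with the descriptive header. If you use a SPARQL library instead, check whether it offers its own way to set request headers.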

Smalyshev claimed this task.