
Wikidata queries in PAWS return HTTP Error 403: Forbidden
Closed, Resolved | Public | BUG REPORT

Description

Steps to Reproduce:

I'm working on this notebook: https://paws-public.wmflabs.org/paws-public/User:OlafJanssen/MapMakingWorkshop_Wikimania2019.ipynb

When I run the cell starting with "from SPARQLWrapper import SPARQLWrapper, JSON", I get an error dump, ending with an 'HTTP Error 403: Forbidden'

Yesterday evening I worked on the same notebook, and then everything was still fine...

Actual Results:

Expected Results:

Event Timeline

BTW, you can use a ! to run shell commands directly from the notebook, i.e. use !pip install sparqlwrapper in a cell instead of using the terminal on every server start. See https://paws-public.wmflabs.org/paws-public/User:chicocvenancio/T230135.ipynb

You probably need to set a good User-Agent – the default Python Requests one is blocked since about a month ago (announcement). (Though that doesn’t explain why it would still have worked yesterday…)
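
A minimal sketch of what a descriptive User-Agent could look like, using only the Python standard library. The tool name, version and contact value are hypothetical placeholders, the helper functions are made up for illustration, and the `name/version (contact)` layout is an assumption about the general shape the User-Agent policy asks for. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_user_agent(tool, version, contact):
    """Return a descriptive User-Agent string: tool/version (contact)."""
    return f"{tool}/{version} ({contact})"


def build_wdqs_request(query, user_agent):
    """Build (but do not send) a GET request carrying the given User-Agent."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={"User-Agent": user_agent},
    )


# Hypothetical values, for illustration only.
user_agent = build_user_agent("MapMakingWorkshop", "0.1", "User:Example")
request = build_wdqs_request(
    "SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 1",
    user_agent,
)
# urllib.request.urlopen(request) would actually send the query.
```

The point is only that the header has to be attached to the request that goes to the query service; the exact wording of the string is up to you, as long as it identifies your tool and how to reach you.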

@Chicocvenancio Thanks for the !pip tip, I was not aware of that.

@Lucas_Werkmeister_WMDE Thank you, I'll give that a try --> YES, indeed, that makes all the difference, I'm back on track. Many thanks for the fast support!

@Chicocvenancio Thanks for your elegant solution. I looked at https://stackoverflow.com/questions/10606133/sending-user-agent-using-requests-library-in-python

I added this to the notebook. Is that a valid approach as well? (I'm not experienced with this sort of stuff yet.)


import requests

url = 'http://www.kb.nl'

headers = {
    'User-Agent': 'My User Agent 1.0',
    'From': 'olaf.janssen@kb.nl',  # This is another valid field
}

response = requests.get(url, headers=headers)

That won't change the User-Agent used by SPARQLWrapper. SPARQLWrapper has an agent parameter that you can use to solve this.

...
def get_results(endpoint_url, query):
    sparql = SPARQLWrapper(endpoint_url, agent='OlafJanssen from PAWS')
...

This is one way to get it working.

@Chicocvenancio Ah, I see... although my approach seems to work as well (by accident?).

I've tested with and without the User-Agent. The query sometimes works when defaulting to the SPARQLWrapper agent, but without looking at the server logs and configuration I can't tell why this is happening. Perhaps someone can investigate and tell us.

What I do know is that your original code does not change the agent used to make the query: it makes a request to http://www.kb.nl with the 'My User Agent 1.0' agent, and then makes the query to WDQS with the default SPARQLWrapper agent. I suggest you use the agent="SOME_STRING_THAT_FOLLOWS_POLICY" approach.
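
To illustrate why the header set in one call does not carry over, here is a small sketch using only the standard library. `urllib` stands in for both `requests` and whatever SPARQLWrapper does internally (an assumption made so the example runs without extra packages), and nothing is sent over the network.

```python
import urllib.request

# Request 1: carries the custom header, but targets an unrelated site.
unrelated = urllib.request.Request(
    "http://www.kb.nl",
    headers={"User-Agent": "My User Agent 1.0"},
)

# Request 2: a separate request object for the query service.
# Nothing from request 1 is shared with it.
query_request = urllib.request.Request("https://query.wikidata.org/sparql")

print(unrelated.get_header("User-agent"))      # My User Agent 1.0
print(query_request.get_header("User-agent"))  # None (no custom header set)
```

Headers belong to the individual request object, so the agent has to be set on the client that actually talks to the query service, which is what the agent parameter of SPARQLWrapper does.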

OK, thank you, I'll use the approach you suggest. I've put it in my notebook now.

One more thing: how should we go about including the updated line of code

sparql = SPARQLWrapper(endpoint_url, agent='Some uniquely identifiable user agent')

in the code output by the WDQS interface, like so:

afbeelding.png (834×718 px, 45 KB)

Yes, that would be T226709: Add user agent to Wikidata Query UI code examples.