
Handle timing out SPARQL endpoint in Pywikibot
Open, Needs Triage, Public, BUG REPORT

Description

I have some bots that are not doing their reporting. Looking at the logs I noticed I hit the default socket timeout of 45 seconds. The Wikidata SPARQL endpoint at https://query.wikidata.org/ has a server-side timeout of 60 seconds. That extra 15 seconds might be just enough for these queries to finish.

Steps to replicate the issue (include links if applicable):

I added this to my user-config.py:

socket_timeout = 90

And ran a bot that does some heavy SPARQL queries ( https://github.com/multichill/toollabs/blob/master/bot/wikidata/painting_external-id_property_statistics.py ).
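
(Side note: if the installed Pywikibot passes socket_timeout straight through to requests, a (connect, read) tuple should also be accepted, which keeps connection failures fast while still allowing long reads. A sketch under that assumption; the 6.05 connect value is just the common requests convention, not a Pywikibot default I have verified:

  # user-config.py -- assuming socket_timeout is forwarded to requests,
  # which accepts a (connect timeout, read timeout) tuple
  socket_timeout = (6.05, 90)
)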

What happens?:

The server's extra time is still not enough for the query, so now I'm hitting the server-side timeout instead:

  File "/home///pywikibot/pywikibot/comms/http.py", line 356, in error_handling_callback
    raise ServerError(
pywikibot.exceptions.ServerError: 500 Server Error: Internal Server Error
CRITICAL: Exiting due to uncaught exception ServerError: 500 Server Error: Internal Server Error

What should have happened instead?:

We should probably handle the 500 error as a recoverable error (just like the socket timeout) and retry.
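
For illustration, here is a minimal sketch of that suggested behavior; it is not Pywikibot's actual implementation, and select_with_retries, QUERY, and the retry parameters (max_retries=3, wait=30) are invented for this example:

  # Retry a SPARQL query when the endpoint answers with a server error,
  # mirroring how a socket timeout is already treated as recoverable.
  import time

  import pywikibot
  from pywikibot.data import sparql
  from pywikibot.exceptions import ServerError

  QUERY = 'SELECT ?item WHERE { ?item wdt:P31 wd:Q3305213 } LIMIT 10'

  def select_with_retries(query, max_retries=3, wait=30):
      """Run a SPARQL query, retrying on server-side errors."""
      endpoint = sparql.SparqlQuery()  # endpoint of the configured data repository
      for attempt in range(1, max_retries + 1):
          try:
              return endpoint.select(query)
          except ServerError as error:
              if attempt == max_retries:
                  raise
              pywikibot.warning('SPARQL attempt {} failed ({}); retrying '
                                'in {}s'.format(attempt, error, wait))
              time.sleep(wait)

  rows = select_with_retries(QUERY)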

Software version (skip for WMF-hosted wikis like Wikipedia): Recent (but not latest) git version

Other information (browser name/version, screenshots, etc.):

Event Timeline

Xqt renamed this task from "Handle timing out SPARQL endpint in Pywikibot" to "Handle timing out SPARQL endpoint in Pywikibot". Aug 10 2023, 7:11 AM

> We should probably handle the 500 error as a recoverable error (just like the socket timeout) and retry.

Maybe the WDQS maintainers will disagree, but to me, automatically retrying a query that timed out and already consumed a significant amount of resources in doing so doesn’t sound like a good idea.

>> We should probably handle the 500 error as a recoverable error (just like the socket timeout) and retry.

> Maybe the WDQS maintainers will disagree, but to me, automatically retrying a query that timed out and already consumed a significant amount of resources in doing so doesn’t sound like a good idea.

That's the current behavior. Queries often fail, and on a second try you do get valid data. Don't forget that the query service maintainers themselves implemented it like this in Pywikibot; see https://github.com/wikimedia/pywikibot/blame/master/pywikibot/data/sparql.py