
Allow more connections to the SPARQL query service for specific tools/users
Open, Low, Public, Feature

Description

My Listeria tool continuously operates on ~225K wiki pages with SPARQL queries. Updating each page once a week (it used to be daily, with fewer pages) requires a rate of about 22 SPARQL queries/min. Since each query can run for minutes at a time, the query server often kills the tool's connection and blocks it (104 error).
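For reference, the rate figure above can be reproduced with a short calculation. This is only a sketch of the arithmetic, assuming one SPARQL query per page per update cycle; the function name `required_queries_per_minute` is made up for illustration and is not part of the tool.

```python
def required_queries_per_minute(pages: int, period_days: float) -> float:
    """Average query rate needed to touch every page once per period.

    Assumes exactly one query per page per cycle (an assumption).
    """
    minutes_in_period = period_days * 24 * 60
    return pages / minutes_in_period


weekly_rate = required_queries_per_minute(225_000, 7)
# 225,000 pages / 10,080 minutes per week
print(f"{weekly_rate:.1f} queries/min")  # prints "22.3 queries/min"
```

Under the same assumption, that works out to one new query starting roughly every 2.7 seconds, so queries that run for minutes will overlap heavily.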

The tool logs in via OAuth as ListeriaBot. Is there a way to tolerate a higher rate of queries from specific users? Or is there a Toolforge-only Blazegraph instance I can hit instead?

Event Timeline

Gehel triaged this task as Low priority. Apr 15 2024, 1:45 PM
Gehel moved this task from Incoming to Feature Requests on the Wikidata-Query-Service board.
Gehel subscribed.

Sorry for the delayed response. We don't have a mechanism in place to increase limits for specific use cases. We also don't have enough capacity to support the kind of load that Listeria would generate if it weren't throttled, which is why we have throttling in the first place. Ideally, your bot should check the HTTP status code of each response and slow down when it receives an HTTP 429. Sorry that we can't be more helpful at this time.
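One way a bot could follow that advice is sketched below. It is a minimal illustration, not Listeria's actual code: the names `query_with_backoff`, `parse_retry_after`, and the `send` callable are hypothetical, and it assumes the server may include a `Retry-After` header (in seconds) with a 429 response, falling back to exponential backoff when it does not.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body); the shape is an assumption for this sketch.
Response = Tuple[int, Mapping[str, str], bytes]


def parse_retry_after(headers: Mapping[str, str], default: float) -> float:
    """Return the Retry-After value in seconds, or `default` if it is
    missing or not a plain number (e.g. an HTTP date)."""
    value = headers.get("Retry-After")  # assumes this exact key casing
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default


def query_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` until it returns something other than HTTP 429.

    On a 429, wait for the server-suggested time if one is given,
    otherwise for an exponentially growing delay, then retry.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts:
            break
        sleep(min(parse_retry_after(headers, delay), max_delay))
        delay = min(delay * 2, max_delay)
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

Here `send` would wrap whatever HTTP client the bot already uses. Keeping the wait in one place also makes it straightforward to pause all workers, rather than just the one that received the 429.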

On a side note, there have been ideas around having a separate SPARQL endpoint that would serve long-running requests in an async mode, with a queue, which could be helpful in this case (T104762). This isn't something we're actively working on at the moment.