Page MenuHomePhabricator

WikibaseQualityConstraints should respect query service 429 header response.
Closed, ResolvedPublic13 Estimated Story Points

Description

When the quality constraints API is hit repeatedly mediawiki will flood the query service with requests, and the result will be mediawiki will get banned as in T204267.
To avoid getting banned our requests should respect the header returned by the query service (suggested in T204267#4584397)

BDD
GIVEN a query service related constraint is run
AND the query service sends a 429 response
AND a "Retry-after" header is present
THEN QualityConstraints should not run fresh constraint reports using the query service for that number of seconds

Links
429 details: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429

Acceptance Criteria

  • QualityConstraints looks at and respects the "Retry-after" header and 429 status
  • The Retry-after back off is used across all servers
  • If a 429 is received, a notice should be logged
    • if no retry-after is present, a warning should be logged

General implementation breakdown:

  • write timestamp to cache when a 429 response is received
  • read from cache to check if we can make SPARQL requests or not

The best cache to use is probably ObjectCache::newAnything(), so that the 429 protection will still work even if no cache is set up (falling back to the database). This should probably be wrapped up in some new class similar to the existing ResultsCache.

Event Timeline

Addshore triaged this task as Medium priority.Sep 17 2018, 7:54 AM

Tricky to do, because we need to persist the information about the rate limit somewhere between requests. Any ideas for that?

@Addshore

should not run fresh constraint reports

What do you mean with this?

Propose

BDD
GIVEN a query service related constraint check is run
AND the query service sends a 429 response
AND a "Retry-after" header is present
THEN QualityConstraints should disable using the query service for that number of seconds

If disabled AFAIK it falls back to php implementation and does not check all format and some of the (value-)type constraints.
Basically like if there was no service URL defined.

@Lucas_Werkmeister_WMDE @Ladsgroup Maybe we could use some existing LockManager functionality that we already have?

Propose

BDD
GIVEN a query service related constraint check is run
AND the query service sends a 429 response
AND a "Retry-after" header is present
THEN QualityConstraints should disable using the query service for that number of seconds

Yup, that BDD looks good.
I tailed off writing my version of the BDD as I found the open question about if the Retry-After is always provided by WDQS, as with a 429 response I don't believe it is actually required.
@Smalyshev could you let us know if a Retry-After will always be provided?

If disabled AFAIK it falls back to php implementation and does not check all format and some of the (value-)type constraints.
Basically like if there was no service URL defined.

Sounds good.
Is there any indication in the resulting constraint report stating that not all constraints have been run or something similar?

@Lucas_Werkmeister_WMDE @Ladsgroup Maybe we could use some existing LockManager functionality that we already have?

Lockmanager probably isn't the right thing to use, I dont think you can set TTLs on locks in lock manager. You have to actually unlock them.

Tricky to do, because we need to persist the information about the rate limit somewhere between requests. Any ideas for that?

Would could store the timestamp of when the next request should occur in redis or memcached.
Then before the php code goes to run constraint reports using sparql it cn check for the key and see if it is allowed to run the query or not.

@Addshore we could just have a default retry time if not present.
If you tell the API to report compliance you will see that some checks are not run, but by default you only get violations.

@Addshore we could just have a default retry time if not present.

@Smalyshev said in some other ticket that it should always be present :)

To the user, this should not result in a violation report. STATUS_TODO is probably a good status to return in that case.

Change 464812 had a related patch set uploaded (by Matthias Geisler; owner: Matthias Geisler):
[mediawiki/extensions/WikibaseQualityConstraints@master] Respect 429 Http-Responses

https://gerrit.wikimedia.org/r/464812

A general note: Having a Circuit breaker seems good here (there are some libraries already for that)

Change 464812 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Respect 429 Http-Responses

https://gerrit.wikimedia.org/r/464812