Investigate ad-hoc traffic class for API GW rate limits applied to Inference services as used by WME
Closed, Resolved (Public)

Description

The way API GW rate limits currently work is somewhat ill-suited to how Wikimedia Enterprise (WME) intends to use the formerly-ORES services that would now be served via Lift Wing.

In summary, there are currently two ways rate limits are enforced on Lift Wing:

  • API GW users are rate limited there (more detail below)
  • There is a per-pod per-IP rate limit enforced on Lift Wing. When this limit is triggered, the body of the response contains local_rate_limited. Currently the limit is 100qps (so 360k qph).
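To make the numbers above concrete, here is a minimal Python sketch of the qps-to-qph conversion and of how a client might recognize the Lift Wing per-pod limit. The helper names are hypothetical; only the `local_rate_limited` marker and the 100 qps figure come from this task. The HTTP status code that accompanies the marker is not stated here, so the check looks at the body only.

```python
# Marker string that Lift Wing puts in the response body (from the task description).
LOCAL_LIMIT_MARKER = "local_rate_limited"


def qps_to_qph(qps: int) -> int:
    """Convert queries per second to queries per hour."""
    return qps * 3600


def hit_lift_wing_local_limit(response_body: str) -> bool:
    """Hypothetical helper: True if the body carries the per-pod per-IP limit marker."""
    return LOCAL_LIMIT_MARKER in response_body


if __name__ == "__main__":
    print(qps_to_qph(100))
    print(hit_lift_wing_local_limit("local_rate_limited"))
```

A client could use a check like this to tell the Lift Wing limit apart from an API GW limit before deciding how long to back off.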

API GW access tokens (i.e. when _not_ using anonymous access) are _by default_ limited to 5000 qph (queries per hour; the bucket refills on the hour). A token's rate limit can be increased from the default 5k to 25k ("Preferred" class) or 100k ("Internal" class)[0]. Note that bumping the limit with this method also requires the token to be reset, but whoever generated the token can do this themselves in the web frontend.
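As an illustration of the "refills on the hour" behaviour, the following Python sketch models a per-token quota that resets at each hour boundary. It is a simplified model under stated assumptions, not the API GW implementation: the `HourlyQuota` class and the lowercase dictionary keys are hypothetical, and only the three limits (5k, 25k, 100k) come from the description above.

```python
from dataclasses import dataclass

# Limits in queries per hour, as listed in the description.
# The key names are illustrative, not the real config identifiers.
TIER_LIMITS_QPH = {"default": 5_000, "preferred": 25_000, "internal": 100_000}


@dataclass
class HourlyQuota:
    """Fixed-window counter that resets whenever a new clock hour starts."""

    limit: int
    window: int = -1  # index of the hour the counter currently belongs to
    used: int = 0

    def allow(self, now_s: float) -> bool:
        hour = int(now_s // 3600)
        if hour != self.window:
            # A new hour began: the whole quota becomes available again.
            self.window = hour
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


if __name__ == "__main__":
    quota = HourlyQuota(limit=TIER_LIMITS_QPH["default"])
    print(quota.allow(0.0))
```

One consequence of a fixed window is that a client can spend its full quota just before the hour and again just after it, so short bursts of up to twice the hourly limit are possible across the boundary.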

For WME use, these classes may not be sufficient, so an ad-hoc class might be a solution (this task is about that).

[0] https://wikitech.wikimedia.org/wiki/API_Gateway#How_to_assign_a_client_to_rate_limit_tier

Event Timeline

I think that we should try to figure out, from the API Platform perspective, what the best road to follow is, since so far I had a completely different understanding of how rate limits work on the API Gateway.

As far as we understand, the API Gateway's rate limits for authenticated users are enforced only after the MediaWiki OAuth ones, namely the tiers that Tobias mentioned in the task's description. Our expectation in using OAuth tokens was that they would become the preferred way to filter abusive/spam traffic (maybe with more tools in the future), but given the actual state of things, maybe we could just set high limits for anon/unauth traffic and use Varnish / Envoy (on Lift Wing) to rate limit traffic?

We'd need to get the long-term vision of the API Platform team on this subject; it doesn't seem very clear at the moment.

I see that the rate limit tiers were defined in T246271, and they live in wmf-config/CommonSettings.php. Should we add another class like internal-liftwing in there? Or maybe internal-high, set to 200k reqs/hour?

Change 927218 had a related patch set uploaded (by Klausman; author: Tobias Klausmann):

[operations/mediawiki-config@master] Add rate limiting class for high-traffic internal services

https://gerrit.wikimedia.org/r/927218

Change 927218 merged by jenkins-bot:

[operations/mediawiki-config@master] OAuthRateLimiter: Add rate limiting class for WME using LiftWing

https://gerrit.wikimedia.org/r/927218

Mentioned in SAL (#wikimedia-operations) [2023-06-06T11:51:48Z] <kamila@deploy1002> Started scap: Backport for [[gerrit:927218|OAuthRateLimiter: Add rate limiting class for WME using LiftWing (T338121)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-06T11:53:24Z] <kamila@deploy1002> kamila and klausman: Backport for [[gerrit:927218|OAuthRateLimiter: Add rate limiting class for WME using LiftWing (T338121)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-06T12:00:43Z] <kamila@deploy1002> Finished scap: Backport for [[gerrit:927218|OAuthRateLimiter: Add rate limiting class for WME using LiftWing (T338121)]] (duration: 08m 54s)

Mentioned in SAL (#wikimedia-operations) [2023-06-06T12:19:06Z] <claime> redeploying 927218 to mw-on-k8s - T338121

After some experimenting, this seems to be how rate limits for API tokens, the API gateway and Lift Wing are currently applied:

  1. JWTs have rate limits encoded in them, by default 5k qph, but there are more classes. The CR last week added a new one called wme with a limit of 250k qph.
  2. The API GW config for LiftWing has limits for anonymous traffic, as well as non-anon traffic. The latter has a limit of 200k qph.
  3. (as a side note and not really relevant for my questions below: LiftWing has an internal limit of 100qps per pod per (remote) IP).
  4. The limits from 1) take precedence over those from 2). Legacy tokens without an encoded limit may exist, but this is not quite clear.
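The precedence described in points 1, 2 and 4 can be sketched in Python as below. This is only a model of my current understanding, not the gateway's code: the function name is hypothetical, and the fallback for tokens without an encoded limit is an assumption, since it is not clear that such tokens exist.

```python
from typing import Optional

# Non-anon limit in the API GW config for LiftWing, in queries per hour (point 2).
GATEWAY_AUTH_LIMIT_QPH = 200_000


def effective_limit_qph(
    jwt_limit_qph: Optional[int],
    gateway_limit_qph: int = GATEWAY_AUTH_LIMIT_QPH,
) -> int:
    """Return the hourly limit that applies to an authenticated request.

    The limit encoded in the JWT wins whenever it is present (point 4).
    Assumption: a token without an encoded limit falls back to the
    gateway's non-anon limit.
    """
    if jwt_limit_qph is not None:
        return jwt_limit_qph
    return gateway_limit_qph


if __name__ == "__main__":
    print(effective_limit_qph(250_000))  # token in the new wme class
    print(effective_limit_qph(None))     # hypothetical legacy token
```

Under this model a wme token gets 250k qph even though the gateway's non-anon limit is 200k qph, which is why the gateway setting only matters for tokens that carry no limit at all.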

Questions:

a) It seems to me that configuring non-anon rate limits in the API GW is moot and at best would "catch" legacy tokens that were minted without rate limits. Is this correct?

b) From a usability perspective, the ML team can manage the rate limits of LW customers' JWTs only indirectly, through the client/token classes configured in the MW config. This would be true for all services using the API GW for traffic control. If this was the intent, what was the rationale for configuring the traffic classes there?

I discussed the above questions with Luca today, and I think for now we can proceed with telling WME to start exploring the documentation we have (and tell us where there are gaps), and to start testing against LiftWing/APIGW. This should surface any remaining issues, even if the actual implementation of access to LW and of rate limiting changes in the future.