API GW rate limits, as they currently work, are a poor fit for how Wikimedia Enterprise (WME) intends to use the formerly-ORES services that would now be served via Lift Wing.
In summary, there are currently two ways rate limits are enforced on Lift Wing:
- API GW users are rate limited there (more detail below)
- Lift Wing itself enforces a per-pod, per-IP rate limit. When this limit is triggered, the body of the response contains local_rate_limited. The limit is currently 100 qps (so 360k qph).
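To tell the two layers apart on the client side, something like the following could work. This is only a sketch: the function name and return labels are made up for illustration, and it assumes both layers answer with HTTP 429, which is not stated above. The only thing taken from this task is that the per-pod limit puts local_rate_limited in the response body.

```python
def classify_rate_limit(status_code, body):
    """Guess which layer rejected a request.

    Assumption (not confirmed in this task): both layers reply with HTTP 429.
    The marker string is the one Lift Wing puts in the response body.
    """
    if status_code != 429:
        return "not_rate_limited"
    if "local_rate_limited" in body:
        # Per-pod, per-IP limit enforced on Lift Wing itself.
        return "lift_wing_per_pod"
    # Otherwise assume the API GW token or anonymous limit was hit.
    return "api_gateway"
```

A client could log the returned label to see which of the two limits it is actually running into.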
API GW access tokens (i.e. when _not_ using anonymous access) are _by default_ limited to 5000 qph (queries per hour; the bucket refills on the hour). A token's rate limit can be raised from the default 5k to 25k ("Preferred" class) or 100k ("Internal" class)[0]. Note that raising the limit this way also requires the token to be reset, but whoever generated the token can do that themselves in the web frontend.
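For a sense of scale, here is a rough sketch of an hourly quota that refills on the hour, plus the average rate each class works out to. The class limits and the 100 qps per-pod figure are the ones quoted in this task; the names (TIER_LIMITS_QPH, HourlyQuota, average_qps) are hypothetical, and this is not how the API GW actually implements its limiter.

```python
# Limits quoted in this task, in queries per hour (qph).
TIER_LIMITS_QPH = {"default": 5_000, "preferred": 25_000, "internal": 100_000}
PER_POD_LIMIT_QPS = 100  # Lift Wing per-pod, per-IP limit
SECONDS_PER_HOUR = 3600


def average_qps(limit_qph):
    """Average sustained rate that an hourly quota allows."""
    return limit_qph / SECONDS_PER_HOUR


class HourlyQuota:
    """Illustrative fixed-window counter that resets at the top of each hour."""

    def __init__(self, limit_qph):
        self.limit_qph = limit_qph
        self.window = None  # index of the hour currently being counted
        self.used = 0

    def allow(self, now_s):
        """Return True if a request at time now_s (seconds) fits in the quota."""
        window = int(now_s // SECONDS_PER_HOUR)
        if window != self.window:
            # A new hour has started: the bucket refills completely.
            self.window = window
            self.used = 0
        if self.used >= self.limit_qph:
            return False
        self.used += 1
        return True
```

Under these numbers, average_qps(TIER_LIMITS_QPH["internal"]) is about 27.8 qps, while the per-pod limit allows PER_POD_LIMIT_QPS * SECONDS_PER_HOUR = 360000 qph, so even the "Internal" class sits well below what a single pod would accept from one IP.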
These classes may not be sufficient for WME's use, so a dedicated ad-hoc class might be a solution (this task is about that).
[0] https://wikitech.wikimedia.org/wiki/API_Gateway#How_to_assign_a_client_to_rate_limit_tier