To curb the load on mw-api-int caused by the search update pipeline's fetch operator, search would like a global rate limit. A per-worker limit is not suitable, as it leads to long tails and unused "quota".
Rate limiting has long been wanted for the internal MW API anyway, see T248543. Service/Ops is willing to discuss implementing it the envoy way: Envoy supports local and remote/distributed rate limits, as described here. The least invasive approach to test this would be the following:
- set up an envoy rate-limit service (backed by redis)
- configure the client-side sidecar envoys to use that rate-limit service
Enforcing the limit in the client-side sidecar avoids unnecessary network traffic leaving the pod.
With that configuration in place, the fetch operator must handle HTTP 429 responses gracefully: it should retry, but with a shorter, constant delay rather than the growing delay used for regular retries.
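A minimal sketch of that retry behaviour, to make the distinction concrete. It is not the fetch operator's actual code: the function name, the `do_request` callable, and all delay values are hypothetical, and treating 5xx responses as the "regular" retry case with exponential backoff is an assumption.

```python
import time
from typing import Callable, Tuple


def fetch_with_retries(
    do_request: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    rate_limit_delay: float = 0.1,  # hypothetical: short, constant delay for 429
    base_backoff: float = 0.5,      # hypothetical: first delay for regular retries
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call do_request until it succeeds or max_attempts is reached.

    do_request returns (status_code, body). A 429 is retried after a fixed
    delay; a 5xx is retried with exponentially growing delays.
    """
    failures = 0  # counts only non-429 failures, so 429s never grow the backoff
    for _ in range(max_attempts):
        status, body = do_request()
        if status == 429:
            # Rate limited: the backend is healthy, we are just over quota.
            sleep(rate_limit_delay)
            continue
        if status >= 500:
            # Regular failure: back off exponentially.
            sleep(base_backoff * 2 ** failures)
            failures += 1
            continue
        return status, body
    raise RuntimeError(f"giving up after {max_attempts} attempts")
```

Keeping the 429 delay constant means a rate-limited worker resumes quickly once quota is available again, instead of idling through a long backoff while quota goes unused.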
api-gateway already uses a combination of ratelimit (the standard implementation for global rate limiting from envoy) and redis-misc (via nutcracker). In that setup, ratelimit is running as a sidecar alongside the api-gateway envoy.
For the mesh ratelimit, we decided to provide a central ratelimit service via its own chart and deployment. It can be used by all service mesh envoys and may hold multiple rate limit configurations (domains) for different use cases. The initial rate limit configuration should allow 1k rps (requests per second) per user-agent, as user-agents are easy to distinguish and we encourage mw-api clients to properly identify themselves anyway.
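To illustrate what a limit keyed on user-agent means in practice, here is a small in-memory sketch. It is a stand-in for illustration only, not how the ratelimit service or its redis backend is implemented: the class name is hypothetical and the fixed one-second window is an assumption.

```python
import time
from typing import Callable, Dict, Tuple


class PerKeyFixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed time window."""

    def __init__(
        self,
        limit: int,
        window_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window_seconds = window_seconds
        self._clock = clock
        # key -> (window index, requests counted in that window)
        self._state: Dict[str, Tuple[int, int]] = {}

    def allow(self, key: str) -> bool:
        """Return True if the request may pass, False if it should get a 429."""
        window = int(self._clock() // self._window_seconds)
        seen_window, count = self._state.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window started, so the counter resets
        if count >= self._limit:
            self._state[key] = (window, count)
            return False
        self._state[key] = (window, count + 1)
        return True


# Hypothetical usage: 1000 requests per second for each user-agent.
limiter = PerKeyFixedWindowLimiter(limit=1000)
allowed = limiter.allow("search-update-pipeline/1.0")
```

Because the counters are kept per key, one client exhausting its quota does not affect clients that send a different user-agent.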
For this MVP implementation, the mesh "clients" should be able to opt in to being rate limited via configuration values. The proposed implementation/configuration structure can be found at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1028558
There is also an initial dashboard showing the metrics the ratelimit service exposes (via the statsd exporter, as it does not support native Prometheus metrics): https://grafana.wikimedia.org/d/bf921591-bd2b-4a87-ae20-7cc6f227e58a/jayme-ratelimit
I tried to condense the above into https://wikitech.wikimedia.org/wiki/Ratelimit