
Set up a rest-gateway deployment for rate limiting testing
Closed, Declined (Public)

Description

We need to figure out how to test rate limiting at the rest-gateway level without disrupting work on T400130: Central REST gateway for APIs.

  • Update api-gateway helm chart to support the rate-limiting configuration on both api and rest gateways (sidecar deployment of the ratelimit service)
  • Add rate-limiting configuration to the rest-gateway staging deployment
  • Stand up a ganeti VM for the redis backend

Once internal testing is done and all production traffic goes through T400130: Central REST gateway for APIs, a possible route for production integration would be to enable this same rate-limiting setup in production, configure it to use shadow mode (no actual rate limiting, just logging what would be done), and see how it reacts.
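To make the shadow-mode idea concrete, here is a minimal sketch in Python. It is an illustration only, not the ratelimit service's implementation: the class name `FixedWindowLimiter`, the fixed-window algorithm, and the in-memory counter (standing in for the Redis backend) are all assumptions made for the example.

```python
import logging
import time
from collections import defaultdict

log = logging.getLogger("ratelimit-sketch")


class FixedWindowLimiter:
    """Hypothetical in-memory limiter; a dict stands in for the Redis backend."""

    def __init__(self, limit, window_seconds, shadow_mode=True, clock=time.monotonic):
        self.limit = limit                    # allowed requests per window
        self.window_seconds = window_seconds
        self.shadow_mode = shadow_mode        # True: log only, never reject
        self.clock = clock                    # injectable so tests can pin time
        # (key, window index) -> hits; old windows are never evicted in this sketch
        self.counts = defaultdict(int)

    def allow(self, key):
        """Return True if the request should be let through."""
        window = int(self.clock() // self.window_seconds)
        self.counts[(key, window)] += 1
        over_limit = self.counts[(key, window)] > self.limit
        if over_limit and self.shadow_mode:
            # Shadow mode: record what would have happened, but let it pass.
            log.warning("shadow mode: would have limited %s", key)
            return True
        return not over_limit
```

With `shadow_mode=True` every request is allowed and over-limit requests only produce a log line, which is the behaviour one would want to observe in production before switching enforcement on.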

Event Timeline

There are different setups I would want to test here, but considering that we will be using an Envoy version older than 1.33 for a while, I propose we test a simple baseline first:

Envoy 1.26 (or whatever) with a central Redis instance, like we have in the Gateway for api.wikimedia.org. We could set that up to handle all API routes, but it doesn't really matter: for preliminary testing, we should use a dummy backend anyway (e.g. an nginx container that serves static content) and map all routes to that.

In the initial round, I'd want to test:

  • Identifying the user based on the CentralAuth cookie (not secure, but good enough for testing)
  • Hitting the endpoint with a lot of synthetic traffic (Locust?) to see how Redis holds up to 100k writes/sec
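
As a rough sketch of the first point, the rate-limit key could be derived from the request's Cookie header along these lines. The cookie name `centralauth_User`, the helper `rate_limit_key`, the hashing step, and the fallback to the client IP are assumptions for illustration, not the gateway's actual configuration. The cookie value is client-controlled, which is why this is not secure.

```python
import hashlib
from http.cookies import SimpleCookie

# Assumed cookie name for the example; check the actual CentralAuth cookie in use.
COOKIE_NAME = "centralauth_User"


def rate_limit_key(cookie_header, client_ip):
    """Build a per-client key: the hashed cookie value if present, else the IP."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(COOKIE_NAME)
    if morsel is not None and morsel.value:
        # Hash so the raw cookie value does not end up in logs or in Redis keys.
        digest = hashlib.sha256(morsel.value.encode("utf-8")).hexdigest()[:16]
        return "user:" + digest
    # No usable cookie: fall back to the client address (an assumption of this sketch).
    return "ip:" + client_ip
```

A load-test client can then send a distinct cookie value per simulated user, so that the synthetic traffic spreads across many keys.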
Clement_Goubert changed the task status from Open to In Progress. Sep 10 2025, 10:17 AM
Clement_Goubert claimed this task.
Clement_Goubert triaged this task as High priority.
Clement_Goubert updated the task description.

Change #1186954 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] api-gateway: Generalize ratelimit configuration

https://gerrit.wikimedia.org/r/1186954

No longer needed: we successfully tested on the staging cluster; see T406490.