WE5.1.3 API Rate limiting experiment (local gateway environment)
Closed, Resolved (Public)

Description

Hypothesis: If we create a rate limiting solution for the API gateway using a local development environment based on Kubernetes, we will be able to determine the best option to test with production traffic, by comparing the performance and functionality of at least three different rate limiting services.

Target outcomes

  • A set of configuration files for the gateway helm chart to allow testing different rate limiting solutions in a local development environment
  • A clear path for deploying a rate limiting solution for testing with production traffic.
  • A custom rate limit implementation MVP (“tally”) to evaluate against off-the-shelf solutions.
  • An evaluation chart comparing at least three different solutions with respect to latency, robustness, and resource consumption

Success criteria

  • We are able to evaluate rate limiting solutions with some percentage of production traffic
  • We have identified the desired architecture for rate limiting and evaluated possible alternatives
  • We have gained proficiency in working with Envoy, Kubernetes, gRPC and Golang

Epics

  • Implement a custom rate limiting service (T398914)
  • Create a local Minikube environment for setting up a cluster of rate limiters and connecting it to Envoy using an appropriate load balancing strategy. (T398915)
  • Identify and evaluate different technology choices for rate limiting: Envoy ratelimit vs. Limitador vs. custom (“tally”); Redis vs. KeyDB vs. in-memory; stretch: Envoy vs. APISIX. (T398917)
  • Benchmark different solutions using the local Minikube environment (T398918)
  • Investigate how to deploy solutions to production for testing with real traffic (T398919)
  • Investigate routing/partitioning strategies (T399088)
  • Investigate cost-based rate-limiting (T399844)

Details

Other Assignee: hnowlan

Event Timeline

JTweed-WMF renamed this task from Tracking: WE5.1.3 API Gateway local development to WE5.1.3 API Gateway local development. Jul 10 2025, 9:40 AM
JTweed-WMF edited projects, added Goal; removed Epic.

Thought from a discussion with @hnowlan and @Joe: if in the long run we need to share counters across data centers, we will very likely have to implement our own rate limiter. The reason is that the syncing would have to be batched, and would apply only to long-term counters. So we'd need either custom logic for the sync, or separate storage (Redis) backends for short-term and long-term limits. None of the off-the-shelf solutions support that.
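
One way to picture the split described above is a counter with two tiers: short-term counts that never leave the local data center, and long-term counts whose increments are buffered and shipped to peers in batches. The Go sketch below is only an illustration under that assumption. The `SplitCounter` type, its methods, and the `dcA`/`dcB` names are hypothetical, and window expiry and the network transport are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// SplitCounter is a hypothetical two-tier counter. Short-term counts stay
// local; long-term increments are buffered so they can be sent in batches.
type SplitCounter struct {
	mu      sync.Mutex
	short   map[string]int64 // local only, never synced
	long    map[string]int64 // local view of the cross-data-center total
	pending map[string]int64 // long-term increments not yet sent to peers
}

func NewSplitCounter() *SplitCounter {
	return &SplitCounter{
		short:   make(map[string]int64),
		long:    make(map[string]int64),
		pending: make(map[string]int64),
	}
}

// Hit records one local request against both tiers.
func (c *SplitCounter) Hit(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.short[key]++
	c.long[key]++
	c.pending[key]++
}

// Flush hands back the buffered long-term deltas and resets the buffer.
// In a real deployment a timer would call this and send the batch to peers.
func (c *SplitCounter) Flush() map[string]int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	batch := c.pending
	c.pending = make(map[string]int64)
	return batch
}

// Merge applies a batch received from a peer to the long-term tier only.
func (c *SplitCounter) Merge(batch map[string]int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for key, delta := range batch {
		c.long[key] += delta
	}
}

// Short returns the local short-term count for key.
func (c *SplitCounter) Short(key string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.short[key]
}

// Long returns the local view of the long-term count for key.
func (c *SplitCounter) Long(key string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.long[key]
}

func main() {
	dcA, dcB := NewSplitCounter(), NewSplitCounter()
	dcA.Hit("client-1")
	dcA.Hit("client-1")
	dcB.Hit("client-1")

	dcB.Merge(dcA.Flush()) // one batched sync from dcA to dcB

	fmt.Println(dcB.Short("client-1")) // 1: short-term counts are not shared
	fmt.Println(dcB.Long("client-1"))  // 3: long-term counts include dcA's batch
}
```

Because only deltas are exchanged, a batch can be merged without double-counting the receiver's own hits, at the price of long-term totals lagging by up to one flush interval.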

daniel renamed this task from WE5.1.3 API Gateway local development to WE5.1.3 API Rate limiting experiment (local gateway environment). Jul 10 2025, 6:55 PM
daniel closed this task as Resolved. Edited Oct 31 2025, 9:10 AM

All subtasks are closed; follow-up work is being tracked in T398919 and T399291.

The testing environment is available at https://gitlab.wikimedia.org/daniel/rlstools/-/tree/main/environments/