
api-gateway chart: metrics mapping for rest-gateway
Closed, Resolved, Public

Description

The current metrics mapping is not suitable for the structure of the descriptors used by the rest-gateway limits. Find a way to expose metrics with the following labels:

  • group (or whatever we end up calling the top level selector... regime?... policy?...)
  • class
  • time unit (if we do T408132)

It would be nice to also use a label for the outcome (within, over, near, shadow), but at a glance there doesn't seem to be a way to make them add up to total_hits. Separate metrics per outcome would be ok.

NOTE: For improving mappings for the api-gateway, see T409173: api-gateway: improve ratelimit metrics mappings

Event Timeline

daniel renamed this task from api-gateway chart: use labels in ratelimiter metrics to api-gateway chart: improve metrics mapping. Oct 24 2025, 9:48 AM
daniel updated the task description.

Currently the mapping is done in charts/api-gateway/config/ratelimiter_metrics.yaml.
We may want to spin off a copy of that config that applies only to the rest-gateway deployments, then tweak the label matches to get the cardinality we want.

Looking at the mappings in the existing ratelimiter_metrics.yaml file in the api-gateway chart, it looks like the rules it contains will not match the actual metrics we have. The mapping is based on matching keys that use "." as a separator, so "ratelimit.service.rate_limit.*.*.near_limit" would give the number of requests near the limit of a rate limit with a single descriptor key (the first * matches the limit domain). In the existing file, we have mappings with two and three stars (matching descriptors with one or two keys). But the descriptors we actually use all have more keys, so they match no mapping rules.
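
To illustrate why the existing rules match nothing, here is a minimal Python sketch of the matching behaviour described above, assuming each * matches exactly one "."-separated component. It is a simplified emulation, not the exporter's actual implementation, and the example keys (the domain name and the descriptor components) are hypothetical.

```python
from typing import List, Optional

# Pattern quoted in the comment above: two stars, i.e. domain + one descriptor key.
PATTERN = "ratelimit.service.rate_limit.*.*.near_limit"


def glob_match(pattern: str, key: str) -> Optional[List[str]]:
    """Return the components captured by each '*', or None if the key does not match.

    Simplified emulation: '*' stands for exactly one dot-separated component,
    so the pattern and the key must have the same number of components.
    """
    pattern_parts = pattern.split(".")
    key_parts = key.split(".")
    if len(pattern_parts) != len(key_parts):
        return None
    captures = []
    for p, k in zip(pattern_parts, key_parts):
        if p == "*":
            captures.append(k)
        elif p != k:
            return None
    return captures


# Hypothetical key with a single descriptor key: matches, captures domain and key.
short_key = "ratelimit.service.rate_limit.some_domain.some_key.near_limit"
# Hypothetical key with three descriptor keys: too many components, no match.
long_key = "ratelimit.service.rate_limit.some_domain.key_a.key_b.key_c.near_limit"

print(glob_match(PATTERN, short_key))
print(glob_match(PATTERN, long_key))
```

Under this model a rule only fires when the number of stars equals the number of variable components in the key, which is why descriptors with more keys need their own, longer rules (or a regex-based rule).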

Change #1199008 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/deployment-charts@master] rest-gateway: Create metrics mapping for ratelimit service

https://gerrit.wikimedia.org/r/1199008

daniel renamed this task from api-gateway chart: improve metrics mapping to api-gateway chart: metrics mapping for rest-gateway. Nov 4 2025, 1:05 PM
daniel updated the task description.

Change #1199008 merged by jenkins-bot:

[operations/deployment-charts@master] rest-gateway: Create metrics mapping for ratelimit service

https://gerrit.wikimedia.org/r/1199008

I merged the mapping for the rest-gateway, and the exported metrics look pretty good:

cgoubert@deploy2002:/srv/deployment-charts/helmfile.d/services/rest-gateway$ curl http://localhost:9090/metrics | grep service_ | grep -v '#'
[...]
ratelimit_service_rest_gateway_near_limit{policy="experiment-2025-shadow",unit="HOUR",user_class="anon"} 1
ratelimit_service_rest_gateway_over_limit{policy="experiment-2025-shadow",unit="HOUR",user_class="anon"} 10
ratelimit_service_rest_gateway_shadow_mode{policy="experiment-2025-shadow",unit="HOUR",user_class="anon"} 10
ratelimit_service_rest_gateway_total_hits{policy="experiment-2025-shadow",unit="HOUR",user_class="anon"} 13
ratelimit_service_rest_gateway_within_limit{policy="experiment-2025-shadow",unit="HOUR",user_class="anon"} 3
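
As a quick sanity check on the question from the task description (whether the outcomes add up to total_hits), the following Python sketch parses the sample lines above and compares the counters. It only covers this one sample; the interpretation that near_limit and shadow_mode overlap with within_limit and over_limit rather than being separate buckets is an assumption, not something confirmed in this task.

```python
import re

# Sample lines copied from the curl output above.
SAMPLE = """\
ratelimit_service_rest_gateway_near_limit{policy="experiment-2025-shadow",unit="HOUR",user_class="anon"} 1
ratelimit_service_rest_gateway_over_limit{policy="experiment-2025-shadow",unit="HOUR",user_class="anon"} 10
ratelimit_service_rest_gateway_shadow_mode{policy="experiment-2025-shadow",unit="HOUR",user_class="anon"} 10
ratelimit_service_rest_gateway_total_hits{policy="experiment-2025-shadow",unit="HOUR",user_class="anon"} 13
ratelimit_service_rest_gateway_within_limit{policy="experiment-2025-shadow",unit="HOUR",user_class="anon"} 3
"""

PREFIX = "ratelimit_service_rest_gateway_"
LINE_RE = re.compile(r"^(\w+)\{([^}]*)\}\s+(\d+)$")


def parse_counts(text: str) -> dict:
    """Group integer counter values by label set, then by outcome name."""
    counts: dict = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or not match.group(1).startswith(PREFIX):
            continue
        outcome = match.group(1)[len(PREFIX):]
        labels = match.group(2)
        counts.setdefault(labels, {})[outcome] = int(match.group(3))
    return counts


for labels, c in parse_counts(SAMPLE).items():
    # In this sample: 3 within + 10 over = 13 total.
    adds_up = c["within_limit"] + c["over_limit"] == c["total_hits"]
    print(labels, "within + over == total:", adds_up)
```

For this sample, within_limit plus over_limit equals total_hits, while adding near_limit or shadow_mode on top would overshoot, which is consistent with keeping a separate metric per outcome instead of a single outcome label.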

With a little bit of tweaking to the regex for api-gateway, we now have correctly labeled metrics and a (somewhat) useful rate limit graph.

[Image: rate limit graph (image.png)]

Clement_Goubert changed the task status from Open to In Progress. Nov 5 2025, 4:57 PM
Clement_Goubert triaged this task as Medium priority.