For LW, even if we get filtering by UA and similar functionality from the API GW, we should examine the same functionality in the context of our own Istio ingress.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Add basic rate-limit capabilities to ML clusters | operations/deployment-charts | master | +65 -1
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | None | | T272917 Lift Wing proof of concept
Resolved | | klausman | T288789 API Gateway Integration
Resolved | | elukey | T300259 Explore ingress filtering for Lift Wing
Event Timeline
While checking how to limit egress connections, I found https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/local_rate_limit_filter#config-http-filters-local-rate-limit that looks nice.
Istio supports the definition of custom Envoy filters via https://istio.io/latest/docs/reference/config/networking/envoy-filter/
Very interesting explanation: https://learncloudnative.com/blog/2022-09-08-ratelimit-istio
I tested the local rate limit in staging briefly, and it seems to work nicely. The main drawback is that it is a coarse filter (for example, it is difficult to tune the limit only for connections with UA XYZ and header A matching "batman"), but it gives us basic high-level protection (for example, rate limiting if the number of requests is higher than X/s or X/min, etc.).
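That said, the Envoy local rate limit filter does support finer-grained buckets via descriptors, so per-UA limits are possible in principle. A rough, untested sketch of the two Envoy-level pieces involved (the UA value "batman-bot" and the numbers are made up, and wiring this through an Istio EnvoyFilter would need an additional HTTP_ROUTE patch):

```yaml
# Sketch only: route-level action that turns the User-Agent header into a
# rate-limit descriptor ("ua" is an arbitrary descriptor key).
rate_limits:
  - actions:
      - request_headers:
          header_name: user-agent
          descriptor_key: ua
---
# Sketch only: matching descriptor in the LocalRateLimit filter config,
# giving the hypothetical UA "batman-bot" its own 10 rps bucket.
descriptors:
  - entries:
      - key: ua
        value: "batman-bot"
    token_bucket:
      max_tokens: 10
      tokens_per_fill: 10
      fill_interval: 1s
```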
The following has been tested in staging with various thresholds (100 rps, 50 rps, etc.) and it worked fine:
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: filter-local-ratelimit-svc
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      app-wmf: kserve-inference
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: "envoy.filters.network.http_connection_manager"
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
              stat_prefix: http_local_rate_limiter
              token_bucket:
                max_tokens: 100
                tokens_per_fill: 100
                fill_interval: 1s
              filter_enabled:
                runtime_key: local_rate_limit_enabled
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              filter_enforced:
                runtime_key: local_rate_limit_enforced
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              response_headers_to_add:
                - append: false
                  header:
                    key: x-local-rate-limit
                    value: 'true'
```
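As a quick sanity check (hypothetical commands; HOST and MODEL are placeholders for a real Lift Wing endpoint), one can push the service past the threshold and count the HTTP 429s, which should also carry the x-local-rate-limit header set above:

```bash
# Hypothetical check: 200 requests, up to 20 in flight, then tally the
# status codes; with the bucket temporarily lowered (say to 10 rps) a
# good chunk of them should come back as 429.
seq 1 200 | xargs -P 20 -I{} curl -s -o /dev/null -w '%{http_code}\n' \
  "https://HOST/v1/models/MODEL:predict" -d '{}' | sort | uniq -c
```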
The other main caveat is that this is a per-pod rate limit, not a global one, so the rps threshold reflects what every single pod can sustain (even better for us, in my opinion). For instance, with the 100 rps bucket above and, say, four replicas of an isvc, the aggregate ceiling is roughly 400 rps, and it grows as replicas are added.
I'd avoid the global rate limit for the moment: it needs extra Redis support, and it doesn't really give us a lot more than the local rate limit.
@klausman lemme know if the above makes sense; it is already applied manually in staging. If so, I'll test metrics (they need to be whitelisted) and add a simple template for it in deployment-charts.
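For reference, the whitelisting mentioned above could look something like the following pod annotation (a sketch based on Istio's statsInclusionPrefixes mechanism; the actual deployment-charts wiring may differ):

```yaml
# Sketch: ask the Istio sidecar to also export Envoy stats whose names
# start with the stat_prefix configured in the filter above.
metadata:
  annotations:
    sidecar.istio.io/statsInclusionPrefixes: "http_local_rate_limiter"
```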
Change 860925 had a related patch set uploaded (by Elukey; author: Elukey):
[operations/deployment-charts@master] Add basic rate-limit capabilities to ML clusters
Also tested the metrics:
```
elukey@ml-staging2002:~$ sudo nsenter -t 1653602 -n curl -s localhost:15000/stats/prometheus | grep rate_limit
# TYPE envoy_http_local_rate_limiter_http_local_rate_limit_enabled counter
envoy_http_local_rate_limiter_http_local_rate_limit_enabled{} 292
# TYPE envoy_http_local_rate_limiter_http_local_rate_limit_enforced counter
envoy_http_local_rate_limiter_http_local_rate_limit_enforced{} 0
# TYPE envoy_http_local_rate_limiter_http_local_rate_limit_ok counter
envoy_http_local_rate_limiter_http_local_rate_limit_ok{} 292
# TYPE envoy_http_local_rate_limiter_http_local_rate_limit_rate_limited counter
envoy_http_local_rate_limiter_http_local_rate_limit_rate_limited{} 0
```
The only sad thing is that we need to kill/respawn pods to get the Envoy config updated :(
Change 860925 merged by Elukey:
[operations/deployment-charts@master] Add basic rate-limit capabilities to ML clusters
Deployed the rate limits and the new Istio configs that expose their metrics. The only downside is that the new metrics require a restart of all the pods, so for the moment let's wait for another broad deployment, since this is not really urgent.
Next steps:
- wait for a complete rolling restart of the pods (a broad deployment of isvcs is fine)
- check new metrics in Thanos and add them to the istio dashboard (a starting-point query is sketched below).
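For the Thanos check, something along these lines could be a starting point (hypothetical commands; THANOS_HOST is a placeholder, and the metric labels depend on how the sidecars are scraped):

```bash
# Sketch: query the Prometheus-compatible API for the rate of requests
# actually rate-limited over the last 5 minutes.
curl -s "https://THANOS_HOST/api/v1/query" \
  --data-urlencode 'query=rate(envoy_http_local_rate_limiter_http_local_rate_limit_rate_limited[5m])'
```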