
Explore ingress filtering for Lift Wing
Closed, Resolved (Public)

Description

For Lift Wing, even if we get filtering-by-UA and similar functionality from the API Gateway, we should examine the same functionality in the context of our own Istio ingress.

Related Objects

Status     Subtype   Assigned   Task
Resolved             None
Resolved             klausman
Resolved             elukey

Event Timeline

Removing assignee so we can prioritize/schedule the task during the next grooming.

Very interesting explanation: https://learncloudnative.com/blog/2022-09-08-ratelimit-istio

I briefly tested the local rate limit in staging, and it seems to work nicely. The main drawback is that it is a coarse filter (for example, it is difficult to tune the limit only for connections with UA XYZ and header A matching "batman"), but it gives us basic high-level protection (for example, rate limiting when the number of requests exceeds X/s or X/min, etc.).

The following has been tested in staging with various thresholds (100 rps, 50 rps, etc.) and it worked fine:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: filter-local-ratelimit-svc
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      app-wmf: kserve-inference
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: "envoy.filters.network.http_connection_manager"
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
              stat_prefix: http_local_rate_limiter
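              # Per-pod token bucket: 100 tokens, refilled with 100 tokens every second (~100 rps per pod)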
              token_bucket:
                max_tokens: 100
                tokens_per_fill: 100
                fill_interval: 1s
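              # Enable and enforce rate limiting for 100% of requests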
              filter_enabled:
                runtime_key: local_rate_limit_enabled
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              filter_enforced:
                runtime_key: local_rate_limit_enforced
                default_value:
                  numerator: 100
                  denominator: HUNDRED
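              # Header added to responses that get rate limited (HTTP 429)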
              response_headers_to_add:
                - append: false
                  header:
                    key: x-local-rate-limit
                    value: 'true'
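
For manual testing the manifest above can be applied directly with kubectl (the file name below is just a placeholder for wherever it is saved):

kubectl apply -f local-ratelimit-envoyfilter.yaml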

The other main caveat is that this is a per-pod rate limit, not a global one, so the rps threshold applies to what every single pod can sustain (even better for us, in my opinion). For example, with four replicas at 100 rps each, the aggregate ceiling is roughly 400 rps.

I'd avoid the global rate limit for the moment: it needs extra Redis support and it doesn't really give us much more than the local rate limit.

@klausman let me know if the above makes sense; I already applied it manually in staging. If so, I'll test the metrics (they need to be whitelisted) and add a simple template for it in deployment-charts.

Change 860925 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] Add basic rate-limit capabilities to ML clusters

https://gerrit.wikimedia.org/r/860925

Also tested the metrics:

elukey@ml-staging2002:~$ sudo nsenter -t 1653602 -n curl -s localhost:15000/stats/prometheus | grep rate_limit
# TYPE envoy_http_local_rate_limiter_http_local_rate_limit_enabled counter
envoy_http_local_rate_limiter_http_local_rate_limit_enabled{} 292
# TYPE envoy_http_local_rate_limiter_http_local_rate_limit_enforced counter
envoy_http_local_rate_limiter_http_local_rate_limit_enforced{} 0
# TYPE envoy_http_local_rate_limiter_http_local_rate_limit_ok counter
envoy_http_local_rate_limiter_http_local_rate_limit_ok{} 292
# TYPE envoy_http_local_rate_limiter_http_local_rate_limit_rate_limited counter
envoy_http_local_rate_limiter_http_local_rate_limit_rate_limited{} 0
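
For completeness, a quick way to double-check that the limiter actually trips is to send a burst larger than the token bucket and count the status codes: Envoy answers rate-limited requests with HTTP 429 plus the x-local-rate-limit header configured above. A rough sketch (the endpoint and payload are placeholders, not the real staging URL):

# 150 requests, up to 50 in parallel; anything over the 100-token bucket should come back as 429
seq 1 150 | xargs -P 50 -I{} \
  curl -s -o /dev/null -w '%{http_code}\n' \
    'https://inference-staging.example.org/v1/models/some-model:predict' \
    -d '{"some": "input"}' \
  | sort | uniq -c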

The only sad thing is that we need to kill/respawn pods to get the Envoy config updated :(

Change 860925 merged by Elukey:

[operations/deployment-charts@master] Add basic rate-limit capabilities to ML clusters

https://gerrit.wikimedia.org/r/860925

Deployed the rate limits and the new Istio configs to expose their metrics. The only downside is that the new metrics require a restart of all the pods, so for the moment let's wait for another broad deployment, since this is not really urgent.

Next steps:

  • wait for a complete roll restart of pods (a broad deployment of isvcs is fine)
  • check the new metrics in Thanos and add them to the Istio dashboard (a sample query is sketched below).
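
For the dashboard panel, assuming the counters keep the names shown in the output above once whitelisted, something along these lines should be a reasonable starting point (exact label names depend on how the metrics get relabeled on the way to Thanos):

sum(rate(envoy_http_local_rate_limiter_http_local_rate_limit_rate_limited[5m]))

plus the matching _ok and _enforced counters for context.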

Added the rate limits panel to Grafana; the task should now be done!