Limit the envoy metrics scraped from k8s
Closed, Resolved (Public)

Description

In the ops Prometheus instance we limit the Envoy metrics scraped via a relabel config:

# Envoy produces a ton of metrics, but for now we're just interested in
# upstream and downstream requests latencies and counts, as well as connection
# stats. So just keep those and nothing else.
'metric_relabel_configs' => [
  { 'source_labels' => ['__name__'],
    'regex'         => '^envoy_(http_down|cluster_up)stream_(rq|cx).*$',
    'action'        => 'keep'
  },
]

We should do the same for the k8s prometheus instances by default.
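To see which metrics the filter would keep, here is a minimal Python sketch that applies the same regex to a handful of metric names. The sample names are chosen for illustration only and are not taken from the actual scrape output. Prometheus anchors relabel regexes at both ends, so `re.fullmatch` is used as the closest equivalent.

```python
import re

# Regex copied from the relabel config above. Prometheus anchors relabel
# regexes on both ends, so fullmatch is the closest Python equivalent.
KEEP_REGEX = re.compile(r"^envoy_(http_down|cluster_up)stream_(rq|cx).*$")

# Illustrative metric names (an assumption, not a dump of real scrape data).
SAMPLE_NAMES = [
    "envoy_http_downstream_rq_total",
    "envoy_cluster_upstream_cx_active",
    "envoy_cluster_upstream_rq_time_bucket",
    "envoy_cluster_default_total_match_count",
    "envoy_server_memory_allocated",
]


def is_kept(metric_name: str) -> bool:
    """Return True if a 'keep' rule with KEEP_REGEX would retain the metric."""
    return KEEP_REGEX.fullmatch(metric_name) is not None


def partition(names):
    """Split metric names into (kept, dropped) lists, preserving order."""
    kept = [n for n in names if is_kept(n)]
    dropped = [n for n in names if not is_kept(n)]
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = partition(SAMPLE_NAMES)
    print("kept:   ", kept)
    print("dropped:", dropped)
```

Only names under the `envoy_http_downstream_(rq|cx)` and `envoy_cluster_upstream_(rq|cx)` prefixes pass; every other metric is dropped.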

Event Timeline

JMeybohm triaged this task as Medium priority. Sep 27 2022, 1:17 PM
JMeybohm created this task.
JMeybohm renamed this task from Limit the envoy metrics scraped in k8s to Limit the envoy metrics scraped from k8s. Sep 27 2022, 1:17 PM

Change 835691 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] k8s: Limit envoy metrics scraped from k8s

https://gerrit.wikimedia.org/r/835691

How can we tell if this is working?

Use an envoy metric that doesn't match the regex, such as

rate(envoy_cluster_default_total_match_count{app="api-gateway", envoy_cluster_name=~".*"}[5m])

After the change is merged, this should not return any hits.
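One way to script this check is via the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The sketch below only builds the request URL and interprets a response body; the base URL is a hypothetical placeholder, and the helper names are invented for illustration. Samples scraped before the change can remain inside the 5m window for a short while, so an empty result is only expected some minutes after the new config is live.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL (assumption); substitute the real k8s Prometheus endpoint.
PROMETHEUS_BASE = "http://prometheus.example.org/k8s"

# Query from the comment above: a metric that the keep regex does not match.
QUERY = (
    'rate(envoy_cluster_default_total_match_count'
    '{app="api-gateway", envoy_cluster_name=~".*"}[5m])'
)


def instant_query_url(base: str, query: str) -> str:
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def has_hits(response_body: str) -> bool:
    """Return True if a /api/v1/query JSON response contains any series."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return len(payload["data"]["result"]) > 0


if __name__ == "__main__":
    # Print the URL to fetch; pass the response body to has_hits().
    print(instant_query_url(PROMETHEUS_BASE, QUERY))
```

If `has_hits` returns `False` for the response, the dropped metric is no longer being served for that query.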

It will still return hits, I suppose, but the series will no longer be updated.

Change 835691 merged by Bking:

[operations/puppet@production] k8s: Limit envoy metrics scraped from k8s

https://gerrit.wikimedia.org/r/835691

This appears to be working, based on the Grafana dashboard query I ran. I'm resolving the issue, but please feel free to reopen it if it is not working as expected.