Deploy kube-state-metrics
Closed, Resolved · Public

Description

We should include a kube-state-metrics deployment in each of our clusters to be able to collect metrics about the state of API objects. That would help us more easily gather information on things like deployments happening (T222826) or containers not being killed/restarted often (T256256).
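
For illustration, a minimal sketch of the kind of Prometheus alerting rule these metrics would enable (the group name, alert name, and thresholds are made up, not part of this task):

groups:
  - name: ksm-examples                 # hypothetical rule group
    rules:
      - alert: PodRestartingOften      # hypothetical alert name
        # kube_pod_container_status_restarts_total is a KSM counter;
        # a sustained non-zero rate means a container keeps restarting
        expr: rate(kube_pod_container_status_restarts_total[30m]) > 0
        for: 30m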

For this we need to (at least):

As this will generate a lot of extra metrics, we should probably talk to o11y.

Event Timeline

JMeybohm triaged this task as Medium priority. Oct 5 2020, 2:39 PM
JMeybohm created this task.
JMeybohm renamed this task from Deploy kube-state-meterics to Deploy kube-state-metrics. Oct 5 2020, 2:40 PM
JMeybohm added a project: serviceops.

@akosiaris discovered recently (more or less by accident) that we're overcommitting CPU by quite a bit on wikikube clusters. With kube-state-metrics we should be able to make this more visible (or even alert on it).
This could be a nice task to do for people doing k8s training currently ;)

kamila changed the task status from In Progress to Stalled. Sep 4 2023, 10:35 AM

on the back burner while I'm busy freaking out about the DC switchover (T345263)

From a quick check of https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-state-metrics the helm chart seems like a good fit for our use case:

  • PSP policies can be enabled/disabled (since we'll have to deprecate them soon) and, more generally, all features like autoscaling etc. are if-guarded and not enabled by default. There seems to be no option/feature enabled automatically that we don't support.
  • Network policies seem sane, and we'll just need to allow the kube-state-metrics pod to reach the Kube API, so it's a very easy use case.
  • I don't see any weird permissions to assign to the kube-state-metrics pod.
  • There seems to be an active community behind it (https://github.com/prometheus-community/helm-charts/commits/main/charts/kube-state-metrics).

I'd be in favor of starting to test the helm chart; there may be some tweaks needed, but overall it looks good IMHO.
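
For reference, a sketch of the values overrides this evaluation implies (key names taken from the upstream chart's values.yaml; verify them against the pinned chart version):

# keep the chart's optional features off / minimal
podSecurityPolicy:
  enabled: false      # we'll have to deprecate PSPs soon anyway
networkPolicy:
  enabled: true       # only egress to the Kube API is needed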

kamila changed the task status from Stalled to In Progress. Oct 31 2023, 5:02 PM

Change 970425 had a related patch set uploaded (by Kamila Součková; author: Kamila Součková):

[operations/deployment-charts@master] Initial commit of kube-state-metrics chart from prometheus-community

https://gerrit.wikimedia.org/r/970425

Change 972400 had a related patch set uploaded (by Kamila Součková; author: Kamila Součková):

[operations/deployment-charts@master] [WIP] add kube-state-metrics helmfile

https://gerrit.wikimedia.org/r/972400

Change 970425 merged by jenkins-bot:

[operations/deployment-charts@master] Initial commit of kube-state-metrics chart from prometheus-community

https://gerrit.wikimedia.org/r/970425

Change 972400 merged by jenkins-bot:

[operations/deployment-charts@master] Add WIP kube-state-metrics deployment to staging

https://gerrit.wikimedia.org/r/972400

Change 972869 had a related patch set uploaded (by Kamila Součková; author: Kamila Součková):

[operations/deployment-charts@master] kube-state-metrics: bump chart version, add upstream version

https://gerrit.wikimedia.org/r/972869

Change 972869 merged by jenkins-bot:

[operations/deployment-charts@master] kube-state-metrics: bump chart version, add upstream version

https://gerrit.wikimedia.org/r/972869

Change 973134 had a related patch set uploaded (by Kamila Součková; author: Kamila Součková):

[operations/deployment-charts@master] enable kube-state-metrics prototype in eqiad

https://gerrit.wikimedia.org/r/973134

Change 973134 merged by jenkins-bot:

[operations/deployment-charts@master] enable kube-state-metrics prototype in eqiad

https://gerrit.wikimedia.org/r/973134

A prototype deployment in eqiad with a pretty vanilla config:

  • generates <80k timeseries, so let's call it up to ~100k timeseries per cluster
  • uses <50m CPU and <120MB memory with (curl-simulated) metrics scraping on, so I believe deploying a single replica is fine (a possible resources stanza is sketched below)
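
A possible resources stanza matching those observations (illustrative values only, not taken from the actual deployment):

# hypothetical chart values: headroom above the observed usage
resources:
  requests:
    cpu: 100m
    memory: 150Mi
  limits:
    cpu: 200m
    memory: 250Mi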

@fgiunchedi is adding up to 100k timeseries per k8s cluster OK?

> @fgiunchedi is adding up to 100k timeseries per k8s cluster OK?

Not sure off the bat; do you have a dump or a sample of the metrics? Will they be collected by Prometheus from a single (per-cluster) endpoint?

> Not sure off the bat; do you have a dump or a sample of the metrics? Will they be collected by Prometheus from a single (per-cluster) endpoint?

I left a dump in kubernetes1008:~kamila/ksm-metrics-eqiad.txt (it's too big for paste, sorry; also I'm not 100% sure it's all public data right now, WIP).

Yes, we would like to use a single endpoint. If necessary, it can be sharded, but a single endpoint would be preferable.
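
For reference, a hypothetical sketch of what sharding could look like via the upstream chart, should a single endpoint ever become too heavy (key names per the upstream chart's values.yaml; unverified against our pinned version):

# run KSM as an auto-sharded StatefulSet instead of a single replica
autosharding:
  enabled: true
replicas: 2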

> I left a dump in kubernetes1008:~kamila/ksm-metrics-eqiad.txt (it's too big for paste, sorry; also I'm not 100% sure it's all public data right now, WIP).
>
> Yes, we would like to use a single endpoint. If necessary, it can be sharded, but a single endpoint would be preferable.

Thank you, I've summarised the top 20 metric names below. If there's anything that can be obviously dropped, that'd be helpful; if not, I think we should be fine to at least try it. Ack re: single endpoint, that should be fine too.

$ grep -v '^#' ksm-metrics-eqiad.txt  | cut -d\{ -f1 | sort | uniq -c | sort -rn | head -20
   5558 kube_pod_container_resource_requests
   5312 kube_pod_container_resource_limits
   3985 kube_pod_status_reason
   3985 kube_pod_status_phase
   2779 kube_pod_container_status_waiting
   2779 kube_pod_container_status_terminated
   2779 kube_pod_container_status_running
   2779 kube_pod_container_status_restarts_total
   2779 kube_pod_container_status_ready
   2779 kube_pod_container_state_started
   2779 kube_pod_container_info
   2666 kube_pod_tolerations
   2391 kube_pod_status_scheduled
   2391 kube_pod_status_ready
   2391 kube_pod_status_qos_class
   1594 kube_pod_ips
    959 kube_secret_type
    959 kube_secret_metadata_resource_version
    959 kube_secret_info
    959 kube_secret_created

The kube_pod_* metrics are one of the main reasons for deploying KSM, although I think we can reduce them a bit by dropping:

  • kube_pod_status_qos_class
  • kube_pod_tolerations
  • kube_pod_ips

Unfortunately that will require a custom scrape config as we can't disable those in KSM (they are all part of the Pod collector).
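
A sketch of what such a drop rule could look like in the Prometheus scrape job for KSM (job name hypothetical):

- job_name: kube-state-metrics        # hypothetical job name
  metric_relabel_configs:
    # drop the three Pod-collector metrics we don't need
    - source_labels: [__name__]
      regex: kube_pod_(status_qos_class|tolerations|ips)
      action: drop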

In KSM itself I would disable the following collectors (at least for wikikube); a possible values override is sketched after this list:

  • CertificateSigningRequest
  • ConfigMap
  • Endpoint
  • HPA
  • Ingress
  • LimitRange
  • NetworkPolicy
  • PersistentVolume
  • PersistentVolumeClaim
  • ResourceQuota
  • Secret
  • Service
  • StorageClass
  • VolumeAttachment
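
A possible values override for that (the upstream chart exposes an allow-list of collectors; the exact resource names should be checked against the KSM docs):

# enable only the collectors we actually need
collectors:
  - cronjobs
  - daemonsets
  - deployments
  - jobs
  - namespaces
  - nodes
  - poddisruptionbudgets
  - pods
  - replicasets
  - replicationcontrollers
  - statefulsets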

Change 973762 had a related patch set uploaded (by Kamila Součková; author: Kamila Součková):

[operations/deployment-charts@master] kube-state-metrics: reduce number of metrics

https://gerrit.wikimedia.org/r/973762

Change 973762 merged by jenkins-bot:

[operations/deployment-charts@master] kube-state-metrics: reduce number of metrics

https://gerrit.wikimedia.org/r/973762

With @JMeybohm's suggestion (T264625#9324445) we are at around 60k timeseries.

> With @JMeybohm's suggestion (T264625#9324445) we are at around 60k timeseries.

Looks good to me, let's give it a try! Feel free to send reviews my way.

Change 974151 had a related patch set uploaded (by Kamila Součková; author: Kamila Součková):

[operations/deployment-charts@master] kube-state-metrics: enable Prometheus scraping

https://gerrit.wikimedia.org/r/974151

Change 974158 had a related patch set uploaded (by Kamila Součková; author: Kamila Součková):

[operations/deployment-charts@master] kube-state-metrics: DRY network policy

https://gerrit.wikimedia.org/r/974158

Change 974151 merged by jenkins-bot:

[operations/deployment-charts@master] kube-state-metrics: enable Prometheus scraping

https://gerrit.wikimedia.org/r/974151

Change 974171 had a related patch set uploaded (by Kamila Součková; author: Kamila Součková):

[operations/deployment-charts@master] kube-state-metrics: enable in codfw + staging-eqiad

https://gerrit.wikimedia.org/r/974171

Change 974171 merged by jenkins-bot:

[operations/deployment-charts@master] kube-state-metrics: enable in codfw + staging-eqiad

https://gerrit.wikimedia.org/r/974171

KSM in staging-eqiad was in a half-installed state (probably due to a prematurely terminated helmfile/helm command):

(HelmReleaseBadStatus) firing: Helm release kube-system/kube-state-metrics on k8s-staging@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=kube-system - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
# helm -n kube-system history kube-state-metrics                                                                                      
REVISION        UPDATED                         STATUS          CHART                           APP VERSION     DESCRIPTION                                       
1               Tue Nov 14 14:52:55 2023        pending-install kube-state-metrics-5.10.2                       Initial install underway

I did the easy thing:

helm -n kube-system delete kube-state-metrics 
helmfile -e staging-eqiad -i apply --context 5

> KSM in staging-eqiad was in a half-installed state (probably due to a prematurely terminated helmfile/helm command): [...]

Huh, strange, thank you.

Other than that, it has been running in staging and main for a week and it looks good. OK to deploy it in the other clusters (ml-*, dse, aux) too?

> Other than that, it has been running in staging and main for a week and it looks good. OK to deploy it in the other clusters (ml-*, dse, aux) too?

+1 on my end

> Other than that, it has been running in staging and main for a week and it looks good. OK to deploy it in the other clusters (ml-*, dse, aux) too?

I'd leave that to the cluster "owners" to decide (and do), tbh. You may announce the availability on the Kubernetes SIG mailing list https://www.mediawiki.org/wiki/Kubernetes_SIG (I've also put it on the agenda for the next SIG meeting next week).

Change 978129 had a related patch set uploaded (by CDanis; author: Chris Danis):

[operations/deployment-charts@master] [aux-k8s-eqiad] add kube-state-metrics

https://gerrit.wikimedia.org/r/978129

Change 978504 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Deploy kube-state-metrics to the dse-k8s cluster

https://gerrit.wikimedia.org/r/978504

Change 978504 merged by jenkins-bot:

[operations/deployment-charts@master] Deploy kube-state-metrics to the dse-k8s cluster

https://gerrit.wikimedia.org/r/978504

Change 979930 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: deploy kube-state-metrics on all ml clusters

https://gerrit.wikimedia.org/r/979930

Change 979930 merged by Elukey:

[operations/deployment-charts@master] admin_ng: deploy kube-state-metrics on all ml clusters

https://gerrit.wikimedia.org/r/979930

KSM is deployed in other clusters and appears to work, so I'm closing this :-)

Change 974158 merged by jenkins-bot:

[operations/deployment-charts@master] kube-state-metrics: DRY network policy

https://gerrit.wikimedia.org/r/974158

Change 978129 merged by jenkins-bot:

[operations/deployment-charts@master] [aux-k8s-eqiad] add kube-state-metrics

https://gerrit.wikimedia.org/r/978129