Scrape controller-manager and scheduler metrics
Closed, Resolved, Public

Description

We should add scraping of kube-controller-manager and kube-scheduler metrics for our >= 1.23 Kubernetes clusters.

For this we need to:

  • change --bind-address for both components
  • add --tls-cert-file and --tls-private-key-file for both components
  • add scraping config to prometheus k8s.pp
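
Once the flag changes are rolled out, a quick sanity check could look like the sketch below. It only tests that the three flags are present on a command line, not what their values are. The function name has_metrics_flags and the way the command line is obtained are my own assumptions, not part of the patches.

```shell
# Hypothetical helper: succeeds if the given command line contains all three
# flags from the list above, either as --flag=value or as --flag value.
has_metrics_flags() {
    for flag in --bind-address --tls-cert-file --tls-private-key-file; do
        case " $1 " in
            *" $flag="*|*" $flag "*) ;;                 # flag found, keep going
            *) echo "missing $flag" >&2; return 1 ;;    # report the first missing flag
        esac
    done
    return 0
}

# Possible usage on a control-plane host (assumes pgrep and /proc are available):
#   has_metrics_flags "$(tr '\0' ' ' < /proc/$(pgrep -o kube-scheduler)/cmdline)"
```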

kube-controller-manager metrics count:

curl -ks --cert /etc/kubernetes/pki/wikikube_staging__kubernetes-admin.pem --key /etc/kubernetes/pki/wikikube_staging__kubernetes-admin-key.pem https://127.0.0.1:10257/metrics | grep -v "^#" -c
1915

kube-scheduler metrics count:

curl -ks --cert /etc/kubernetes/pki/wikikube_staging__kubernetes-admin.pem --key /etc/kubernetes/pki/wikikube_staging__kubernetes-admin-key.pem https://127.0.0.1:10259/metrics | grep -v "^#" -c
349
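
To see which metric names account for most of those lines, the same output could be grouped by name. This is only a rough sketch: the helper name samples_per_metric is made up, and histogram series (_bucket, _sum, _count) are counted under their own names.

```shell
# Hypothetical helper: reads Prometheus text exposition format on stdin and
# prints "<count> <metric name>" sorted by count, highest first.
# Comment lines (# HELP / # TYPE) and blank lines are skipped.
samples_per_metric() {
    awk '!/^#/ && NF { sub(/[{ ].*/, ""); n[$0]++ }
         END { for (m in n) print n[m], m }' | sort -rn
}

# Possible usage, with CERT and KEY set to the client certificate and key
# used in the curl commands above:
#   curl -ks --cert "$CERT" --key "$KEY" https://127.0.0.1:10257/metrics | samples_per_metric | head
```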

These counts are per cluster and control plane, so that adds up to (1915 + 349) * 2 == 4528 metrics per Prometheus instance.
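
The estimate can be redone with other counts using a small sketch like the one below. The helper name per_instance_total is hypothetical, and the third argument is simply the factor of 2 from the estimate above.

```shell
# Hypothetical helper: estimate the number of metrics per Prometheus instance.
#   $1 = kube-controller-manager count
#   $2 = kube-scheduler count
#   $3 = multiplier (2 in the estimate above)
per_instance_total() {
    echo $(( ($1 + $2) * $3 ))
}

per_instance_total 1915 349 2   # prints 4528
```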

The kube-scheduler also has a /metrics/resources endpoint that exposes kube_pod_resource_request and kube_pod_resource_limit on a per-container basis, but I think we're better off scraping those from T264625: Deploy kube-state-metrics.

Event Timeline

JMeybohm moved this task from Incoming 🐫 to ⎈Kubernetes on the serviceops board.
JMeybohm updated the task description.

Change 959249 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s: Scrape controller-manager and scheduler metrics

https://gerrit.wikimedia.org/r/959249

Change 959257 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] prometheus::k8s: Scape scheduler and controller-manager

https://gerrit.wikimedia.org/r/959257

Change 959249 merged by JMeybohm:

[operations/puppet@production] k8s: Scrape controller-manager and scheduler metrics

https://gerrit.wikimedia.org/r/959249

Change 959257 merged by JMeybohm:

[operations/puppet@production] prometheus::k8s: Scape scheduler and controller-manager

https://gerrit.wikimedia.org/r/959257