Gaps in kubelet-reported Prometheus metrics
Closed, Resolved, Public

Description

Some container metrics reported by kubelet are spotty in Grafana dashboards, e.g. https://grafana.wikimedia.org/dashboard/db/kubernetes-pods.

While debugging the issue, I polled one kubelet for metrics every 30s, and indeed not every scrape returns all metrics:

while sleep 30; do curl kubernetes1001.eqiad.wmnet:10255/metrics/cadvisor > cadvisor_$(date -Is) ; done
3382 cadvisor_2017-11-28T10:53:51+0000
 847 cadvisor_2017-11-28T10:54:22+0000
3382 cadvisor_2017-11-28T10:54:52+0000
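
To see which metrics drop out of a short scrape, rather than only how much smaller it is, two of the saved files can be compared by metric name. The following is a minimal sketch, assuming the files are in the Prometheus text exposition format; the function names `metric_names` and `missing_metrics` are hypothetical helpers, not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct metric names in a scrape file.
# Comment lines (# HELP / # TYPE) and blank lines are skipped, then
# everything from the first "{" or space onward (labels, value) is dropped.
metric_names() {
    grep -v -e '^#' -e '^$' "$1" | sed 's/[{ ].*//' | sort -u
}

# Hypothetical helper: print metric names that appear in the first
# scrape file but not in the second one.
missing_metrics() {
    full_list=$(mktemp)
    short_list=$(mktemp)
    metric_names "$1" > "$full_list"
    metric_names "$2" > "$short_list"
    # comm -23 keeps only the lines unique to the first (sorted) file
    comm -23 "$full_list" "$short_list"
    rm -f "$full_list" "$short_list"
}
```

For example, `missing_metrics cadvisor_2017-11-28T10:53:51+0000 cadvisor_2017-11-28T10:54:22+0000` would list the metric names present in the complete scrape but absent from the short one.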

The upstream issue ( https://github.com/kubernetes/kubernetes/issues/50151 ) is apparently fixed in k8s 1.8 and 1.7.7, according to https://github.com/kubernetes/kubernetes/pull/51473#issuecomment-330019449

Event Timeline

Change 393805 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/debs/kubernetes@master] Update to 1.7.11

https://gerrit.wikimedia.org/r/393805

fgiunchedi moved this task from Doing to Radar on the User-fgiunchedi board.

Change 393805 merged by Alexandros Kosiaris:
[operations/debs/kubernetes@master] Update to 1.7.10

https://gerrit.wikimedia.org/r/393805

Mentioned in SAL (#wikimedia-operations) [2017-12-06T15:13:02Z] <akosiaris> upload kubernetes_1.7.10-1_amd64 on apt.wikimedia.org/stretch-wikimedia/main T181489

Packages have been upgraded to 1.7.10 throughout the production fleet, and all services have been restarted as well. Let's see if this indeed helped.