
Gaps in kubelet-reported Prometheus metrics
Closed, Resolved (Public)


Some container metrics reported by kubelet show intermittent gaps in Grafana dashboards.

While debugging the issue, I polled a single kubelet for metrics every 30s, and indeed not every scrape returns all metrics:

while sleep 30; do curl kubernetes1001.eqiad.wmnet:10255/metrics/cadvisor > cadvisor_$(date -Is) ; done
3382 cadvisor_2017-11-28T10:53:51+0000
 847 cadvisor_2017-11-28T10:54:22+0000
3382 cadvisor_2017-11-28T10:54:52+0000
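
To see which metrics go missing in a short scrape, two of the saved files can be compared by metric name. The following is only a sketch, assuming the files are in the Prometheus text exposition format; the helper names `metric_names` and `missing_metrics` are made up for illustration and are not part of any existing tooling.

```shell
# Hypothetical helpers (not from the original task) for comparing two
# saved scrapes in Prometheus text exposition format.

# Print the sorted, unique metric names found in one scrape file.
metric_names() {
    # Drop comment lines (# HELP / # TYPE) and blank lines, then keep only
    # the metric name, which ends at the first '{' (labels) or space (value).
    grep -v -e '^#' -e '^[[:space:]]*$' "$1" | sed -e 's/[{ ].*//' | LC_ALL=C sort -u
}

# Print metric names present in the first scrape but absent from the second.
missing_metrics() {
    full_names=$(mktemp)
    short_names=$(mktemp)
    metric_names "$1" > "$full_names"
    metric_names "$2" > "$short_names"
    # comm -23 keeps lines unique to the first (sorted) input.
    LC_ALL=C comm -23 "$full_names" "$short_names"
    rm -f "$full_names" "$short_names"
}
```

For example, `missing_metrics cadvisor_2017-11-28T10:53:51+0000 cadvisor_2017-11-28T10:54:22+0000` would list the metric names that the complete scrape has and the short one lacks.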

The issue is apparently fixed upstream in k8s 1.8 and 1.7.7.

Event Timeline

Change 393805 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/debs/kubernetes@master] Update to 1.7.10

fgiunchedi moved this task from Doing to Radar on the User-fgiunchedi board.

Change 393805 merged by Alexandros Kosiaris:
[operations/debs/kubernetes@master] Update to 1.7.10

Mentioned in SAL (#wikimedia-operations) [2017-12-06T15:13:02Z] <akosiaris> upload kubernetes_1.7.10-1_amd64 on T181489

Packages have been upgraded to 1.7.10 throughout the production fleet, and all services have been restarted as well. Let's see if this indeed helped.