
prometheus: usable dashboard for meta-metrics about Prometheus itself (query durations etc)
Closed, Resolved (Public)

Description

(part of https://wikitech.wikimedia.org/wiki/Incident_documentation/20190425-prometheus)

The existing dashboard seems to date from the Prometheus 1.x days: most of its graphs are not populated, and the ones that are populated are not the most useful. (Part of this task is to figure out which graphs are worth including.)

There's also no way to tell the individual backing Prometheus servers apart, which would be useful when debugging Prometheus issues.

Event Timeline

Dzahn triaged this task as Medium priority. Apr 30 2019, 9:40 PM

I've imported a Prometheus dashboard with 2.x stats and replaced the previous one: https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server

There's no drill-down per Prometheus server instance yet, although the metrics are there.

fgiunchedi claimed this task.

The dashboard at https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server now allows selecting each Prometheus server:port. Tentatively resolving, but please reopen if something is amiss!
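
For anyone who wants to check a single server outside the dashboard, here is a rough sketch of what the per-server selection amounts to: the same meta-metric, filtered on the `instance` label. This is not the dashboard's actual panel query. The helper name `query_duration_promql` and the `host:port` value in the usage line are made up for illustration, and it assumes the 2.x summary metric `prometheus_engine_query_duration_seconds` with its `quantile` label is being scraped.

```python
def query_duration_promql(instance: str, quantile: str = "0.99") -> str:
    """Build a PromQL selector for query durations on one Prometheus server.

    `instance` is the server:port value of the `instance` label (assumed to
    match what the dashboard drop-down offers).
    """
    # Escape backslashes and double quotes so the label value stays valid PromQL.
    safe = instance.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "prometheus_engine_query_duration_seconds"
        f'{{instance="{safe}", quantile="{quantile}"}}'
    )


# Hypothetical server:port, for illustration only.
print(query_duration_promql("prometheus-example:9900"))
```

The resulting string can be pasted into the Prometheus expression browser or a Grafana panel; swapping the metric name gives the same per-server view for other meta-metrics.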