Create Prometheus check for Coal service health
Closed, Declined · Public

Description

If Coal stops working correctly for whatever reason (e.g. it stops writing data to Graphite), we should know immediately instead of relying on someone noticing the missing data while browsing Grafana.

The service uses Scap3 for deployments and systemd for automatic start/restart, but we don't monitor its overall health in any way.
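
As a rough illustration of what such a health signal could look like, here is a minimal sketch that tracks the time of the last successful Graphite write and renders it in the Prometheus text exposition format. It is not taken from Coal or from the patches on this task: the class `CoalHealth`, both metric names, and the 300-second threshold in the usage note are hypothetical, and it uses only the standard library rather than any particular exporter library.

```python
import time

# Hypothetical metric names, chosen for illustration only.
LAST_WRITE_METRIC = "coal_last_graphite_write_timestamp_seconds"
ERRORS_METRIC = "coal_graphite_write_errors_total"


class CoalHealth:
    """Tracks the outcome of Graphite writes so a scraper can read it."""

    def __init__(self):
        self.last_write = 0.0  # Unix time of the last successful write
        self.errors = 0        # failed writes since process start

    def record_success(self, now=None):
        # Call after each write to Graphite that succeeded.
        self.last_write = time.time() if now is None else now

    def record_failure(self):
        # Call whenever a write to Graphite raised an error.
        self.errors += 1

    def is_stale(self, max_age_seconds, now=None):
        # True if no successful write happened within max_age_seconds.
        now = time.time() if now is None else now
        return now - self.last_write > max_age_seconds

    def render(self):
        # Return both metrics in the Prometheus text exposition format.
        lines = [
            f"# HELP {LAST_WRITE_METRIC} Unix time of the last successful Graphite write.",
            f"# TYPE {LAST_WRITE_METRIC} gauge",
            f"{LAST_WRITE_METRIC} {self.last_write}",
            f"# HELP {ERRORS_METRIC} Failed Graphite writes since start.",
            f"# TYPE {ERRORS_METRIC} counter",
            f"{ERRORS_METRIC} {self.errors}",
        ]
        return "\n".join(lines) + "\n"
```

If the output of `render()` were served over HTTP and scraped, an alert could plausibly fire on an expression such as `time() - coal_last_graphite_write_timestamp_seconds > 300`. The threshold would need to match how often Coal is actually expected to write.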

See also:

Event Timeline

Change 608430 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[performance/coal@master] Add Prometheus exporter

https://gerrit.wikimedia.org/r/c/performance/coal/+/608430

Change 608434 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[operations/puppet@production] [WIP] webperf: Scrape coal exporter

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608434

Change 608973 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[operations/puppet@production] [WIP] webperf: Enable prometheus-apache-exporter

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608973

Change 608973 abandoned by Dave Pifke:
[operations/puppet@production] [WIP] webperf: Enable prometheus-apache-exporter

Reason:
The information from this exporter is useful for some forms of troubleshooting, but not for the stated goal of generating alerts if one of the performance.wikimedia.org backends starts throwing errors. I therefore don't see a lot of benefit in collecting these metrics at the moment.

https://gerrit.wikimedia.org/r/608973

Aklapper added a subscriber: dpifke.

Removing inactive task assignee (please do so as part of offboarding processes).

fgiunchedi renamed this task from Create Icinga check for Coal service health to Create Prometheus check for Coal service health. Oct 10 2022, 1:27 PM
fgiunchedi subscribed.

I've renamed the task because we shouldn't be adding new Icinga checks unless absolutely warranted (e.g. the service doesn't or can't export metrics to Prometheus).

Change 608430 abandoned by Krinkle:

[performance/coal@master] Add Prometheus exporter

Reason:

https://gerrit.wikimedia.org/r/608430

Change 608434 abandoned by David Caro:

[operations/puppet@production] [WIP] webperf: Scrape coal exporter

Reason:

No longer relevant

https://gerrit.wikimedia.org/r/608434