Fix cloudmetrics icinga prometheus check
Closed, Resolved, Public

Description

The name-based virtual host configuration on the cloudmetrics servers is not working correctly and is tripping the icinga alarm "Prometheus cloudmetrics1001/labs restarted: beware possible monitoring artifacts".

Cloudmetrics uses the default Prometheus Apache vhost configuration. Because other vhosts are defined for the Grafana labs instance and no host name is defined for the Prometheus vhost, it does not work as expected.

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2020-01-10T20:29:53Z] <jeh> cloudmetrics100[12] schedule downtime until Feb 28 2020 on prometheus check T242460

Change 565113 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] labs prometheus: only bind localhost and update vhost config

https://gerrit.wikimedia.org/r/565113

Change 565113 merged by Jhedden:
[operations/puppet@production] labs prometheus: only bind localhost and update vhost config

https://gerrit.wikimedia.org/r/565113

Change 565125 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] labs prometheus: convert apache config to template

https://gerrit.wikimedia.org/r/565125

Change 565125 merged by Jhedden:
[operations/puppet@production] labs prometheus: convert apache config to template

https://gerrit.wikimedia.org/r/565125

I updated Prometheus to bind only to the loopback interface and configured Apache to proxy requests for the server's FQDN to Prometheus. These changes bring the cloudmetrics configuration in line with production and clear up the icinga errors when checking this service.
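
For readers unfamiliar with this pattern, the shape of the fix can be sketched roughly as follows. This is an illustrative sketch only, not the merged Puppet template: the host name `cloudmetrics1001.example.org`, the port `9900`, and the `/labs` path prefix are all placeholder assumptions. It relies on standard Apache behaviour: with name-based virtual hosts, a request whose `Host` header matches no `ServerName` or `ServerAlias` is served by the first vhost defined for that address and port, so a vhost without an explicit name can be shadowed by others.

```apache
# Hypothetical sketch; names, port, and path are placeholders, not the
# values from changes 565113 / 565125.
#
# Prometheus is assumed to listen only on loopback, for example via
#   --web.listen-address=127.0.0.1:9900
# so it is reachable from other hosts only through Apache.

<VirtualHost *:80>
    # An explicit ServerName lets name-based vhost matching select this
    # vhost for requests addressed to the server's FQDN, instead of
    # falling through to whichever vhost is defined first.
    ServerName cloudmetrics1001.example.org

    # Forward requests to the local Prometheus instance
    # (requires mod_proxy and mod_proxy_http).
    ProxyPreserveHost On
    ProxyPass        /labs http://127.0.0.1:9900/labs
    ProxyPassReverse /labs http://127.0.0.1:9900/labs
</VirtualHost>
```

With a layout like this, a monitoring check that requests the server's FQDN should land on the Prometheus proxy vhost rather than on the Grafana vhost.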