
Clean up check_ssl checks from puppet also covered by blackbox prober
Open, Medium, Public

Description

As a side effect of the parent task, a few endpoints are now being double-checked (by both Icinga and the blackbox prober) because they have explicit checks in Puppet as well as a probes section in service::catalog:

ChartMuseum HTTP certificate expiry
Debmonitor Health Check Expiry
Docker registry HTTPS interface certificate expiry
HTTPS-noc SSL Expiry
HTTPS-peopleweb SSL expiry
LibreNMS HTTPS sl expiry
alerts.wikimedia.org tls expiry
debmonitor.wikimedia.org:PORT CDN SSL Expiry
grafana-next-rw.wikimedia.org tls expiry
grafana-rw.wikimedia.org tls expiry
graphite.wikimedia.org tls expiry
icinga-extmon.wikimedia.org SSL Expiry
icinga.wikimedia.org tls expiry
klaxon.wikimedia.org tls expiry
librenms.wikimedia.org tls expiry
puppetboard.wikimedia.org tls expiry
thanos.wikimedia.org tls expiry
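
For readers unfamiliar with what an expiry check measures, here is a minimal sketch of the idea in Python. It is not how check_ssl or the blackbox prober are implemented; the function names (days_until_expiry, fetch_not_after) and the 30-day threshold mentioned afterwards are assumptions made for illustration only.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days between `now` (epoch seconds) and a certificate's notAfter.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400.0


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect to host:port over TLS and return the leaf certificate's notAfter.

    Hypothetical helper: it needs network access and verifies the chain
    against the system trust store.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A caller could alert when days_until_expiry(fetch_not_after(host)) drops below some threshold, say 30 days. Running that logic from two monitoring systems against the same endpoint is the duplication this task is about.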

Event Timeline

Change 902785 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] releases: remove Icinga monitoring

https://gerrit.wikimedia.org/r/902785

Change 902801 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] peopleweb: replace Icinga with Prometheus monitoring

https://gerrit.wikimedia.org/r/902801

I am also doing some of this, including for hosts that were not on the list, over at T331901#8722272.

I was looking at occurrences of monitoring::service in services that my team owns. Those use different check commands, such as check_http and its variations, rather than check_ssl, but it is all closely related.
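
A search like the one described above could be sketched roughly as follows. This is an illustrative Python snippet, not the tooling actually used. It assumes each monitoring::service resource has a quoted title and closes with a brace on its own line, and the sample manifest (including the command name check_http_url) is made up.

```python
import re
from typing import List, Tuple

# Matches: monitoring::service { 'title': ... } with the closing brace on its own line.
RESOURCE_RE = re.compile(
    r"monitoring::service\s*\{\s*(['\"])(?P<title>.*?)\1\s*:(?P<body>.*?)\n\s*\}",
    re.DOTALL,
)
# Matches: check_command => 'name!arg1!arg2'
CHECK_RE = re.compile(r"check_command\s*=>\s*(['\"])(?P<cmd>.*?)\1")


def find_check_commands(manifest: str) -> List[Tuple[str, str]]:
    """Return (resource title, check command name) pairs found in manifest text."""
    results = []
    for resource in RESOURCE_RE.finditer(manifest):
        check = CHECK_RE.search(resource.group("body"))
        if check:
            # Arguments follow the command name after '!'; keep only the name.
            results.append((resource.group("title"), check.group("cmd").split("!")[0]))
    return results


# Hypothetical sample manifest, for illustration only.
SAMPLE = """
monitoring::service { 'https_example':
    description   => 'HTTPS example',
    check_command => 'check_http_url!example.org!/',
}
"""
```

Calling find_check_commands on each manifest's text and grouping the results by command name would give a rough inventory of which services still rely on check_http-style checks rather than check_ssl.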

Change 902802 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb/static_rt: replace Icinga with Prometheus monitoring

https://gerrit.wikimedia.org/r/902802

Change 902802 merged by Dzahn:

[operations/puppet@production] miscweb/static_rt: replace Icinga with Prometheus monitoring

https://gerrit.wikimedia.org/r/902802

Change 903318 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] alertmanager: send sre-collab alerts to -operations and -sre-collab

https://gerrit.wikimedia.org/r/903318

Change 903318 merged by Dzahn:

[operations/puppet@production] alertmanager: send sre-collab alerts to -operations and -sre-collab

https://gerrit.wikimedia.org/r/903318

Change 902801 merged by Dzahn:

[operations/puppet@production] peopleweb: replace Icinga with Prometheus monitoring

https://gerrit.wikimedia.org/r/902801

Change 902785 merged by Dzahn:

[operations/puppet@production] releases: remove Icinga monitoring

https://gerrit.wikimedia.org/r/902785

lmata triaged this task as Medium priority. May 3 2023, 1:12 PM