
Service puppetmaster1001:8141 has failed probes (http_puppetmaster1003_eqiad_wmnet_backend_https_ip4)
Closed, Resolved · Public

Description

There is an alert related to puppetmaster1001's 8141 port:

Service puppetmaster1003:8141 has failed probes (http_puppetmaster1003_eqiad_wmnet_backend_https_ip4)

Following the alert's log links gives a clue:

target=https://[10.64.16.36]:8141/puppet/v3 msg="Error for HTTP request" err="Get \"https://10.64.16.36:8141/puppet/v3\": x509: certificate relies on legacy Common Name field, use SANs instead"

It seems to come mostly from prometheus1006, which has 3 days of uptime (matching the start of the alerts).

In puppetmaster::monitoring we have the definition of the HTTP blackbox exporter probe, and I don't think it is possible to change puppetmaster1001's TLS cert to have SANs. Running the exporter with GODEBUG="x509ignoreCN=0" should fix the issue, but at a quick glance this is not yet supported in Puppet.
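For reference, the x509 error quoted above can be reproduced outside the exporter. The following is a minimal Go sketch, not the exporter's code: it builds two throwaway self-signed certificates (one with only a Common Name, one that also has a DNS SAN) and runs Go's hostname verification on both. The helper names `hasSANs` and `selfSigned` and the host `puppet.example.test` are made up for illustration. Recent Go versions are assumed. Per the Go release notes, the `x509ignoreCN=0` GODEBUG setting was removed in Go 1.17, so whether it helps depends on the Go version the exporter was built with.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// hasSANs reports whether the certificate carries any SAN entry of the
// types that crypto/x509 parses (DNS names, IPs, emails, URIs).
// Hypothetical helper, for illustration only.
func hasSANs(c *x509.Certificate) bool {
	return len(c.DNSNames) > 0 || len(c.IPAddresses) > 0 ||
		len(c.EmailAddresses) > 0 || len(c.URIs) > 0
}

// selfSigned builds a throwaway self-signed certificate with the given
// Common Name and, optionally, DNS SANs. Hypothetical helper.
func selfSigned(cn string, dnsNames ...string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames, // empty means no SAN extension is emitted
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	const host = "puppet.example.test" // placeholder hostname
	// First pass: CN only. Second pass: CN plus a matching DNS SAN.
	for _, names := range [][]string{nil, {host}} {
		cert, err := selfSigned(host, names...)
		if err != nil {
			panic(err)
		}
		fmt.Printf("hasSANs=%v VerifyHostname error: %v\n",
			hasSANs(cert), cert.VerifyHostname(host))
	}
}
```

On a recent Go toolchain the CN-only certificate should fail hostname verification while the one with a SAN passes, which is the same class of failure the probe reports.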

Event Timeline

elukey renamed this task from Service puppetmaster1003:8141 has failed probes (http_puppetmaster1003_eqiad_wmnet_backend_https_ip4) to Service puppetmaster1001:8141 has failed probes (http_puppetmaster1003_eqiad_wmnet_backend_https_ip4). Aug 26 2024, 4:56 PM

Thank you for filing the task. I was also looking at the same failed probes, which are due to the recent Prometheus Bookworm upgrade. tl;dr: I think it is safe to ack these alerts and I will do so; see also https://phabricator.wikimedia.org/T326657#10090776

Mentioned in SAL (#wikimedia-operations) [2024-08-27T07:50:22Z] <godog> ack probedown for puppetmaster:8181 - T373369

fgiunchedi claimed this task.

I'm tentatively resolving since the silences are in place, please feel free to reopen as needed!