We are getting multiple (new?) Icinga CRITs for the same thing: the TLS cert for cloudelastic.wikimedia.org expires in 7 days.
But these are Let's Encrypt certs, and it looks like the renewal period is 7 days and monitoring is also set to go CRIT at 7 days.
For some reason one of them recovered shortly after, but the others have not, and after refreshing all 3 in Icinga they are still CRIT.
This does not seem to be an issue with the actual renewal, since we saw at least one of them get a new cert. Still, I think there are at least these things to fix here:
- change the Puppet code so that we don't check the same cert for the same host name on multiple servers, to avoid duplicate alerts?
- change thresholds so there are no races on the day of renewal (btw the new one it just got will expire around Christmas :)
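
To illustrate the threshold race in the second point, here is a minimal sketch. It is not the actual Icinga check or the renewal logic; the function names and the old cert's expiry date are hypothetical, and the 7-day values are the ones that appear to be in use. The idea is that if the CRIT threshold is not strictly smaller than the renewal window, the check can fire at the same moment renewal only becomes due.

```python
from datetime import datetime, timedelta, timezone


def check_state(not_after, now, crit_before):
    """Return "CRIT" if the cert expires within crit_before, else "OK"."""
    return "CRIT" if not_after - now <= crit_before else "OK"


def thresholds_can_race(renew_before, crit_before):
    """True if the alert can fire before renewal has had a chance to run."""
    return crit_before >= renew_before


renew_before = timedelta(days=7)  # assumed renewal window
crit_before = timedelta(days=7)   # assumed monitoring CRIT threshold

# Hypothetical expiry of the old cert, for illustration only.
not_after = datetime(2021, 10, 4, 19, 0, tzinfo=timezone.utc)
now = not_after - renew_before    # the moment renewal becomes due

print(check_state(not_after, now, crit_before))                 # state at that moment
print(thresholds_can_race(renew_before, crit_before))           # 7 days vs 7 days
print(thresholds_can_race(renew_before, timedelta(days=3)))     # 7 days vs 3 days
```

With a CRIT threshold a few days smaller than the renewal window, a cert that renews on schedule should never reach CRIT.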
The current status in Icinga is still as in the screenshot below,
but the new cert is already being served. I confirmed that:
[puppetmaster1001:~] $ curl -6 -S -vvv https://cloudelastic.wikimedia.org:9243
* Server certificate:
* subject: CN=cloudelastic.wikimedia.org
* start date: Sep 27 19:00:30 2021 GMT
* expire date: Dec 26 19:00:29 2021 GMT
* subjectAltName: host "cloudelastic.wikimedia.org" matched cert's "cloudelastic.wikimedia.org"
* issuer: C=US; O=Let's Encrypt; CN=R3
* SSL certificate verify ok.
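
As a quick sanity check of the dates above, here is a small sketch that parses the start and expire dates as curl prints them and computes the validity period. The helper names are made up for this comment, and `%b` parsing assumes an English/C locale.

```python
from datetime import datetime, timezone

# Date format as printed by curl -v in the output above.
CURL_DATE_FORMAT = "%b %d %H:%M:%S %Y GMT"


def parse_curl_date(text):
    """Parse a curl-style certificate date into an aware UTC datetime."""
    return datetime.strptime(text, CURL_DATE_FORMAT).replace(tzinfo=timezone.utc)


def days_remaining(not_after, now):
    """Whole days left until not_after, counted from now."""
    return (not_after - now).days


start = parse_curl_date("Sep 27 19:00:30 2021 GMT")
expire = parse_curl_date("Dec 26 19:00:29 2021 GMT")

# Whole days between issuance and expiry of the new cert.
print(days_remaining(expire, start))
```

Passing `datetime.now(timezone.utc)` as `now` gives the days left as of today, which can be compared against the 7-day CRIT threshold.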