We are getting multiple (new?) Icinga CRITs for the same thing: the TLS cert for cloudelastic.wikimedia.org expires in 7 days.
But these are Let's Encrypt certs, and it looks like the renewal period and the monitoring CRIT threshold are both set to 7 days.
For some reason one of them recovered shortly afterwards, but the others have not, and after refreshing all 3 in Icinga they are still CRIT.
This does not seem to be an issue with the actual renewal, since we saw at least one of them get a new cert, but I think there is at least this to fix here:
- change the Puppet code so that we don't check the same cert for the same hostname on multiple servers, to avoid duplicate alerts?
- change the thresholds so there are no races on the day of renewal (btw the new one it just got will expire on Christmas :)
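
To illustrate the threshold race from the second point, here is a minimal sketch. It assumes that both the renewal window and the CRIT threshold are 7 days, as described above. The function names, the `<=` comparison and the old cert's expiry date are hypothetical and not taken from the actual Puppet or Icinga configuration.

```python
from datetime import datetime, timedelta, timezone

# Assumed values, taken from the description above (both 7 days).
RENEW_BEFORE = timedelta(days=7)  # renewal becomes due this long before expiry
CRIT_BEFORE = timedelta(days=7)   # monitoring goes CRIT this long before expiry


def renewal_due(now, not_after, renew_before=RENEW_BEFORE):
    """True once the cert is inside its renewal window."""
    return not_after - now <= renew_before


def check_state(now, not_after, crit_before=CRIT_BEFORE):
    """Return 'CRIT' when the cert expires within crit_before, else 'OK'."""
    return "CRIT" if not_after - now <= crit_before else "OK"


# Hypothetical expiry date of the old cert, for illustration only.
old_not_after = datetime(2021, 10, 4, 19, 0, 0, tzinfo=timezone.utc)
# One hour after renewal became due, before the new cert is deployed everywhere.
now = old_not_after - RENEW_BEFORE + timedelta(hours=1)

print(renewal_due(now, old_not_after))                     # renewal has only just started
print(check_state(now, old_not_after))                     # equal thresholds
print(check_state(now, old_not_after, timedelta(days=5)))  # lower CRIT threshold
```

With equal values, any delay between the renewal becoming due and the new cert being served everywhere shows up as a CRIT. A CRIT threshold below the renewal window (5 days is just an example here) would leave some margin.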
The current status is still as in the screenshot below:
{F34698472}
But the new cert is already being served, I confirmed that with:
```
[puppetmaster1001:~] $ curl -6 -S -vvv https://cloudelastic.wikimedia.org:9243
```
```
* Server certificate:
* subject: CN=cloudelastic.wikimedia.org
* start date: Sep 27 19:00:30 2021 GMT
* expire date: Dec 26 19:00:29 2021 GMT
* subjectAltName: host "cloudelastic.wikimedia.org" matched cert's "cloudelastic.wikimedia.org"
* issuer: C=US; O=Let's Encrypt; CN=R3
* SSL certificate verify ok.
```
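
For comparing these dates against a threshold, a small sketch that parses the timestamps from the curl output above using Python's `ssl.cert_time_to_seconds`. The helper names `parse_cert_time` and `days_left` are hypothetical, and this is not how the Icinga check itself is implemented.

```python
import ssl
from datetime import datetime, timezone


def parse_cert_time(value):
    """Parse a cert timestamp such as 'Dec 26 19:00:29 2021 GMT' into UTC."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(value), tz=timezone.utc)


def days_left(expire_date, now):
    """Whole days between now and the expiry timestamp."""
    return (parse_cert_time(expire_date) - now).days


# Values copied from the curl output above.
start = parse_cert_time("Sep 27 19:00:30 2021 GMT")
print(days_left("Dec 26 19:00:29 2021 GMT", start))
```

Passing `datetime.now(timezone.utc)` as `now` gives the number of days remaining at the time of the check.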
See also T308908#7957275 for a bit more debugging. It notably shows that an Apache 2 worker is not properly restarted after a graceful reload (it shows as `no (old gen)` in the Apache status) and thus keeps running with the old certificates.
The #upstream Apache 2 bug is most probably [[https://bz.apache.org/bugzilla/show_bug.cgi?id=63169 | 63169: MPM event, stuck process after graceful: no (old gen) ]], which is fixed in Apache 2.4.49 (we run 2.4.38-3+deb10u7).