We currently have no automated way to alert us when certificates in the private repo generated using cergen are due to expire. We should create an NRPE check that alerts us well in advance of any expiry, so that we are able to generate new certificates before any impact is observed.
A change has already been drafted as part of T236277; however, questions were raised which are best discussed here. Below are the relevant parts of the conversation from the Gerrit change (volans quoted, me responding):
Also I don't see in the task a specific design of what we're trying to achieve here.
Something to alert us if certs are due to expire; I don't think we have anything in place for this.
This would be a single check that will fire saying say 10 certs are
expiring and the day after might re-fire with 11 certs are expiring
(because the error message changed) and so on.
I thought Icinga only fired alerts if the status changed; I don't think it fires again if only the message changes. Otherwise the disk space, load or BGP status[1] checks would constantly alert.
I'm not sure of the usefulness vs noise of such a check instead that a per-certificate check.
I actually think the noise would be much worse with per-certificate checks, as many certificates expire on the same day, e.g. we have 112 certs expiring in 343 days.
Also why not checking the live ones instead of those in the repo?
Simplicity, and ultimately the repo is the source of truth, or should be.
We could have a cert in the repo that is not yet/anymore deployed/live but we might keep it around for a bit just in case.
I'm not sure I can think of a valid reason to keep an expired certificate around, especially considering it would still be in the git history.
I'd like a bit more of problem statement -> possible solutions -> agreement on a plan in the task tbh.
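As one possible solution to discuss, here is a minimal sketch of the single aggregated check described above. It is not the drafted change; the function name, the 60/30 day thresholds and the message format are all assumptions for illustration. It also assumes the expiry dates have already been extracted from the certificates in the repo (for example from the output of openssl x509 -enddate -noout), so it only shows how many expiring certs would be folded into one status.

```python
from datetime import datetime, timedelta, timezone

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_expiry(expiries, now, warn_days=60, crit_days=30):
    """Return (exit_code, message) for a mapping of cert name -> expiry datetime.

    Hypothetical helper: thresholds and message wording are assumptions.
    """
    warn, crit = [], []
    for name, not_after in sorted(expiries.items()):
        remaining = not_after - now
        if remaining <= timedelta(days=crit_days):
            crit.append(name)
        elif remaining <= timedelta(days=warn_days):
            warn.append(name)

    # One status for the whole repo: the worst state wins.
    if crit:
        return CRITICAL, "CRITICAL: {} cert(s) expire within {} days: {}".format(
            len(crit), crit_days, ", ".join(crit))
    if warn:
        return WARNING, "WARNING: {} cert(s) expire within {} days: {}".format(
            len(warn), warn_days, ", ".join(warn))
    return OK, "OK: {} cert(s) checked, none expire within {} days".format(
        len(expiries), warn_days)


if __name__ == "__main__":
    # Example data only; a real check would read the certs from the repo.
    now = datetime.now(timezone.utc)
    example = {"example-a": now + timedelta(days=343),
               "example-b": now + timedelta(days=45)}
    code, message = check_expiry(example, now)
    print(message)
    raise SystemExit(code)
```

With this shape, the number of certs listed in the message can grow from one day to the next while the exit code stays the same, so whether that causes a re-notification depends only on how Icinga treats a changed message with an unchanged status.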
[1] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cr3-ulsfo&service=BGP+status