In T307382, we only noticed that the etcd tlsproxy certificate in eqiad had expired when we were paged for conf2005/Etcd replication lag. AFAICT, there was no warning that the certificate was close to expiry.
Related Objects
- Mentioned Here
- T307382: Modernize etcd tlsproxy certificate management
Event Timeline
Change 788435 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] profile: add etcd tlsproxy certificate monitoring
Change 788435 merged by Dzahn:
[operations/puppet@production] profile: add etcd tlsproxy certificate monitoring
Change 789270 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] etcd::tlsproxy: add monitoring for TLS cert expiration
Change 789270 abandoned by Dzahn:
[operations/puppet@production] etcd::tlsproxy: add monitoring for TLS cert expiration
Reason:
https://gerrit.wikimedia.org/r/c/operations/puppet/+/789176 was already merged instead
monitoring has been added in Icinga and works now:
https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=etcd+tlsproxy
The only slight issue I see is that we will get 6 alerts at once when the cert gets close to expiry, in 1821 minus 60 days (about 1761 days from now).
But on the other hand, it checks each individual host for other (non-cert, webserver-level) issues and would detect if we forget to add a hostname to the cert.
So I guess we can call it resolved.
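For illustration only, the per-host check described above could be sketched roughly as follows. This is not the Icinga check that was merged; it is a minimal Python sketch that assumes a 60-day warning threshold (taken from the comment above) and a CA bundle that the default TLS context trusts. The function names and the host and port arguments are hypothetical.

```python
import socket
import ssl
import time

WARN_DAYS = 60  # assumed warning threshold, taken from the comment above


def days_remaining(not_after, now=None):
    """Days until the certificate's notAfter timestamp (negative if expired)."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def classify(days, warn_days=WARN_DAYS):
    """Map remaining days to a Nagios-style state name."""
    if days < 0:
        return "CRITICAL"
    if days < warn_days:
        return "WARNING"
    return "OK"


def check_host(host, port, timeout=5.0):
    """Connect to one host and report on the certificate it serves.

    The default context verifies the chain and the hostname, so an expired
    cert or a hostname missing from the cert surfaces as a handshake error.
    """
    ctx = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return "CRITICAL", str(exc)
    days = days_remaining(cert["notAfter"])
    return classify(days), f"{days:.0f} days until expiry"
```

Running `check_host` once per host is what produces one alert per host when the shared cert nears expiry, which is the trade-off noted above.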
Cert changes do not trigger an nginx reload. After we left for the evening, two of the hosts were still serving the old certificate, and they kept doing so until the reload was performed on the secondary hosts the following morning.
Ideally, we'll move to a more unified certificate monitoring approach. I think this arrangement will be ok until we can adopt that unified solution.
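Until a reload is wired in, one way to spot a host that is still serving a stale certificate is to compare the fingerprint of the served cert with the one on disk. The following is only a sketch under assumptions: the function names, the host and port arguments, and the cert path are hypothetical, and only the first certificate in the on-disk file is compared (in case the file holds a chain).

```python
import hashlib
import ssl

PEM_END = "-----END CERTIFICATE-----"


def first_cert(pem):
    """Return only the first PEM certificate block from a file's text."""
    end = pem.index(PEM_END) + len(PEM_END)
    return pem[:end].strip()


def fingerprint(pem):
    """SHA-256 fingerprint (hex) of the first PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(first_cert(pem))
    return hashlib.sha256(der).hexdigest()


def serves_current_cert(host, port, cert_path):
    """True if the cert served on host:port matches the cert file on disk."""
    with open(cert_path) as fh:
        on_disk = fh.read()
    # get_server_certificate does not validate the cert by default,
    # so this also works while the served cert is already expired.
    served = ssl.get_server_certificate((host, port))
    return fingerprint(served) == fingerprint(on_disk)
```

A `False` result on a host after a cert rollout would indicate that nginx there has not been reloaded yet.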
Change 790656 had a related patch set uploaded (by Jbond; author: jbond):
[operations/puppet@production] P:etcd::tlsproxy: add documentation and fix minor lint issues
Change 790657 had a related patch set uploaded (by Jbond; author: jbond):
[operations/puppet@production] P:etcd::tlsproxy: move to cfssl pki
Change 790656 merged by Jbond:
[operations/puppet@production] P:etcd::tlsproxy: add documentation and fix minor lint issues