Renew puppet cert for etcd.codfw.wmnet
Closed, Resolved (Public)

Description

The Icinga check "Puppet CA expired certs" on puppetmaster1001 fired today, reporting the upcoming expiration of the certificate for etcd.codfw.wmnet.

I've verified that it expires on Feb 26: notAfter=Feb 26 12:23:55 2022 GMT

Opening this task to avoid forgetting to renew it during working hours.
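
As a rough aid for planning the renewal, the remaining time can be computed from the notAfter line. This is a minimal sketch, not the Icinga check itself: the function names `parse_not_after` and `days_until_expiry` are hypothetical, it assumes openssl's default date format with English month names, and the "now" value in the usage note assumes the task creation time was in UTC.

```python
from datetime import datetime, timezone

# Date format printed by `openssl x509 -noout -dates` (assumes an English/C locale).
OPENSSL_DATE_FORMAT = "%b %d %H:%M:%S %Y GMT"


def parse_not_after(line):
    """Parse a 'notAfter=...' line into a timezone-aware UTC datetime."""
    key, _, value = line.strip().partition("=")
    if key != "notAfter" or not value:
        raise ValueError(f"not a notAfter line: {line!r}")
    parsed = datetime.strptime(value, OPENSSL_DATE_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def days_until_expiry(line, now):
    """Return the number of whole days between `now` and the notAfter date."""
    return (parse_not_after(line) - now).days
```

Calling `days_until_expiry("notAfter=Feb 26 12:23:55 2022 GMT", datetime(2022, 2, 19, 12, 54, tzinfo=timezone.utc))` returns 6, i.e. just under a week of validity left when this task was opened.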

Related Objects

Status       Subtype  Assigned  Task
Resolved              Joe
In Progress           None

Event Timeline

Volans triaged this task as Medium priority. Feb 19 2022, 12:54 PM
Volans created this task.

I think this is the old certificate we used for etcd in codfw; since moving to etcd v3 we've been using a new cert created with cergen:

$ openssl s_client -host conf2004.codfw.wmnet -port 4001 2>/dev/null | openssl x509 -noout -dates 
notBefore=Apr 25 07:11:49 2021 GMT
notAfter=Apr 25 07:11:49 2026 GMT

I am just going to revoke that certificate on the CA.

I just removed the cert from puppet.

Change 787884 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] add new certificate for etcd-v3.eqiad.wmnet

https://gerrit.wikimedia.org/r/787884

Change 787885 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] hiera: tlsproxy: use new etcd-v3 certificate

https://gerrit.wikimedia.org/r/787885

Change 787885 merged by Dzahn:

[operations/puppet@production] hiera: tlsproxy: use new etcd-v3 certificate

https://gerrit.wikimedia.org/r/787885

Change 787884 merged by Cwhite:

[operations/puppet@production] add new certificate for etcd-v3.eqiad.wmnet

https://gerrit.wikimedia.org/r/787884

06:42 < mutante> we got paged because the etcd cert in eqiad expired. nginx uses that

06:42 < mutante> etcd in codfw had already been converted to use cergen and etcd-v3 certs but eqiad had not
06:43 < mutante> eventually cwhite and myself figured that out and were able to create a new etcd-v3.eqiad cert matching the existing one for codfw but for eqiad and things have recovered now
06:44 < mutante> took us a while though and this should have warned us about expiry..not happen on a Saturday night unexpectedly
06:44 < mutante> will create an incident report but now it's too late here, almost midnight
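
The point about being warned ahead of expiry can be illustrated with a threshold-based check. This is only a sketch under stated assumptions and not the existing Icinga check: the function name `expiry_state` and the 30-day and 7-day thresholds are hypothetical, and the return values follow the usual Nagios/Icinga plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
from datetime import datetime, timedelta, timezone

# Conventional Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def expiry_state(not_after, now, warn_days=30, crit_days=7):
    """Classify a certificate by how much validity remains.

    warn_days and crit_days are illustrative defaults, not production values.
    An already expired certificate has a negative remainder and is CRITICAL.
    """
    remaining = not_after - now
    if remaining <= timedelta(days=crit_days):
        return CRITICAL
    if remaining <= timedelta(days=warn_days):
        return WARNING
    return OK
```

With thresholds like these, a certificate would start warning weeks before its notAfter date and only escalate in the final days, which leaves room to renew it during working hours.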