Renew certs for mcrouter on all application servers.
Closed, Resolved (Public)

Description

  • All mcrouter certs are maintained using cergen
  • The certs expire on May 31st
  • We need a way to check them via icinga given check_ssl doesn't support client certs.

At the very minimum, we need to regenerate the certs and reissue them.
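Since check_ssl cannot be used here, one possible direction is to check the certificate file directly on the host. A minimal sketch, assuming the cert is readable on disk; the function name `cert_valid_for` is hypothetical and not part of any existing check:

```shell
# Hypothetical helper (not an existing Icinga plugin).
# Succeeds if the certificate is still valid <days> days from now,
# fails if it will have expired by then.
# Usage: cert_valid_for <cert.pem> <days>
cert_valid_for() {
    cert_file=$1
    days=$2
    # -checkend takes a number of seconds and exits non-zero when the
    # certificate expires within that window.
    openssl x509 -checkend "$((days * 86400))" -noout -in "$cert_file" >/dev/null
}
```

For example, `cert_valid_for <path-to-cert> 30 || echo "expires within 30 days"` could be wrapped by a monitoring check that maps the result to its own exit codes.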

Event Timeline

So the CA public cert will expire as well at the end of May.

The solution I found is the following:

  1. Disable puppet on all hosts that include class 'mcrouter' for added security
  2. SSH to a puppetmaster frontend and run the following:
# Save a copy of the current mcrouter tree
cp -r /srv/private/modules/secret/secrets/mcrouter/ mcrouter
# Delete the current public certs
find /srv/private/modules/secret/secrets/mcrouter/ -type f -name "*.crt.pem" -delete
# Re-generate the certs
sudo cergen --base-path /srv/private/modules/secret/secrets/mcrouter/  --generate /etc/cergen/mcrouter.manifests.d
# Verify the old CA cert can validate the new certs (the option is spelled -CAfile)
openssl verify -verbose -CAfile ./mcrouter/mcrouter_ca/ca.crt.pem /srv/private/modules/secret/secrets/mcrouter/mw1295.eqiad.wmnet/mw1295.eqiad.wmnet.crt.pem
# Verify the new CA cert can validate old certs
openssl verify -verbose -CAfile /srv/private/modules/secret/secrets/mcrouter/mcrouter_ca/ca.crt.pem ./mcrouter/mw1295.eqiad.wmnet/mw1295.eqiad.wmnet.crt.pem
# Verify the expiry of the new certs
openssl x509 -noout -dates -in /srv/private/modules/secret/secrets/mcrouter/mcrouter_ca/ca.crt.pem
openssl x509 -noout -dates -in  /srv/private/modules/secret/secrets/mcrouter/mw1295.eqiad.wmnet/mw1295.eqiad.wmnet.crt.pem
# Commit your changes
cd /srv/private && git add modules/secret/secrets/mcrouter/
## sudo -i ; git commit

Then puppet can be reenabled on a single host, and tests can be run to ensure everything works as expected.
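The commands above only spot-check a single host (mw1295). A rough sketch of extending the same verification to every host cert in a tree, assuming all hosts follow the `<host>/<host>.crt.pem` layout seen above; the function name `verify_all_certs` is hypothetical:

```shell
# Hypothetical helper: verify every host cert under <tree> against one CA cert.
# Assumes the layout <tree>/<host>/<host>.crt.pem, with the CA in <tree>/mcrouter_ca/.
# Usage: verify_all_certs <ca.crt.pem> <tree>
verify_all_certs() {
    ca_file=$1
    tree=$2
    failed=0
    for cert in "$tree"/*/*.crt.pem; do
        [ -e "$cert" ] || continue          # glob matched nothing
        case "$cert" in
            */mcrouter_ca/*) continue ;;    # skip the CA's own directory
        esac
        # Look for the "<file>: OK" line that openssl verify prints on success,
        # instead of relying on the exit status alone.
        if ! openssl verify -CAfile "$ca_file" "$cert" 2>/dev/null | grep -q ': OK$'; then
            echo "FAILED: $cert"
            failed=1
        fi
    done
    return "$failed"
}
```

Running it once with the saved old CA against the new tree, and once with the new CA against the saved copy, would cover both directions checked above.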

@elukey does this make sense to you?

Joe triaged this task as High priority. Apr 18 2019, 11:19 AM

Change 510082 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/puppet@production] mcrouter: feat(T221346) add icinga check for certs

https://gerrit.wikimedia.org/r/510082

Mentioned in SAL (#wikimedia-operations) [2019-05-17T07:57:41Z] <fsero> disabling puppet on mcrouter hosts for T221346

Change 510082 merged by Fsero:
[operations/puppet@production] mcrouter: feat(T221346) add icinga check for certs

https://gerrit.wikimedia.org/r/510082

Change 510810 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/puppet@production] Revert "mcrouter: feat(T221346) add icinga check for certs"

https://gerrit.wikimedia.org/r/510810

Change 510810 merged by Fsero:
[operations/puppet@production] Revert "mcrouter: feat(T221346) add icinga check for certs"

https://gerrit.wikimedia.org/r/510810

Mentioned in SAL (#wikimedia-operations) [2019-05-17T08:24:05Z] <fsero> reenabling puppet after reverting T221346

Change 510891 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/puppet@production] mcrouter: feat(T221346) add icinga check for certs

https://gerrit.wikimedia.org/r/510891

Mentioned in SAL (#wikimedia-operations) [2019-05-17T14:09:33Z] <fsero> second round of setting up cert check, disabling puppet on mcrouter hosts T221346

Change 510891 merged by Fsero:
[operations/puppet@production] mcrouter: feat(T221346) add icinga check for certs

https://gerrit.wikimedia.org/r/510891

Change 510920 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/puppet@production] mcrouter: bug(T221346)

https://gerrit.wikimedia.org/r/510920

Change 510920 merged by Fsero:
[operations/puppet@production] mcrouter: bug(T221346)

https://gerrit.wikimedia.org/r/510920

Mentioned in SAL (#wikimedia-operations) [2019-05-17T14:43:30Z] <fsero> reenabling puppet on mcrouter hosts for T221346, checks in place; if there is any alert for cert expiration and mcrouter, this is the source :)

I tried to reproduce the procedure outlined by @Joe and it won't work, as cergen will create a new CA and new certificates. We could roll out the new CA and certs, but this would likely cause some issues during the rollout, and I'd like to avoid that if possible.

So the amended procedure would be:

# Save a copy of the current mcrouter tree
cp -r /srv/private/modules/secret/secrets/mcrouter/ mcrouter
# Delete the current public certs
find /srv/private/modules/secret/secrets/mcrouter/ -type f -name "*.crt.pem" -delete
# Go to the CA folder
cd /srv/private/modules/secret/secrets/mcrouter/mcrouter_ca
# Create a file for the extensions with this content and save it as openssl.cnf
[ v3_ca ]
basicConstraints=critical,CA:TRUE,pathlen:0
keyUsage=cRLSign,keyCertSign
subjectKeyIdentifier=hash
# Regenerate the CA cert using the same key and CSR
openssl x509 -req -extfile openssl.cnf -extensions v3_ca -days 3650 -in /srv/private/modules/secret/secrets/mcrouter/mcrouter_ca/mcrouter_ca.csr.pem -signkey /srv/private/modules/secret/secrets/mcrouter/mcrouter_ca/ca.key.private.pem -out  /srv/private/modules/secret/secrets/mcrouter/mcrouter_ca/ca.crt.pem
cp /srv/private/modules/secret/secrets/mcrouter/mcrouter_ca/ca.crt.pem /srv/private/modules/secret/secrets/mcrouter/mcrouter_ca/mcrouter_ca.crt.pem
# Re-generate the certs
sudo cergen --base-path /srv/private/modules/secret/secrets/mcrouter/  --generate /etc/cergen/mcrouter.manifests.d
# Verify the old CA cert can validate the new certs (the option is spelled -CAfile)
openssl verify -verbose -CAfile ./mcrouter/mcrouter_ca/ca.crt.pem /srv/private/modules/secret/secrets/mcrouter/mw1295.eqiad.wmnet/mw1295.eqiad.wmnet.crt.pem
# Verify the new CA cert can validate old certs
openssl verify -verbose -CAfile /srv/private/modules/secret/secrets/mcrouter/mcrouter_ca/ca.crt.pem ./mcrouter/mw1295.eqiad.wmnet/mw1295.eqiad.wmnet.crt.pem
# Verify the expiry of the new certs
openssl x509 -noout -dates -in /srv/private/modules/secret/secrets/mcrouter/mcrouter_ca/ca.crt.pem
openssl x509 -noout -dates -in  /srv/private/modules/secret/secrets/mcrouter/mw1295.eqiad.wmnet/mw1295.eqiad.wmnet.crt.pem
# Commit your changes; as a precaution, remember to disable puppet on the mcrouter fleet before committing
cd /srv/private && git add modules/secret/secrets/mcrouter/
## sudo -i ; git commit

AFAICT, cergen doesn't allow regenerating a CA cert from a given key and CSR, so that's why I'm doing it manually via openssl.
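The cross-validation between old and new CA certs depends on both carrying the same key. A small sketch of checking that directly by comparing the public keys embedded in the two certs; the function name `same_pubkey` is hypothetical:

```shell
# Hypothetical helper: succeed if two certificates carry the same public key.
# Usage: same_pubkey <cert-a.pem> <cert-b.pem>
same_pubkey() {
    # Extract the PEM-encoded public key from each certificate.
    key_a=$(openssl x509 -noout -pubkey -in "$1") || return 2
    key_b=$(openssl x509 -noout -pubkey -in "$2") || return 2
    [ "$key_a" = "$key_b" ]
}
```

Here it would be called with the saved `./mcrouter/mcrouter_ca/ca.crt.pem` and the regenerated `ca.crt.pem`; a mismatch would suggest a new CA key was generated rather than the existing one being reused.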

@Joe @elukey does this look sane to you?

Actually, cergen does support it; it was a mistake on my part.

Mentioned in SAL (#wikimedia-operations) [2019-05-20T09:01:12Z] <fsero> disabling puppet on mcrouter hosts for regenerating certs - T221346

Mentioned in SAL (#wikimedia-operations) [2019-05-20T09:26:45Z] <fsero> continue to rolling over new certs - T221346

Mentioned in SAL (#wikimedia-operations) [2019-05-20T09:35:58Z] <fsero> rolling over new certs to all mcrouter hosts except proxies - T221346

Mentioned in SAL (#wikimedia-operations) [2019-05-20T10:03:41Z] <fsero> rolling over certs into mcrouter proxies eqiad - T221346

Mentioned in SAL (#wikimedia-operations) [2019-05-20T10:08:55Z] <fsero> rolling over certs into mcrouter proxies codfw - T221346

Mentioned in SAL (#wikimedia-operations) [2019-05-20T10:18:02Z] <fsero> puppet reenabled certs renewed - T221346

Change 511397 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/puppet@production] mcrouter: page 7 days before certs got expired

https://gerrit.wikimedia.org/r/511397

Besides minor cleanups, this is done.

Change 511397 abandoned by Fsero:
mcrouter: page 7 days before certs got expired

https://gerrit.wikimedia.org/r/511397