Generate SSL certificates for relforge1003.eqiad.wmnet and relforge1004.eqiad.wmnet
Open, Needs Triage | Public | 2 Estimated Story Points

Description

New relforge instances need SSL certificates:

  • relforge1003.eqiad.wmnet
  • relforge1004.eqiad.wmnet

Event Timeline

From https://wikitech.wikimedia.org/wiki/Cergen:

cergen production certificate manifest files are checked into Puppet private. On puppetmaster1001, these can be found at /srv/private/modules/secret/secrets/certificates/certificate.manifests.d/. *.certs.yaml files in this directory declare certificates and keys that should be generated and managed, as well as any CAs that will be used to sign those certificates.

There is no relforge-specific entry in /srv/private/modules/secret/secrets/certificates/certificate.manifests.d/, but there is a /srv/private/modules/secret/secrets/ssl/relforge.svc.eqiad.wmnet.key.
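
One way to double-check that observation is to search the manifest directory for any mention of relforge. The following is a minimal Python sketch, not part of cergen: the helper name manifests_mentioning is hypothetical, and it treats each manifest as plain text instead of parsing the YAML.

```python
from pathlib import Path


def manifests_mentioning(manifest_dir, needle):
    """Return names of *.certs.yaml files in manifest_dir whose text contains needle."""
    hits = []
    for path in sorted(Path(manifest_dir).glob("*.certs.yaml")):
        # Plain substring match; good enough to spot a missing entry.
        if needle in path.read_text(encoding="utf-8", errors="replace"):
            hits.append(path.name)
    return hits
```

On puppetmaster1001 this would be called as manifests_mentioning("/srv/private/modules/secret/secrets/certificates/certificate.manifests.d", "relforge"); an empty list would agree with the finding above.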

Additionally, commit ba68e967315f6b14278e70f2861bbdd01d0295ae in /srv/private is where the file /srv/private/modules/secret/secrets/ssl/relforge.svc.eqiad.wmnet.key was added. There's no associated ticket, but the commit body says:

This adds ECDSA certificates with SAN for service discovery for relforge.

So:

  • This cert is an internal cert, so we'll want to use cergen to generate and manage it
  • The cert was not previously generated with cergen. So far the only impact I can see is that in https://github.com/wikimedia/puppet/blob/6a6211da149c7075fa689aa72e22bc5e84fb00a7/hieradata/role/common/elasticsearch/relforge.yaml#L19-L35 we will need to change certificate_name from relforge.svc.eqiad.wmnet to relforge.discovery.wmnet (because rather than living in /srv/private/modules/secret/secrets/ssl/relforge.svc.eqiad.wmnet.key it will now live in /srv/private/modules/secret/secrets/ssl/relforge.discovery.wmnet.key)
  • Make sure the new cert carries over the same alt_names
  • Look into whether we actually need the hostnames on the cert itself; it would make things easier if not
  • Finally, regardless of the hostnames question, make sure to circle back and revoke the cert on the puppetmaster, and delete /srv/private/modules/secret/secrets/ssl/relforge.discovery.wmnet.key from /srv/private entirely

Anyway, with the above context gathered, I think the steps will look something like this:

  • Figure out if we care about having the hostnames (relforge100[3,4].eqiad.wmnet in this case) on the cert or not
  • (1) Create /srv/private/modules/secret/secrets/certificates/certificate.manifests.d/relforge.certs.yaml per https://wikitech.wikimedia.org/wiki/Cergen#Cheatsheet, making sure alt_names is set to the same value used in the old relforge.svc.eqiad.wmnet cert (apart from adding relforge100[3,4] and removing references to relforge100[1,2]), and (2) generate and commit the files
  • Patch operations/puppet's hieradata/role/common/elasticsearch/relforge.yaml lines 19-35 to change certificate_name from relforge.svc.eqiad.wmnet to relforge.discovery.wmnet (because rather than living in /srv/private/modules/secret/secrets/ssl/relforge.svc.eqiad.wmnet.key it will now live in /srv/private/modules/secret/secrets/ssl/relforge.discovery.wmnet.key)

(After the above 3 steps, make sure that the new cert works before proceeding to the final step)

  • Revoke the cert on the puppetmaster as well as delete /srv/private/modules/secret/secrets/ssl/relforge.discovery.wmnet.key from /srv/private entirely (making sure to add and commit, of course)
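
The alt_names carry-over in step (1) can be sketched in a few lines of Python. This is only an illustration: the helper name migrate_alt_names is hypothetical, and the old list below is an assumption, since the old cert's actual alt_names are not shown in this task.

```python
def migrate_alt_names(old_alt_names, retired_hosts, new_hosts):
    """Keep the old alt_names, drop retired hosts, and append each new host once."""
    retired = set(retired_hosts)
    kept = [name for name in old_alt_names if name not in retired]
    for host in new_hosts:
        if host not in kept:
            kept.append(host)
    return kept


# ASSUMPTION: placeholder for the old cert's alt_names, which this task does not list.
old = [
    "relforge.svc.eqiad.wmnet",
    "relforge1001.eqiad.wmnet",
    "relforge1002.eqiad.wmnet",
]
new = migrate_alt_names(
    old,
    retired_hosts=["relforge1001.eqiad.wmnet", "relforge1002.eqiad.wmnet"],
    new_hosts=["relforge1003.eqiad.wmnet", "relforge1004.eqiad.wmnet"],
)
print(new)
```

Order is preserved so that the service name stays first, which keeps the resulting manifest easy to diff against the old certificate.
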
Gehel reopened this task as Open.

Mentioned in SAL (#wikimedia-operations) [2021-03-19T02:43:25Z] <ryankemper> T275885 Revoking current relforge TLS cert in advance of generation of new cert: ryankemper@puppetmaster1001:/srv/private$ sudo puppet cert clean relforge.svc.eqiad.wmnet

New cergen-based manifest (modules/secret/secrets/certificates/certificate.manifests.d/relforge.certs.yaml) to generate relforge.svc.eqiad.wmnet:

relforge.svc.eqiad.wmnet:
  authority: puppet_ca
  expiry: null
  alt_names: ["relforge.svc.eqiad.wmnet","relforge1003.eqiad.wmnet","relforge1004.eqiad.wmnet"]
  key:
    password: REDACTED
    algorithm: ec
ryankemper@puppetmaster1001:/srv/private$ sudo cergen -c 'relforge.*' --generate --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d
2021-03-19 02:55:31,498 INFO     cergen                                   Generating certificates ['relforge.svc.eqiad.wmnet'] with force=False
2021-03-19 02:55:31,498 INFO     Certificate(relforge.svc.eqiad.wmnet)    Generating all files, force=False...
2021-03-19 02:55:31,500 INFO     Certificate(relforge.svc.eqiad.wmnet)    Generating certificate file
/usr/lib/python3/dist-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for puppetmaster1001.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python3/dist-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for puppetmaster1001.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python3/dist-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for puppetmaster1001.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
2021-03-19 02:55:33,004 INFO     Certificate(relforge.svc.eqiad.wmnet)    Generating CA certificate file
2021-03-19 02:55:33,005 INFO     Certificate(relforge.svc.eqiad.wmnet)    Generating PKCS12 keystore file
2021-03-19 02:55:33,285 INFO     Certificate(relforge.svc.eqiad.wmnet)    Generating Java keystore file
2021-03-19 02:55:34,365 INFO     Certificate(relforge.svc.eqiad.wmnet)    Importing PuppetCA(puppetmaster1001.eqiad.wmnet_8140) cert into Java keystore
2021-03-19 02:55:35,406 INFO     Certificate(relforge.svc.eqiad.wmnet)    Generating Java truststore file with CA certificate PuppetCA(puppetmaster1001.eqiad.wmnet_8140)

Status of certificates ['relforge.svc.eqiad.wmnet']

Certificate(relforge.svc.eqiad.wmnet, authorities=[PuppetCA(puppetmaster1001.eqiad.wmnet_8140)]):
        /srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.key.private.pem: PRESENT (mtime: 2021-03-19T02:55:31.498156)
        /srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.key.public.pem: PRESENT (mtime: 2021-03-19T02:55:31.498156)
        /srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.crt.pem: PRESENT (mtime: 2021-03-19T02:55:33.002155)
        /srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/ca.crt.pem: PRESENT (mtime: 2021-03-19T02:55:33.002155)
        /srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.keystore.p12: PRESENT (mtime: 2021-03-19T02:55:33.018155)
        /srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.keystore.jks: PRESENT (mtime: 2021-03-19T02:55:34.818153)
        /srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/truststore.jks: PRESENT (mtime: 2021-03-19T02:55:35.758152)


ryankemper@puppetmaster1001:/srv/private$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

        modules/secret/secrets/certificates/certificate.manifests.d/relforge.certs.yaml
        modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/

nothing added to commit but untracked files present (use "git add" to track)
ryankemper@puppetmaster1001:/srv/private$ ls modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/
ca.crt.pem                        relforge.svc.eqiad.wmnet.csr.pem          relforge.svc.eqiad.wmnet.key.public.pem  relforge.svc.eqiad.wmnet.keystore.p12
relforge.svc.eqiad.wmnet.crt.pem  relforge.svc.eqiad.wmnet.key.private.pem  relforge.svc.eqiad.wmnet.keystore.jks    truststore.jks
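
The status listing above can also be checked mechanically. Below is a minimal Python sketch that reports which of the files named in the cergen status output are absent from a certificate directory. The helper name missing_cert_files is hypothetical, and the file list is copied from the output in this comment, not from cergen's source.

```python
from pathlib import Path

# File names as they appear in the cergen status output ({name} is the certificate name).
EXPECTED_FILES = [
    "{name}.key.private.pem",
    "{name}.key.public.pem",
    "{name}.crt.pem",
    "ca.crt.pem",
    "{name}.keystore.p12",
    "{name}.keystore.jks",
    "truststore.jks",
]


def missing_cert_files(cert_dir, name):
    """Return the expected file names that do not exist as regular files in cert_dir."""
    cert_dir = Path(cert_dir)
    expected = [pattern.format(name=name) for pattern in EXPECTED_FILES]
    return [filename for filename in expected if not (cert_dir / filename).is_file()]
```

For this task the call would be missing_cert_files("/srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet", "relforge.svc.eqiad.wmnet"), and an empty list means every file from the status output is present.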

LGTM

Log

After creating a new manifest and running the cert gen command, we need to copy the newly generated secret key in decrypted form to another location in the /srv/private repo. Then we chown all the new files to make sure they're owned by gitpuppet (it's possible there's a git commit hook that does this for me but I didn't see one so I have just been playing it safe). Finally, we need to copy over the pubkey to the operations/puppet repo.

copy decrypted newly generated key within same /srv/private repo

sudo openssl ec -in modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.key.private.pem -out /srv/private/modules/secret/secrets/ssl/relforge.svc.eqiad.wmnet.key
[supplied password REDACTED from the manifest]

chown

ryankemper@puppetmaster1001:/srv/private$ sudo chown -Rv gitpuppet:gitpuppet /srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/ && \
> sudo chown -v  gitpuppet:gitpuppet /srv/private/modules/secret/secrets/ssl/relforge.svc.eqiad.wmnet.key && echo done || echo fail
changed ownership of '/srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.crt.pem' from root:root to gitpuppet:gitpuppet
changed ownership of '/srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.key.public.pem' from root:root to gitpuppet:gitpuppet
changed ownership of '/srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/truststore.jks' from root:root to gitpuppet:gitpuppet
changed ownership of '/srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.keystore.p12' from root:root to gitpuppet:gitpuppet
changed ownership of '/srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/ca.crt.pem' from root:root to gitpuppet:gitpuppet
changed ownership of '/srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.csr.pem' from root:root to gitpuppet:gitpuppet
changed ownership of '/srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.keystore.jks' from root:root to gitpuppet:gitpuppet
changed ownership of '/srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.key.private.pem' from root:root to gitpuppet:gitpuppet
changed ownership of '/srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/' from root:root to gitpuppet:gitpuppet
ownership of '/srv/private/modules/secret/secrets/ssl/relforge.svc.eqiad.wmnet.key' retained as gitpuppet:gitpuppet
done
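
To confirm the result of the chown without reading through the verbose output, something like the following Python sketch could be used. It is an assumption-laden illustration: the helper name not_owned_by is hypothetical, and it compares numeric IDs so the caller has to resolve the gitpuppet user and group first.

```python
import os
from pathlib import Path


def not_owned_by(root, uid, gid):
    """Return paths at or below root whose numeric owner or group differs from uid/gid."""
    root = Path(root)
    candidates = [root]
    if root.is_dir():
        candidates.extend(sorted(root.rglob("*")))
    mismatched = []
    for path in candidates:
        st = os.lstat(path)  # lstat: inspect the entry itself, do not follow symlinks
        if st.st_uid != uid or st.st_gid != gid:
            mismatched.append(str(path))
    return mismatched


# Example call on the puppetmaster (resolving the names to IDs with the stdlib):
#   import grp, pwd
#   uid = pwd.getpwnam("gitpuppet").pw_uid
#   gid = grp.getgrnam("gitpuppet").gr_gid
#   not_owned_by("/srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet", uid, gid)
```

An empty list means every file under the directory already has the expected ownership.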

copy /srv/private/modules/secret/secrets/certificates/relforge.svc.eqiad.wmnet/relforge.svc.eqiad.wmnet.crt.pem over to the operations/puppet repo as files/ssl/relforge.svc.eqiad.wmnet.crt
(commands not shown because it's just a cross-repo cp and git-review -R)


With the above steps done, we've got our new certificate and associated files everywhere they need to be. Now it's simply a matter of testing that the new certificate takes effect after sudo run-puppet-agent on relforge100[3,4].eqiad.wmnet; see the subsequent comment for those steps.
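
One way to test that the new certificate has taken effect is to compare the names on the served certificate against the alt_names from the manifest. The Python sketch below uses only the standard library. The helper names are hypothetical, port 443 is an assumption, and cafile would need to point at a copy of the Puppet CA certificate, whose location on the client is not stated in this task.

```python
import socket
import ssl


def san_dns_names(peer_cert):
    """Extract the DNS subjectAltName entries from an SSLSocket.getpeercert() dict."""
    return sorted(value for kind, value in peer_cert.get("subjectAltName", ()) if kind == "DNS")


def missing_sans(peer_cert, expected_names):
    """Return the expected names that are not present on the certificate."""
    return sorted(set(expected_names) - set(san_dns_names(peer_cert)))


def fetch_peer_cert(host, port=443, cafile=None, timeout=5.0):
    """Connect with certificate verification enabled and return the server cert as a dict."""
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

A check for this task might call missing_sans(fetch_peer_cert("relforge1004.eqiad.wmnet", cafile=...), [...]) with the three names from the manifest; an empty list would indicate the served certificate carries all of them.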

Change 673386 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] relforge: generate new TLS certs

https://gerrit.wikimedia.org/r/673386

This comment was removed by RKemper.

Change 673386 merged by Ryan Kemper:
[operations/puppet@production] relforge: generate new TLS certs

https://gerrit.wikimedia.org/r/673386

Mentioned in SAL (#wikimedia-operations) [2021-03-19T03:26:36Z] <ryankemper> T275885 ryankemper@cumin1001:~$ sudo cumin 'P{relforge*}' 'sudo run-puppet-agent'

I forgot to try running a curl command from inside the analytics network *before* deploying the new cert, so I don't have a good before/after comparison. However, curling relforge from within the analytics network hangs indefinitely, which is a good sign: if the cert were still broken, curl should reject it and return immediately.

ryankemper@stat1004:~$ curl https://relforge1004.eqiad.wmnet
^C
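
Since a hang and a certificate rejection are the two outcomes being told apart here, a probe with an explicit timeout makes the distinction easier to record than an interrupted curl. The following Python sketch is an illustration only: the helper name classify_tls_probe and its return labels are hypothetical, port 443 is assumed from the https URL, and cafile would need to point at the Puppet CA certificate.

```python
import socket
import ssl


def classify_tls_probe(host, port=443, timeout=5.0, cafile=None):
    """Return 'ok', 'timeout', 'tls-rejected', or 'connect-error' for a TLS connection attempt."""
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake verifies the certificate chain and the hostname.
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except socket.timeout:
        # Either the TCP connect or the TLS handshake did not finish in time.
        return "timeout"
    except ssl.SSLError:
        # Includes certificate verification failures and hostname mismatches.
        return "tls-rejected"
    except OSError:
        # Connection refused, unreachable network, DNS failure, and similar.
        return "connect-error"
```

Note that "timeout" on its own does not say which stage stalled, so it is weaker evidence than an "ok" result from a host that is allowed to reach relforge.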

Mentioned in SAL (#wikimedia-operations) [2021-03-19T07:16:22Z] <ryankemper> T275885 ryankemper@cumin1001:~$ sudo cumin 'P{relforge*}' 'sudo run-puppet-agent' (change hadn't been merged when I ran the agent earlier)