I think we can just have a fresh puppet CA and issue new certs everywhere, possibly using a bit of cumin magic. I would like to check that people are fine with this, so please raise any objections.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Restricted Task | | |
Resolved | | None | T207536 Move various support services for Cloud VPS currently in prod into their own instances
Resolved | | Krenair | T171188 Move the main WMCS puppetmaster into the Labs realm
Resolved | | Krenair | T219424 Decide how we're going to handle certificates for the puppetmaster migration
Resolved | | jbond | T220268 Consider ways to make puppetmaster CA changes smoother on the puppet client end
Event Timeline
A new CA and certs seems fine to me. The only place that I remember us having puppet cert related issues in the past is Toolforge which I think uses the puppet certs for some non-puppet authn/z controls. Toolforge instances are attached to their own puppetmaster, so that won't be an issue here.
Off the top of my head, I don't *think* these are reused in other tools the way the ones for the tools puppetmaster are. etcd around here uses puppet certs and freaks out when you change them, for instance. Where I do see a problem is that the puppetmaster in tools is connected to labs-puppetmaster. Therefore, its own certs become invalid if you replace the CA.
The tools "standalone" puppetmaster is not actually standalone. It is puppetized on the upstream master while all other clients in tools connect to it instead.
Wait, what? Why is it not set up to serve itself? Still, I'm not actually sure that changing the central CA would affect tools' own CA.
toolsbeta and tools do this. They depend on some secrets on labs-puppetmaster, possibly. I'm not sure exactly why.
I'd prefer copying over the certs if possible to save cleanup work and similar. Kubernetes also uses puppet certs. Since there's a "general-k8s" project, they may also use them (I haven't checked).
I'm not sure any secrets on labs-puppetmaster would actually be secret, since the labs central puppetmaster autosigns certificates. Please send me the details of anything cherry-picked (or otherwise added) on top of the usual repos on labs-puppetmaster privately.
I don't see any local commits on a quick check. As I said, I'm not sure exactly why they are built that way. However, they are.
Okay. Someone would have to sign off on exporting the CA private key (which is presumably treated like a prod secret right now) into labs.
general-k8s project appears to have 0 instances.
That helps :) I can't guarantee nobody else is doing something with puppet-as-CA, but it's good to hear that the one non-tools example isn't being used.
I support a new CA if we can somehow preserve the CA of the client puppetmasters, to be clear. I just don't immediately know how doable that is.
It's a good point, I think I'm going to have to test that. I could put in testlabs:
- Central puppetmaster A, probably just set up as a project puppetmaster for the purposes of this test. Puppetmaster: self
- Project puppetmaster B. Puppetmaster: Default central puppetmaster
- Normal instance C. Puppetmaster: B
Then, when everything is set up, set B to use A instead of the current default central puppetmaster and see what happens?
This might be easier to draw on a diagram or something.
So, initial setup:
After:
And we need to check specifically that the C->B interaction still works as one would expect.
Now that I've set up 'New central puppetmaster A' I should probably add that it has a separate puppetmaster to provide it certain secrets e.g. encapi DB credentials and some private keys for the 'puppet' and 'puppetmaster.cloudinfra.wmflabs.org' names, so I should not have added the loop above that entry on the diagram.
This is equivalent to how the existing central puppetmaster works as that will be getting those secrets from the prod puppetmaster, also not pictured.
Anyway, I've run the test using krenair-t219424-b.testlabs.eqiad.wmflabs and krenair-t219424-c.testlabs.eqiad.wmflabs (using the new central puppetmaster I've set up for the parent ticket) and C seems entirely unaffected by B's own puppetmaster changing. They're both running puppet quite happily.
root@krenair-t219424-b:~# cd /var/lib/puppet; mv ssl ssl.$(date '+%Y-%m-%dT%H:%M'); curl https://phab.wmfusercontent.org/file/data/sp3m7a6mjr53xfwlidz7/PHID-FILE-s4vhserqjh34z764hk6s/raw.txt -o /usr/local/share/ca-certificates/Puppet_Internal_CA.crt -s; update-ca-certificates --fresh; puppet agent -tv
Clearing symlinks in /etc/ssl/certs...
done.
Updating certificates in /etc/ssl/certs...
157 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
Info: Creating a new SSL key for krenair-t219424-b.testlabs.eqiad.wmflabs
Info: Caching certificate for ca
Info: csr_attributes file loading from /etc/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for krenair-t219424-b.testlabs.eqiad.wmflabs
Info: Certificate Request fingerprint (SHA256): BF:89:F3:9B:AA:51:A4:5E:02:1A:D5:B7:87:35:3B:A9:2E:FF:06:57:BC:49:61:05:FE:17:FE:1A:53:09:E3:63
Info: Caching certificate for krenair-t219424-b.testlabs.eqiad.wmflabs
Info: Caching certificate_revocation_list for ca
Info: Caching certificate for ca
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for krenair-t219424-b.testlabs.eqiad.wmflabs
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1554662935'
Info: Computing checksum on file /etc/ssh/userkeys/gitpuppet
Info: /Stage[main]/Ssh::Server/File[/etc/ssh/userkeys/gitpuppet]: Filebucketed /etc/ssh/userkeys/gitpuppet to puppet with sum 53b4cfbe73412a1283b34276c60c2fea
Notice: /Stage[main]/Ssh::Server/File[/etc/ssh/userkeys/gitpuppet]/ensure: removed
Info: Computing checksum on file /usr/local/lib/nagios/plugins/check_puppet-needs-merge
Info: /Stage[main]/Nrpe/File[/usr/local/lib/nagios/plugins/check_puppet-needs-merge]: Filebucketed /usr/local/lib/nagios/plugins/check_puppet-needs-merge to puppet with sum 979955f544ce49c3f4a830f967ac991d
Notice: /Stage[main]/Nrpe/File[/usr/local/lib/nagios/plugins/check_puppet-needs-merge]/ensure: removed
Info: Computing checksum on file /etc/diamond/collectors/CherryPickCounterCollector.conf
Info: /Stage[main]/Diamond/File[/etc/diamond/collectors/CherryPickCounterCollector.conf]: Filebucketed /etc/diamond/collectors/CherryPickCounterCollector.conf to puppet with sum 3f34c7d1c551057a7362e91835e75808
Notice: /Stage[main]/Diamond/File[/etc/diamond/collectors/CherryPickCounterCollector.conf]/ensure: removed
Notice: openstack::clientpackages::mitaka::stretch: no special configuration yet
Notice: /Stage[main]/Openstack::Clientpackages::Mitaka::Stretch/Notify[openstack::clientpackages::mitaka::stretch: no special configuration yet]/message: defined 'message' as 'openstack::clientpackages::mitaka::stretch: no special configuration yet'
Notice: The LDAP client stack for this host is: classic
Notice: /Stage[main]/Profile::Ldap::Client::Labs/Notify[LDAP client stack]/message: defined 'message' as 'The LDAP client stack for this host is: classic'
Notice: Applied catalog in 5.01 seconds
root@krenair-t219424-b:/var/lib/puppet# grep server /etc/puppet/puppet.conf
server = puppetmaster.cloudinfra.wmflabs.org
ssldir = /var/lib/puppet/server/ssl/
hostcert = /var/lib/puppet/server/ssl/certs/krenair-t219424-b.testlabs.eqiad.wmflabs.pem
hostprivkey = /var/lib/puppet/server/ssl/private_keys/krenair-t219424-b.testlabs.eqiad.wmflabs.pem
root@krenair-t219424-b:/var/lib/puppet#
krenair@krenair-t219424-c:~$ sudo puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for krenair-t219424-c.testlabs.eqiad.wmflabs
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1554662942'
Notice: openstack::clientpackages::mitaka::stretch: no special configuration yet
Notice: /Stage[main]/Openstack::Clientpackages::Mitaka::Stretch/Notify[openstack::clientpackages::mitaka::stretch: no special configuration yet]/message: defined 'message' as 'openstack::clientpackages::mitaka::stretch: no special configuration yet'
Notice: The LDAP client stack for this host is: classic
Notice: /Stage[main]/Profile::Ldap::Client::Labs/Notify[LDAP client stack]/message: defined 'message' as 'The LDAP client stack for this host is: classic'
Notice: Applied catalog in 3.77 seconds
krenair@krenair-t219424-c:~$ grep server /etc/puppet/puppet.conf
server = krenair-t219424-b.testlabs.eqiad.wmflabs
krenair@krenair-t219424-c:~$
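For anyone repeating this on other instances, the one-liner run on B above could be wrapped roughly as follows. This is only a sketch: the function names `run` and `reenrol_agent`, the `DRY_RUN` switch and the idea of passing the CA URL as an argument are my own assumptions, not an existing tool. The individual commands are the same ones used on B.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-enrol a puppet agent against a new CA.
# The steps mirror the one-liner run on instance B; the function names,
# the DRY_RUN variable and the argument layout are illustrative assumptions.

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

reenrol_agent() {
    local ca_url="$1"
    local ssldir="${2:-/var/lib/puppet/ssl}"
    local stamp
    stamp="$(date '+%Y-%m-%dT%H:%M')"

    # Move the old agent SSL directory aside instead of deleting it.
    run mv "$ssldir" "${ssldir}.${stamp}"
    # Install the new CA certificate into the system trust store.
    run curl -s "$ca_url" -o /usr/local/share/ca-certificates/Puppet_Internal_CA.crt
    run update-ca-certificates --fresh
    # The next agent run creates a new key and certificate request.
    run puppet agent -tv
}
```

Calling `reenrol_agent <url-of-new-CA-cert>` only prints the four commands unless `DRY_RUN=0` is set. Note that `puppet agent -t` enables detailed exit codes, so an exit status of 2 means changes were applied rather than that the run failed.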
I don't think I understand the gitpuppet/CherryPickCounterCollector stuff on B yet, though that's not quite this ticket.
Edit: Yep, got it. B no longer has the puppetmaster role because the new central puppetmaster it talks to is using its own ENC, which is set up but has not had the data imported from the live one, so everything under it is missing the roles/hiera configured in horizon. With this role missing, certain files are no longer in the catalogue and so get removed.
root@cloud-puppetmaster-01:~# cat /etc/puppet-enc.yaml
host: puppetmaster.cloudinfra.wmflabs.org
root@cloud-puppetmaster-01:~# echo 'host: labs-puppetmaster.wikimedia.org' > /etc/puppet-enc.yaml
root@cloud-puppetmaster-01:~# /usr/local/bin/puppet-enc krenair-t219424-b.testlabs.eqiad.wmflabs
classes: ['role::puppetmaster::standalone']
parameters: {}
root@cloud-puppetmaster-01:~# echo 'host: puppetmaster.cloudinfra.wmflabs.org' > /etc/puppet-enc.yaml
root@cloud-puppetmaster-01:~# /usr/local/bin/puppet-enc krenair-t219424-b.testlabs.eqiad.wmflabs
classes: []
parameters: {}
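The manual comparison above (point the ENC at one backend, query a node, point it at the other, query again) could be scripted roughly like this. It is a sketch under assumptions: `enc_for_backends` is a made-up name, the `ENC_BIN`/`ENC_CONF` overrides are mine, and it assumes, as the session suggests, that `puppet-enc` picks its backend from the `host:` line in `/etc/puppet-enc.yaml`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: query the ENC for one node against several backends.
# The defaults are the paths used in the session above; the function name
# and the overridable ENC_BIN / ENC_CONF variables are assumptions.
ENC_BIN="${ENC_BIN:-/usr/local/bin/puppet-enc}"
ENC_CONF="${ENC_CONF:-/etc/puppet-enc.yaml}"

enc_for_backends() {
    local node="$1"; shift
    local saved backend
    saved="$(cat "$ENC_CONF")"            # remember the live setting
    for backend in "$@"; do
        echo "host: ${backend}" > "$ENC_CONF"
        echo "== ${backend}"
        "$ENC_BIN" "$node"                # prints classes/parameters YAML
    done
    echo "$saved" > "$ENC_CONF"           # put the original setting back
}
```

For example, `enc_for_backends krenair-t219424-b.testlabs.eqiad.wmflabs labs-puppetmaster.wikimedia.org puppetmaster.cloudinfra.wmflabs.org` would show the two results one after the other. Like the manual version, it briefly rewrites the live config file, so agent runs that hit the master in that window could see the wrong backend.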