
Make changing puppetmasters for Labs instances easier
Closed, Resolved, Public

Description

When the puppetmaster server of a Labs instance is changed, for example to a standalone puppetmaster, the directory /var/lib/puppet/ssl needs to be removed so that the certificates and related files can be regenerated (cf. https://wikitech.wikimedia.org/wiki/Standalone_puppetmaster#Step_2:_Setup_a_puppet_client, https://wikitech.wikimedia.org/wiki/Tools_Kubernetes#Switch_to_new_puppetmaster).

modules/base/manifests/init.pp already has code for this, which runs only if /root/allowcertdeletion exists. It was introduced by 3b9987677d7a632fd1b2dd9ccf4454275cbeabf2 as part of a scheme to move VMs from one Labs puppetmaster to another.

This could be reused and improved for the general case. I don't see a security risk in deleting the certificates and related files when someone changes the Hiera variable puppetmaster: anyone who can do that is root on the instance and can change whatever they want anyway.

Deleting /var/lib/puppet/ssl completely on every such change is very heavy-handed. This could be fine-tuned, because each certificate names the CA that issued it. For clients of the Labs puppetmaster, openssl x509 -in /var/lib/puppet/ssl/certs/$(hostname -f).pem -text -noout includes:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 7369 (0x1cc9)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=Puppet CA: virt1000.wikimedia.org
        Validity
            Not Before: Nov 30 20:43:45 2016 GMT
            Not After : Nov 30 20:43:45 2021 GMT
        Subject: CN=toolsbeta-valhallasw-puppet-compiler-3.toolsbeta.eqiad.wmflabs
[…]

(By the way, virt1000.wikimedia.org is NXDOMAIN.) For clients of standalone puppetmasters, it includes:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 9 (0x9)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=Puppet CA: toolsbeta-puppetmaster7.toolsbeta.eqiad.wmflabs
        Validity
            Not Before: Dec 11 03:28:57 2016 GMT
            Not After : Dec 11 03:28:57 2021 GMT
        Subject: CN=toolsbeta-clush-master-01.toolsbeta.eqiad.wmflabs
[…]

So I suggest automatically deleting those files under /var/lib/puppet/ssl that do not reference the puppetmaster specified by the Hiera variable.
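A minimal sketch of that proposal in shell, assuming the value of the Hiera variable is exactly the host name that follows "Puppet CA:" in the issuer CN (a CNAME or short name would not match). All function names are hypothetical. The sketch only lists candidate files rather than deleting them, and it skips anything openssl cannot parse as a certificate, so private keys and crl.pem are not covered.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the proposal; not part of operations/puppet.

# Print the CA host from an issuer string such as
#   "Issuer: CN=Puppet CA: virt1000.wikimedia.org"    (openssl x509 -text)
#   "issuer=CN = Puppet CA: virt1000.wikimedia.org"   (openssl x509 -issuer)
# Prints nothing if the issuer is not a Puppet CA.
ca_host_from_issuer() {
    printf '%s\n' "$1" | sed -n 's|.*Puppet CA: *\([^,/ ]*\).*|\1|p'
}

# Succeed if the issuer ($1) names a Puppet CA other than the expected
# master ($2). Issuers that are not Puppet CAs are treated as "keep".
issued_by_other_master() {
    ca=$(ca_host_from_issuer "$1")
    [ -n "$ca" ] && [ "$ca" != "$2" ]
}

# List certificates under $1 (normally /var/lib/puppet/ssl) issued by a CA
# other than $2. Nothing is deleted; review the list before acting on it.
stale_certs() {
    find "$1" -type f -name '*.pem' | while IFS= read -r pem; do
        # Files that are not X.509 certificates (keys, CRLs) are skipped.
        issuer=$(openssl x509 -noout -issuer -in "$pem" 2>/dev/null) || continue
        if issued_by_other_master "$issuer" "$2"; then
            printf '%s\n' "$pem"
        fi
    done
}
```

Usage would be along the lines of stale_certs /var/lib/puppet/ssl <puppetmaster from Hiera>. Listing instead of deleting keeps the sketch on the cautious side of what the proposal describes.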

Workaround

See https://wikitech.wikimedia.org/wiki/Help:Standalone_puppetmaster:

Agent:

rm -fR /var/lib/puppet/ssl

Master:

puppet cert clean <fqdn of instance>

Agent:

puppet agent -tv
mkdir -p /var/lib/puppet/client/ssl/certs
cp /var/lib/puppet/ssl/certs/ca.pem /var/lib/puppet/client/ssl/certs
puppet agent -tv

Event Timeline

bd808 triaged this task as Lowest priority. May 4 2017, 3:35 PM
bd808 subscribed.

Patches are of course always welcome, but this seems like a pretty delicate operation to perform via Puppet for what is in reality a seldom-used edge case. The manual steps needed are well described at https://wikitech.wikimedia.org/wiki/Standalone_puppetmaster. I can see how this process would be tedious if a user were converting a project with a large number of VMs, but in practice our typical project has 1-3 VMs. I just did this work twice to switch the Striker project (first from an old broken puppet::self master to the Labs master, and then to a new role::puppetmaster::standalone master), and the steps this would automate took about 10 minutes of the total process for my 3-instance project.

bd808 raised the priority of this task from Lowest to Low. Jul 12 2017, 6:10 AM

Workaround

See https://wikitech.wikimedia.org/wiki/Help:Standalone_puppetmaster:

Agent:

$ sudo -i puppet agent -tv
$ sudo rm -fR /var/lib/puppet/ssl
$ sudo rm /var/lib/puppet/server/ssl/ca/signed/$(hostname -f).pem
$ sudo -i puppet agent -tv

Master:

$ sudo puppet cert clean <fqdn of instance>

Agent:

$ sudo -i puppet agent -tv

I have updated the workaround using the one I originally wrote on T148929. The proposed one did not work for me on CI instances with a self puppet master.

Andrew subscribed.

The new arrangement with puppetmaster::standalone is quite a bit better. Changing the puppetmaster is a single Hiera setting, and the subsequent error messages provide adequate instructions about how to refresh keys. I'm going to close this, as automating things further would (I think) break Puppet's security model. Feel free to re-open if you disagree :)

This is still broken. In a project that has its own puppetmaster, every new instance ends up with a broken Puppet setup, because firstboot.sh runs Puppet against the 'puppet' server:

master="puppet"
sed -i "s/_MASTER_/${master}/g" /etc/puppet/puppet.conf

puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff --waitforcert=10 --certname=integration-slave-docker-1001.integration.eqiad.wmflabs --server=puppet
Creating a new SSL certificate request for integration-slave-docker-1001.integration.eqiad.wmflabs

/root/allowcertdeletion does indeed seem to be the way to go; it triggers:

# Clear master certs if puppet.conf changed
exec { 'delete master certs':
    path        => '/usr/bin:/bin',
    command     => 'rm -f /var/lib/puppet/ssl/certs/ca.pem; rm -f /var/lib/puppet/ssl/crl.pem; rm -f /root/allowcertdeletion',
    onlyif      => 'test -f /root/allowcertdeletion',
    subscribe   => File['/etc/puppet/puppet.conf.d/10-main.conf'],
    refreshonly => true,
}
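Read as plain shell, the exec does roughly the following when Puppet refreshes it (refreshonly plus subscribe mean that happens only after 10-main.conf changes, which this sketch does not model). The function name is hypothetical, and the paths are parameters so the logic can be tried on a scratch directory. It makes visible that only the cached CA certificate and CRL are removed, and that the flag file is one-shot.

```shell
#!/bin/sh
# Hypothetical plain-shell rendering of the 'delete master certs' exec.
# $1: Puppet ssl dir (normally /var/lib/puppet/ssl)
# $2: flag file      (normally /root/allowcertdeletion)
delete_master_certs() {
    ssl_dir=$1
    flag=$2
    # onlyif => 'test -f /root/allowcertdeletion'
    [ -f "$flag" ] || return 0
    # command => drop the cached CA cert and CRL, then the flag itself,
    # so the deletion happens at most once per "touch" of the flag.
    rm -f "$ssl_dir/certs/ca.pem" "$ssl_dir/crl.pem" "$flag"
}
```

The instance's own certificate and private key are left in place by this exec.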

Right now, if you have a project puppetmaster and want to add a new instance to your project safely, you need to create the instance first, get it hooked up to the project puppetmaster (a manual and arcane process), and only then start adding classes etc. (@Mholloway ran into this when setting up a new instance in deployment-prep: its name matched a prefix that included a class relying on Hiera data defined only by a cherry-pick in labs/private.git on deployment-puppetmaster03, so the instance was bricked.)

I wonder if we should have firstboot.sh check which project it is in and choose the puppetmaster accordingly. Maybe it could check for puppet.{project}.wmflabs.org, so that projects can CNAME that to their chosen puppetmaster. That assumes, at least, that the master it bootstraps from doesn't need autosigning enabled.
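A sketch of what that check could look like, assuming firstboot.sh can learn its project name somehow (how is not covered here) and that projects publish puppet.{project}.wmflabs.org. The choose_master function and the RESOLVE_CMD override are hypothetical; getent hosts is used only as one way to test whether the name resolves, with the current "puppet" default as the fallback.

```shell
#!/bin/sh
# Hypothetical firstboot.sh helper, not existing behaviour.
# RESOLVE_CMD decides whether a name resolves; override it for testing.
RESOLVE_CMD=${RESOLVE_CMD:-"getent hosts"}

# Print the per-project master name if it resolves, else "puppet".
choose_master() {
    project=$1
    candidate="puppet.${project}.wmflabs.org"
    # $RESOLVE_CMD is deliberately unquoted so "getent hosts" splits
    # into a command and its first argument.
    if [ -n "$project" ] && $RESOLVE_CMD "$candidate" >/dev/null 2>&1; then
        printf '%s\n' "$candidate"
    else
        printf '%s\n' "puppet"
    fi
}

# Possible use in firstboot.sh, in place of master="puppet":
#   master=$(choose_master "$project")
#   sed -i "s/_MASTER_/${master}/g" /etc/puppet/puppet.conf
```

Falling back to "puppet" keeps projects without such a record on today's behaviour.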

> Maybe have it check for puppet.{project}.wmflabs.org so projects can CNAME that to their chosen puppetmaster. At least, assuming the master it bootstraps from doesn't need autosigning enabled.

Is that kind of per-project CNAME easy to create via Horizon? I do not know much about what the DNS management screens offer. I know that we can't do that with instance names unless we make the convention "{project}-puppet" or "puppet-{project}" or something so that the base instance name is globally unique.

Sure, if {project}.wmflabs.org is a Designate domain owned by the project. In some cases no {project}.wmflabs.org zone could be created, because people made proxies named after the project (details in T104521). I just temporarily made such a CNAME to demonstrate:

alex@alex-laptop:~$ dig puppet.deployment-prep.wmflabs.org CNAME

; <<>> DiG 9.10.3-P4-Ubuntu <<>> puppet.deployment-prep.wmflabs.org CNAME
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34339
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;puppet.deployment-prep.wmflabs.org. IN	CNAME

;; ANSWER SECTION:
puppet.deployment-prep.wmflabs.org. 600	IN CNAME deployment-puppetmaster03.deployment-prep.eqiad.wmflabs.

;; Query time: 84 msec
;; SERVER: 192.168.1.254#53(192.168.1.254)
;; WHEN: Fri Jul 13 22:38:00 BST 2018
;; MSG SIZE  rcvd: 132

alex@alex-laptop:~$

I believe we've achieved this with T220268#5275994. The only catch is that you need to make a Hiera change like https://wikitech.wikimedia.org/w/index.php?title=Hiera:Deployment-prep&diff=1837662&oldid=1837079 to tell instances which CA cert to expect from their new puppetmaster (note that we've changed it to a hash keyed by puppetmaster since I wrote that comment).
Shall we close this?

Krenair claimed this task.

Actually, I've just tested creating a couple of new instances with the above: they come up without needing anything special anymore, beyond the mandatory signing of the certificate.

Change 539301 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] contint: add puppetmaster CA cert

https://gerrit.wikimedia.org/r/539301

Change 539301 abandoned by Hashar:
contint: add puppetmaster CA cert

Reason:
I have dropped the patch since it does not solve anything ;)

https://gerrit.wikimedia.org/r/539301