Catch cloud-puppetmasters up with production puppetmaster versions
Open, Needs Triage · Public

Description

jbond just upgraded the production puppetmasters. That probably means we need to upgrade ours as well.

@jbond, can you summarize the state of the art here so we know what to do with the cloud puppetmasters?

Details

Related Gerrit Patches:
[operations/puppet@production] cloud-puppetmaster: Prep for new instances
[operations/puppet@production] cloud: encapi stuff for new puppetmasters
[operations/puppet@production] Copy cloud-puppetmaster hiera to new puppetmasters

Event Timeline

Andrew created this task. · Oct 10 2019, 8:51 PM
Restricted Application added a subscriber: Aklapper. · Oct 10 2019, 8:51 PM
Andrew added a subscriber: Krenair. · Oct 10 2019, 8:51 PM
Krenair added a comment. · Edited · Oct 10 2019, 8:58 PM

I imagine this is a case of replacing them with buster instances? I reckon it should be easy enough to create a new backend, might need a quota bump for the cloudinfra tenant though. Will be interesting to figure out the process to swap frontends.

Hi Andrew,

The upgrade is pretty simple and for the most part it just requires the server to be rebuilt with buster, as @Krenair suggests. The one thing that is a bit non-standard is the introduction of locales for puppet modules. Once a node receives a catalogue from a new puppetmaster it will expect to be able to manage locale files going forward, and if it then hits a version 4 puppetmaster (which doesn't support locales) you will see an error.

For us this meant adding a new proxy configured in puppetmaster::web_frontend. However, I believe you only have one puppetmaster per site, so this is probably not required in your case.

The other issue to be aware of is a conflict between the gitpuppet GID and other groups. This could prevent puppet from running successfully; the manual fix is to update the conflicting group to a different GID. For me debmonitor had GID 998, so I updated that group to have a GID of 997.
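
The manual GID fix above could be checked with something like the following. This is only a sketch: the helper name `group_for_gid` is made up for illustration, and the GIDs 998/997 and the debmonitor group are simply the values from the example above, so they may differ on other hosts.

```shell
#!/bin/sh
# Print the name of the group that currently holds a given GID, if any.
# Uses getent, so it consults the same group database the system does.
group_for_gid() {
    getent group "$1" | cut -d: -f1
}

# Usage sketch (run as root; the values are taken from the example above):
# holder=$(group_for_gid 998)
# if [ "$holder" = "debmonitor" ]; then
#     groupmod -g 997 debmonitor
# fi
```

Note that groupmod does not re-own files that already carry the old numeric GID, so anything owned by the moved group may need a chgrp afterwards.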

There are other hacks if you use puppetdb, but I don't think you do, so I have not gone into them. However, let me know if you do need me to expand.

And of course make sure you back up your CA.

Also happy to be available when you upgrade in case I missed something.

jbond moved this task from Unsorted 💣 to Watching 👀 on the User-jbond board.

We do have a front-end and a backend-only host in the new setup. Obviously
it's all one site.

I think I'm going to request a quota bump for cloudinfra and make a new
buster instance of the appropriate size to be a new backend.

I think a copy of our CA lives in the cloudinfra internal puppetmaster.

Krenair claimed this task. · Thu, Oct 31, 7:34 PM

Change 547691 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] Copy cloud-puppetmaster hiera to new puppetmasters

https://gerrit.wikimedia.org/r/547691

Change 547691 merged by Andrew Bogott:
[operations/puppet@production] Copy cloud-puppetmaster hiera to new puppetmasters

https://gerrit.wikimedia.org/r/547691

Change 547807 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] cloud: encapi stuff for new puppetmasters

https://gerrit.wikimedia.org/r/547807

Change 547807 merged by Jhedden:
[operations/puppet@production] cloud: encapi stuff for new puppetmasters

https://gerrit.wikimedia.org/r/547807

Krenair added a comment. · Edited · Sat, Nov 2, 1:24 PM

So I've started making cloud-puppetmaster-04 ready to be a backend, it's got puppet errors about not being able to find the geoipupdate package. The only buster instance in labs with it installed is jbond-pmaster-buster.puppet.eqiad.wmflabs, which gets it from 500 http://deb.debian.org/debian buster/contrib amd64 Packages. Looking at that instance a little more closely that will be coming from this first line of /etc/apt/sources.list: deb http://deb.debian.org/debian/ buster main non-free contrib, however the file appears unpuppetised and on cloud-puppetmaster-04 the line is just deb http://deb.debian.org/debian/ buster main. @jbond, did you add that contrib component yourself or are we missing something? Where do production puppetmasters get it from?

I also see mwv-puppetmaster.mediawiki-vagrant.eqiad.wmflabs gets this component in a /etc/apt/sources.list.d/debian-contrib.list file but I don't see any apt::repository resources in puppet.git that could create such a file.
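
To compare hosts quickly, a check along these lines could tell whether a given sources file enables a component such as contrib. It is a rough sketch: `has_component` is a hypothetical helper, and it only understands the one-line `deb ...` format quoted above, not deb822-style `.sources` files.

```shell
#!/bin/sh
# Succeed if any "deb" line in the given sources file lists the component.
# $1 = path to a sources.list-style file, $2 = component (e.g. contrib)
has_component() {
    awk -v want="$2" '
        $1 == "deb" {
            i = 2
            # Skip an optional [option=value ...] field group.
            if ($i ~ /^\[/) { while ($i !~ /\]$/ && i <= NF) i++; i++ }
            # $i is the URI, $(i+1) the suite, the rest are components.
            for (j = i + 2; j <= NF; j++) if ($j == want) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Usage sketch:
# has_component /etc/apt/sources.list contrib && echo "contrib enabled"
# apt-cache policy geoipupdate   # shows which repository offers the package
```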

I also see mwv-puppetmaster.mediawiki-vagrant.eqiad.wmflabs gets this component in a /etc/apt/sources.list.d/debian-contrib.list file but I don't see any apt::repository resources in puppet.git that could create such a file.

I live hacked the package there recently before removing the provisioning of geoipupdate for ::role::puppetmaster::standalone users through T236487: geoipupdate missing on buster on Cloud VPS. Setting puppetmaster::enable_geoip: false in hiera for cloud-puppetmaster-* should do the same thing.

Krenair added a comment. · Edited · Sun, Nov 3, 9:28 PM

Thanks, I'll do that for this task. Though I think the question remains about how this works in production (and therefore how it should work in deployment-prep)? Do prod hosts get the contrib component in /etc/apt/sources.list through the installer whereas labs images do not?

Change 547992 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] cloud-puppetmaster: Prep for new instances

https://gerrit.wikimedia.org/r/547992

Anyone have any preferences for how we might transfer /var/lib/puppet/server/ssl/private_keys/*.pem from cloud-puppetmaster-01 to cloud-puppetmaster-03?

Andrew added a comment. · Mon, Nov 4, 4:14 AM

Anyone have any preferences for how we might transfer /var/lib/puppet/server/ssl/private_keys/*.pem from cloud-puppetmaster-01 to cloud-puppetmaster-03?

I don't mind if it passes through your local machine on the way, as long as you shred it afterwards. That seems as easy and safe as anything, unless there's for some reason already an ssh keypair between the two hosts.
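
The copy-via-local-machine approach could look roughly like this. It is a sketch under assumptions: `shred_tree` is a hypothetical helper, the -03 FQDN is guessed by analogy with -01, and the `sudo rsync` remote path assumes the keys are only readable by root.

```shell
#!/bin/sh
# Overwrite and delete every regular file under a directory, then remove
# the directory tree itself. Returns non-zero if the directory is missing.
shred_tree() {
    dir=$1
    [ -d "$dir" ] || return 1
    find "$dir" -type f -exec shred -u {} +
    rm -rf -- "$dir"
}

# Usage sketch (hostnames and sudo-based rsync are assumptions):
# staging=$(mktemp -d)
# rsync -a --rsync-path='sudo rsync' \
#     cloud-puppetmaster-01.cloudinfra.eqiad.wmflabs:/var/lib/puppet/server/ssl/private_keys/ "$staging/"
# rsync -a --rsync-path='sudo rsync' "$staging/" \
#     cloud-puppetmaster-03.cloudinfra.eqiad.wmflabs:/var/lib/puppet/server/ssl/private_keys/
# shred_tree "$staging"
```

shred is only as good as the filesystem underneath it; on copy-on-write or journaling setups the overwrite may not reach the original blocks, so keeping the staging directory on an encrypted disk is the safer assumption.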

jbond added a comment. · Edited · Mon, Nov 4, 11:33 AM

So I've started making cloud-puppetmaster-04 ready to be a backend, it's got puppet errors about not being able to find the geoipupdate package. The only buster instance in labs with it installed is jbond-pmaster-buster.puppet.eqiad.wmflabs, which gets it from 500 http://deb.debian.org/debian buster/contrib amd64 Packages. Looking at that instance a little more closely that will be coming from this first line of /etc/apt/sources.list: deb http://deb.debian.org/debian/ buster main non-free contrib, however the file appears unpuppetised and on cloud-puppetmaster-04 the line is just deb http://deb.debian.org/debian/ buster main. @jbond, did you add that contrib component yourself or are we missing something? Where do production puppetmasters get it from?

@Krenair The production hosts have non-free contrib configured via d-i during installation*. This is not currently managed via puppet, but there are plans to do so.

*Contrib is implicit with non-free

We should probably ensure labs images get the same components installed by default as their production equivalents. @Andrew?

Change 547992 merged by Andrew Bogott:
[operations/puppet@production] cloud-puppetmaster: Prep for new instances

https://gerrit.wikimedia.org/r/547992

Krenair added a comment. · Edited · Sat, Nov 9, 7:24 PM

rsync -ar'd the files from the old puppetmaster, to my machine, then to the new puppetmaster frontend (-03), then shredded my copy of the files. Added the new puppetmasters to the puppetmaster security group so they can actually receive traffic.
Ran echo '172.16.0.38 puppetmaster.cloudinfra.wmflabs.org' >> /etc/hosts on my test instance (krenair-t235218-test.testlabs.eqiad.wmflabs) to get it talking to the new frontend (-03). (putting it on the canary list just tested the backends, i.e. -04)
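
A plain `>>` append adds a duplicate line every time it is re-run. For repeated testing, a small idempotent variant might help. This is a sketch: `add_hosts_entry` is a made-up helper, and it does not handle trailing comments on a hosts line.

```shell
#!/bin/sh
# Append "IP NAME" to a hosts file unless NAME is already mapped there.
# $1 = IP address, $2 = hostname, $3 = hosts file (defaults to /etc/hosts)
add_hosts_entry() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    # Look for NAME in any hostname column of a non-comment line.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Usage sketch (as root), with the values from the command above:
# add_hosts_entry 172.16.0.38 puppetmaster.cloudinfra.wmflabs.org
```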


Sidenote here: Some magic going on here that I should probably write down is that talking to one of our puppetmaster frontends with its eqiad.wmflabs hostname in SNI results in it serving the internal-puppetmaster cert which very few machines will trust, whereas if puppetmaster.cloudinfra.wmflabs.org is the host in SNI then you get the one that instances outside of cloudinfra (and some within IIRC) actually trust as it's signed by the CA they expect:

This one is not trusted outside the cloudinfra hosts that deal with the internal-puppetmaster:

krenair@krenair-t235218-test:~$ openssl s_client -connect cloud-puppetmaster-01.cloudinfra.eqiad.wmflabs:8140 | openssl x509 -noout -text | grep Issuer:
depth=0 CN = cloud-puppetmaster-01.cloudinfra.eqiad.wmflabs
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = cloud-puppetmaster-01.cloudinfra.eqiad.wmflabs
verify error:num=21:unable to verify the first certificate
verify return:1
        Issuer: CN = Puppet CA: cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs

This one is trusted by most of labs:

krenair@krenair-t235218-test:~$ openssl s_client -connect cloud-puppetmaster-01.cloudinfra.eqiad.wmflabs:8140 -servername puppetmaster.cloudinfra.wmflabs.org | openssl x509 -noout -text | grep Issuer:
depth=1 CN = Puppet CA: puppet
verify return:1
depth=0 CN = puppetmaster.cloudinfra.wmflabs.org
verify return:1
        Issuer: CN = Puppet CA: puppet

Have a feeling this magic is important to permit frontend-backend communication, where the frontend server itself also does double duty as a backend.
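
The two checks above differ only in the SNI name, so they could be wrapped up as below. This is a sketch: the function names are made up, and it prints the issuer via `openssl x509 -noout -issuer` rather than grepping the full text dump.

```shell
#!/bin/sh
# Print the issuer of the first PEM certificate read from stdin.
cert_issuer() {
    openssl x509 -noout -issuer
}

# Fetch the certificate a server presents for a given SNI name and print
# its issuer. $1 = host:port to connect to, $2 = name to send in SNI.
served_issuer() {
    openssl s_client -connect "$1" -servername "$2" </dev/null 2>/dev/null | cert_issuer
}

# Usage sketch, comparing the two names against the same frontend:
# fe=cloud-puppetmaster-01.cloudinfra.eqiad.wmflabs
# served_issuer "$fe:8140" "$fe"
# served_issuer "$fe:8140" puppetmaster.cloudinfra.wmflabs.org
```

Redirecting stdin from /dev/null makes s_client exit after the handshake instead of waiting for input.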


My krenair-t235218-test instance got locale stuff from the new puppetmasters and is continuing to run quite happily, plus no more "Warning: Downgrading to PSON for future requests" there (IIRC this warning is essentially our buster puppet clients running v5 going ಠ_ಠ when encountering our old stretch puppetmasters on v4)

Maybe our next step should be to expand the profile::puppetmaster::frontend::canary_hosts list? Otherwise we can just shift the 185.15.56.64 floating IP to the new frontend instance and cross our fingers, then get to work removing the old instances and config.
Which reminds me, another thing I should probably write down explicitly: Our puppetmaster frontend has a floating IP. This is/was probably because external (read: production) things like Horizon (and probably our various labs cleanup things running on prod-realm boxes) need to talk to it. Stuff running internally to labs get the relevant private IP (which they can actually route to) from the labsaliaser magic in the DNS recursors.

jbond added a comment. · Tue, Nov 12, 4:16 PM

@Krenair I have also run into the certificate issue while looking at labtestpuppetmaster. I have a patch out at the moment which would update the web config to use the certs under puppet config print ssldir --section master instead of puppet config print ssldir --section agent, which is what is currently used. This will mean that we need to generate puppetmaster certs signed by the CA each master manages. Currently:

  • cloud-puppetmaster-01.cloudinfra.eqiad.wmflabs has client certificates it uses to talk to cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs. They:
    • live under /var/lib/puppet/ssl
    • are signed by the CA on cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs
  • cloud-puppetmaster-01.cloudinfra.eqiad.wmflabs is also the CA for [most] clients in openstack. It:
    • keeps its certificates under /var/lib/puppet/server/ssl
    • signs certs with its own CA (the one on cloud-puppetmaster-01.cloudinfra.eqiad.wmflabs)

Currently the web-frontend.erb template points to the client certificates in /var/lib/puppet/ssl.

  • on production this works without issue as the CA for both the masters and the clients is the same
  • for the puppet domain name this works fine because the puppet.pem file is copied from the secrets directory into the client folder

The patch I have updates the config to use the ssl files in the /var/lib/puppet/server/ssl directory. However, there will need to be additional changes to manage/create files under that directory, i.e. currently nothing generates or manages the following files:

  • /var/lib/puppet/server/ssl/{private_keys,certs}/labtestpuppetmaster2001.wikimedia.org.pem
  • /var/lib/puppet/server/ssl/{private_keys,certs}/cloud-puppetmaster-01.cloudinfra.eqiad.wmflabs.pem
  • etc
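
To see which of the files listed above are actually absent on a given master, something like this could be run. It is a sketch: `missing_master_files` is a hypothetical helper, and it assumes the certname matches the host's FQDN.

```shell
#!/bin/sh
# Print the key/cert paths that do not exist for a certname under an ssldir.
# $1 = ssldir (e.g. /var/lib/puppet/server/ssl), $2 = certname
missing_master_files() {
    for sub in private_keys certs; do
        f="$1/$sub/$2.pem"
        [ -f "$f" ] || printf '%s\n' "$f"
    done
}

# Usage sketch (as root, on the puppetmaster):
# missing_master_files "$(puppet config print ssldir --section master)" "$(hostname -f)"
```

Empty output would mean both the private key and the certificate are already in place.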

We could just comment out [[ this if statement | https://github.com/wikimedia/puppet/blob/production/modules/puppetmaster/manifests/web_frontend.pp#L56 ]] and ensure the files are generated and committed to the private repo like the others. For environments which have autosign, we may also be able to add something like the following, but it needs testing:


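if $realm == 'labs' and $autosign != false {
  # Untested sketch: have the master generate its own cert from the CA it manages.
  # The /usr/bin path is an assumption; exec needs a fully qualified command.
  exec { 'generate master certificates':
    command => "/usr/bin/puppet cert generate ${server}",
    creates => "${facts['puppet_config']['master']['ssldir']}/private_keys/${server}.pem",
  }
}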

Sorry, this is quite a rabbit hole and I may not have explained myself well, so please ask for clarification if needed or ping me on IRC.