Switch to new labs puppetmasters
Closed, Resolved (Public)

Description

  • build and test labspuppetmaster1001 and 1002
  • understand and test the frontend/backend/balancing arrangement between the two
  • apply local puppetmaster rename patch to both
  • security audit, update
  • switch base images/new VMs to new puppetmaster
  • switch designate cert-cleaner to clean certs on the new puppetmaster
  • move existing instances to new puppetmasters, update certs
  • commit hiera patches switching puppetmaster
  • switch default enc address in /etc/puppet/hiera.yaml to labspuppetmaster1001
  • search/replace uses of old labs-puppetmaster-eqiad name
  • depuppetize/turn off puppetmaster on labcontrol1001
  • remove dns entries for labs-puppetmaster-eqiad
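
For the search/replace item in the checklist above, a minimal sketch of how remaining references to the old name might be located. The function name `find_old_master_refs` and the directory argument (for example, a local checkout of operations/puppet) are assumptions for illustration; only the hostname string comes from this task.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that still mention the
# old puppetmaster name. The directory to scan is supplied by the caller.
find_old_master_refs() {
    dir="$1"
    # -r: recurse, -l: print only file names, -F: fixed string (no regex).
    # The .git directory is skipped so packed history does not add noise.
    grep -rlF --exclude-dir=.git 'labs-puppetmaster-eqiad' "$dir" | sort
}
```

An empty result would suggest the rename is complete for that tree; any file listed still needs a patch.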

I'm going to build two new labs puppetmasters:

labspuppetmaster1001 (frontend, worker)
labspuppetmaster1002 (worker only)

These will be built using the normal production puppetmaster profiles (but will use the labs private repo and no puppet repo). labspuppetmaster1001 will use the service name 'labs-puppetmaster.wikimedia.org'.

Once they're up and running, I'll apply local patches to both puppetmasters, like this:

--- a/hieradata/eqiad.yaml
+++ b/hieradata/eqiad.yaml
@@ -78,7 +78,7 @@ labs_nova_controller: &labsnovacontroller "labcontrol1001.wikimedia.org"
 labs_nova_controller_spare: &labsnovacontrollerspare "labcontrol1002.wikimedia.org"
 
 labs_glance_controller: &labsglancecontroller "labcontrol1001.wikimedia.org"
-labs_puppet_master: &labspuppetmaster "labs-puppetmaster-eqiad.wikimedia.org"
+labs_puppet_master: &labspuppetmaster "labs-puppetmaster.wikimedia.org"
 labs_keystone_host: &labskeystonehost "labcontrol1001.wikimedia.org"
 
 labs_osm_host: "wikitech.wikimedia.org"

That will make the new puppetmaster 'sticky' such that once a given instance is moved to the new master it'll be committed to that master ever after.

Then, I can move hosts over to the new puppetmaster with:

$ grep 'server = labs-puppetmaster-eqiad.wikimedia.org' /etc/puppet/puppet.conf && sed -i 's/labs-puppetmaster-eqiad.wikimedia.org/labs-puppetmaster.wikimedia.org/g' /etc/puppet/puppet.conf && rm -rf /var/lib/puppet/ssl && puppet agent --enable && puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff --waitforcert=10 --certname=`hostname -f` --server=labs-puppetmaster.wikimedia.org
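
The config-rewrite part of that one-liner can be broken out for readability. This is only a sketch: `switch_master` is a hypothetical function name, and unlike the one-liner it escapes the dots in the old hostname so sed matches it literally. The two hostnames are the ones from this task.

```shell
#!/bin/sh
# Sketch only: rewrite the 'server' setting in a puppet.conf-style file.
OLD_MASTER='labs-puppetmaster-eqiad.wikimedia.org'
NEW_MASTER='labs-puppetmaster.wikimedia.org'

# Hypothetical helper; takes the path of the config file to edit.
switch_master() {
    conf="$1"
    # Only act on hosts still pointing at the old master (-F: literal match).
    grep -qF "server = $OLD_MASTER" "$conf" || return 1
    # Escape dots so sed treats the old name as a literal string.
    old_re=$(printf '%s' "$OLD_MASTER" | sed 's/\./\\./g')
    # In-place edit, as in the original command (GNU sed syntax).
    sed -i "s/$old_re/$NEW_MASTER/g" "$conf"
}
```

If `switch_master /etc/puppet/puppet.conf` succeeds, the remaining steps from the one-liner (removing /var/lib/puppet/ssl, enabling the agent, and the one-time agent run) would follow unchanged. A non-zero return means the host was not pointing at the old master and nothing was modified.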

I've tested this in labtest and it should work fine. My only real concern is that we need to make VERY sure that VMs can never access the private repo, and that puppet doesn't accidentally copy the private repo onto these new puppetmasters. I'm nervous about this on account of using the same profiles as production...

Event Timeline

Andrew updated the task description.

Change 369951 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] designate: distinguish puppetmaster from controller

https://gerrit.wikimedia.org/r/369951

Change 369952 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] designate: switch to the new puppetmaster

https://gerrit.wikimedia.org/r/369952

Change 369951 merged by Andrew Bogott:
[operations/puppet@production] designate: distinguish puppetmaster from controller

https://gerrit.wikimedia.org/r/369951

Change 369952 merged by Andrew Bogott:
[operations/puppet@production] designate: switch to the new puppetmaster

https://gerrit.wikimedia.org/r/369952

Mentioned in SAL (#wikimedia-operations) [2017-08-04T18:17:32Z] <andrewbogott> switched most cloud instance to new puppetmasters, as per https://phabricator.wikimedia.org/T171786

Change 370250 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] clean up a few more labs-puppetmaster-eqiad refs

https://gerrit.wikimedia.org/r/370250

Change 370251 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] toolschecker: use the new puppetmaster for manifest checks

https://gerrit.wikimedia.org/r/370251

Change 370252 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] shinken: test the new labs puppetmaster

https://gerrit.wikimedia.org/r/370252

Change 370250 merged by Andrew Bogott:
[operations/puppet@production] clean up a few more labs-puppetmaster-eqiad refs

https://gerrit.wikimedia.org/r/370250

Change 370251 merged by Andrew Bogott:
[operations/puppet@production] toolschecker: use the new puppetmaster for manifest checks

https://gerrit.wikimedia.org/r/370251

Change 370252 merged by Andrew Bogott:
[operations/puppet@production] shinken: test the new labs puppetmaster

https://gerrit.wikimedia.org/r/370252

Change 373080 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] labs instances: switch salt-master to labcontrol1001

https://gerrit.wikimedia.org/r/373080

Change 373081 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] remove role::labs::puppetmaster

https://gerrit.wikimedia.org/r/373081

I'm going to leave labcontrol1001 as the salt master. No sense in rebuilding this when we're going to stop using salt soon, and there's no reason to have it coupled to the puppetmaster.

Change 373080 merged by Andrew Bogott:
[operations/puppet@production] labs instances: switch salt-master to labcontrol1001

https://gerrit.wikimedia.org/r/373080

Change 373081 merged by Andrew Bogott:
[operations/puppet@production] remove role::labs::puppetmaster

https://gerrit.wikimedia.org/r/373081

Change 373113 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/dns@master] remove cnames for old labs puppetmasters

https://gerrit.wikimedia.org/r/373113

Mentioned in SAL (#wikimedia-operations) [2017-08-22T17:54:18Z] <andrewbogott> removing obsolete apache2 and puppetmaster packages from labcontrol boxes for https://phabricator.wikimedia.org/T171786

I merged the patch removing puppetmaster from labcontrols. Then, on labcontrol1001, 1002, and labtestcontrol1001 I did the following:

  • apt-get purge apache2 puppetmaster-passenger puppetmaster
  • rm -rf /var/lib/git
  • removed crontab entry that rebased the puppet repos
  • forced a rebuild of puppet.conf (so that now it only has 'agent' and 'main' sections)
  • removed /var/www and /etc/apache2
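
One way to confirm the puppet.conf rebuild mentioned in the list above is to print the section names left in the file. This is a sketch under assumptions: `list_conf_sections` is a hypothetical helper, and it only recognises section headers written alone on a line in the usual `[name]` form.

```shell
#!/bin/sh
# Hypothetical helper: print the section names found in an INI-style file
# such as puppet.conf, one per line, sorted alphabetically.
list_conf_sections() {
    # Select lines like "[agent]" and keep only the text inside the brackets.
    sed -n 's/^\[\([^]]*\)][[:space:]]*$/\1/p' "$1" | sort
}
```

Run against /etc/puppet/puppet.conf on a cleaned-up host, the expectation from the step above is that only `agent` and `main` are printed.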

Change 373113 merged by Andrew Bogott:
[operations/dns@master] remove cnames for old labs puppetmasters

https://gerrit.wikimedia.org/r/373113

Andrew closed subtask Restricted Task as Resolved. (Nov 1 2018, 6:24 PM)