
Move the main WMCS puppetmaster into the Labs realm
Open, Normal, Public

Description

Historically, the Labs puppetmasters have been running in the production realm, for various legacy reasons. Early on, Labs (and now WMCS) gained support for self-hosted per-instance puppetmasters, and later for self-hosted per-project puppetmasters. Since then, the two (arguably) most important projects, deployment-prep and tools, have moved to self-hosted puppetmasters.

Having the WMCS "main" puppetmasters in the production realm is yet another labs->production realm bridge (or: a "labs-support" instance). It's especially iffy since Puppet is a complex codebase by itself, and complicated even further by the fact that it is essentially a compiler-on-demand for dynamic, living code. Such a jump has been exploited as a demonstration before and it wasn't that hard to achieve either (let's leave it at that :), so this isn't just hypothetical.

Puppet for WMCS instances doesn't need any kind of private data, and there is really no particular reason other than legacy why it runs in the production realm (as demonstrated by the various project puppetmasters too), so I'd like to discuss the path towards its eventual move to the labs realm. It's not super urgent or anything, but I've been thinking about this for a while and got reminded of it with the recent labspuppetmaster work -- and it turns out I never filed a task about it (that I could find) :)

So, I think a few different ideas have been mentioned on how to approach this (and feel free to adjust/correct):

  1. Move to "labs-support" (= production realm, public IP, but accessible only to Labs): would probably work and be an improvement over the current situation, but it's not really a move to the labs realm and likely not enough.
  2. Deploy the puppetmasters in multiple VMs, perhaps even across multiple labvirts for increased reliability.
  3. Deploy a couple of puppetmaster VMs, perhaps allocated in a way that there's only 1 VM running on each bare metal server (I think this has been done already in a few other cases?).
  4. Wait until WMCS supports bare metal instances in the Labs realm, move the existing bare metal machines there (blocked on Neutron, I guess Ironic too?)

My inclination would be to just go with (2), which doesn't sound like a huge amount of work to me given that all the various parts are there, but I may be missing a lot of background.

How do you (cloud-services-team) folks feel about this? What pros/cons do you see in each and which one is your preferred solution?

Related Objects

Event Timeline

faidon created this task. · Jul 20 2017, 4:15 PM
bd808 edited projects, added Cloud-VPS, Puppet; removed Cloud-Services. · Jul 24 2017, 3:49 PM
Andrew added a subscriber: Andrew. · Jul 24 2017, 3:56 PM

I'm pretty sure that #1 is moot -- at least, anytime we discuss it we conclude that the 'labs-support' vlan isn't really a useful concept and should be eliminated.

#2 is almost certainly the way to go, as it avoids the weird chicken-and-egg issue of "we need a labs puppetmaster to build a labs puppetmaster" -- currently I can't even log into a new VM /at all/ until it's properly puppetized. So to move forward on this we would need some way of accessing an unpuppetized VM.

#3 isn't obviously more useful to me than #2 -- as long as we have redundant puppetmasters and ensure they're not on the same physical host, we would get the same value (as I understand it).

#4 is appealing in part because it avoids chicken-and-egg, but unless we wind up having multiple use cases for bare-metal-in-instance-vlan it's probably not worth the additional complexity over #2.

Here are some things that need to be thought about/figured out before we can go forward:

  • Security model: Having a labs VM that is Ops-only and critical to the whole cluster is not unheard of, but it is difficult, and isn't currently solved to my satisfaction
  • Chicken-and-egg: We need a way to build a fresh puppetmaster in the event that we have no current working puppetmaster
  • Stability: Historically puppetmasters on labs have been flaky and crashed a lot. Why?
  • Performance: We haven't ever had a VM puppetmaster support more than a few dozen clients. I can't think of any reason why this would be an issue but it needs testing.
chasemp added a subscriber: chasemp. · Edited · Jul 24 2017, 5:32 PM

My understanding is that we are looking at putting this in the public VLAN as the current compromise, short of moving services into the Labs realm directly, though I believe in this case making the masters themselves instances is the best eventual outcome. A few decent-sized unknowns for me: we have one base image that expects an external puppetmaster (even for project masters), so we would need to figure out some special bootstrap process for the masters themselves (and feel really sure it's not going to be broken in the long intervals before we come back around to it), and we haven't thought through managing this puppetmaster within the context of an instance at all. I don't feel like we have the bandwidth to bite this off directly right now. My vote is pursuing the course of action already in flight: decouple the puppetmaster from labcontrol, put it in the public VLAN with the new hardware, firewall it off from non-instances, and make notes for the portions of this process that would affect a next stage of converting to an instance. I think most of the in-progress work here needs to be done for either outcome.

MoritzMuehlenhoff triaged this task as Normal priority. · Jul 25 2017, 12:40 PM
  • Performance: We haven't ever had a VM puppetmaster support more than a few dozen clients. I can't think of any reason why this would be an issue but it needs testing.

Thought I'd do some numbers on this.

In deployment-prep we run happily with an m1.medium puppetmaster serving around 75 hosts. It looks like tools is up to around 150 hosts on an m1.medium.

Labs currently has 777 instances according to http://tools.wmflabs.org/openstack-browser/ - if we exclude the projects with their own puppetmasters (tools, deployment-prep, integration, automation-framework, bstorm-test, gerrit, git, hound, phabricator, puppet, striker, testlabs, thumbor, toolsbeta, traffic, wikifactmine) that loses 353 of them.

So that leaves roughly 424 instances likely to be using the central puppetmaster. If we assume tools' current puppetmaster is at maximum load, then you'd want something at least 3-4 times the size of an m1.medium in terms of VCPUs. An m1.xlarge might be capable of the job, unless you want to introduce a custom flavour?
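
For concreteness, here's that arithmetic as a quick sketch (the only assumptions are that required capacity scales roughly linearly with client count and that tools' m1.medium is near its limit):

```
# Back-of-the-envelope check of the sizing above.
total_instances = 777           # from openstack-browser
own_puppetmaster_instances = 353  # projects with their own puppetmasters
clients_per_m1_medium = 150     # tools' puppetmaster, assumed near capacity

central_clients = total_instances - own_puppetmaster_instances   # 424
m1_medium_equivalents = central_clients / clients_per_m1_medium  # ~2.8

# With some headroom, ~2.8x an m1.medium lines up with the "at least 3-4x /
# maybe an m1.xlarge" estimate above.
print(f"{central_clients} clients ≈ {m1_medium_equivalents:.1f}x an m1.medium")
```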

Ping? Could we set up a couple of puppetmasters in the new "cloudinfra" project and see where that leads us? I was previously told that this is probably a 1-2 week project; is that still the current assessment, and if so, do you have an estimate on when this could be scheduled?

In terms of concerns raised before in this task:

  • Security model: I suppose that's cloudinfra, right? We need to address that regardless, as we move more services within WMCS. In any case, anything would be better than the current security model...
  • The chicken-and-egg problem is the same as for production's puppetmasters -- they are in the exact same spot (clients to themselves). The solution there, I think, is to set up multiple ones across different cloudvirts and hope they won't all die together. If the worst happens, we can always intervene manually and set something up.
  • Stability: are puppetmasters on labs still crashing? They run the exact same distro and software, so I'm not sure why it would be the case, but we can help debug if that happens!
  • Performance: I'll defer to @Krenair's analysis as he seems to know more than me about this :) It sounds to me like we already have half of the fleet pointed at their own puppetmasters, so it doesn't sound like a different order of magnitude?
faidon edited projects, added Cloud-Services; removed Cloud-VPS. · Oct 20 2018, 9:41 AM

I imagine we'd need to issue every instance being moved a new puppet cert, as we presumably wouldn't want to hand the current labs puppetmaster CA over to the new instance? That might be fairly easy due to autosigning though.
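
As a rough illustration only (the master name below is a placeholder, and the reliance on autosigning is an assumption, not a tested procedure), the per-instance re-keying could look something like:

```
#!/usr/bin/env python3
# Sketch only: drop the client certs issued by the old CA so the instance
# requests a fresh certificate from the new puppetmaster (autosign assumed).
import shutil
import subprocess

NEW_MASTER = "puppetmaster.example.wmflabs.org"  # hypothetical placeholder

# Ask puppet where its SSL directory lives rather than hardcoding the path.
ssldir = subprocess.check_output(
    ["puppet", "config", "print", "ssldir"], text=True).strip()
shutil.rmtree(ssldir, ignore_errors=True)

# The first run against the new master submits a CSR; autosigning would approve it.
subprocess.run(["puppet", "agent", "--test", "--server", NEW_MASTER], check=False)
```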

  • Stability: are puppetmasters on labs still crashing? They run the exact same distro and software, so I'm not sure why it would be the case, but we can help debug if that happens!

I don't remember seeing deployment-prep's one crashing.

Krenair added a comment. · Edited · Oct 20 2018, 4:09 PM
  • Security model: I suppose that's cloudinfra, right? We need to address that regardless, as we move more services within WMCS. In any case, anything would be better than the current security model...

I think what Andrew was referring to was the historical issues around access control on restricted bastions. AFAIK that isn't an issue with the model around the cloudinfra project as it just uses the standard project group mechanism. As long as the group only contains authorised people you should be okay without relying on anything particularly obscure like profile::ldap::client::labs::restricted_to.

  • Performance: I'll defer to @Krenair's analysis as he seems to know more than me about this :)

I just looked at current numbers and assumed the number of VCPUs required is proportional to the number of puppet clients. I also assumed it'd work like a current project puppetmaster, with just a single instance serving all clients, but actually I imagine it should work more like prod, with multiple hosts? I think I also made assumptions about instance roles based on names. I didn't do anything particularly special.

It sounds to me like we already have half of the fleet pointed at their own puppetmasters, so it doesn't sound like a different order of magnitude?

Numbers update: Labs has 728 instances. I took a list of all instances with 'puppet' in the name; here are the ones I either know or suspect to be puppetmasters:

automation-framework af-puppetmaster02
bstorm-test bstorm-puppet-01
cloudinfra cloudinfra-puppetmaster-01
deployment-prep deployment-dumps-puppetmaster02
deployment-prep deployment-puppetmaster03
gerrit puppet-gerrit
git puppet-paladox
hound hound-puppet-02
integration integration-puppetmaster01
phabricator puppet-phabricator
puppet keith-puppetmaster
puppet puppet-jmm-pmaster
striker striker-puppet01
testlabs gtirloni-puppetmaster-01
testlabs abogott-puppetmaster
thumbor puppetmaster
tools tools-puppetmaster-01
toolsbeta toolsbeta-puppetmaster-02
toolsbeta toolsbeta-puppetmaster-01
traffic traffic-puppetmaster
wikifactmine puppetmaster-01
wmcs-nfs nfs-puppetmaster-01

I manually filtered these out because I either know or suspect they're not puppetmasters:

automation-framework af-puppetdb01
automation-framework af-puppetdb02
deployment-prep deployment-puppetdb02
puppet puppet-jmm-kernel-stretch2
puppet puppet-jmm-kernel-stretch
puppet puppet-ema-2
testlabs puppet-compiler-v4-tools
testlabs puppet-compiler-v4-other
testlabs puppet-compiler-other
testlabs puppet-compiler-tools
toolsbeta toolsbeta-puppetdb-01

So anyway, that's 22 puppetmasters. The projects containing the puppetmasters listed above contain 329 instances, averaging 15 instances per project puppetmaster. The projects not containing puppetmasters have 399 instances.
So it's about 45% of labs probably using one of 22 project puppetmasters, and we're talking about moving the remaining 55% to some new puppetmaster instances. The question is how many and how big should each new puppetmaster be.
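
For what it's worth, the classification above boils down to something like this sketch; the instance list itself would come from openstack-browser / the OpenStack APIs, which isn't shown here:

```
# Sketch of the counting above. `instances` is a list of (project, name) pairs;
# fetching it from openstack-browser / the OpenStack APIs is left out.
NOT_PUPPETMASTERS = {
    "af-puppetdb01", "af-puppetdb02", "deployment-puppetdb02",
    "puppet-jmm-kernel-stretch2", "puppet-jmm-kernel-stretch", "puppet-ema-2",
    "puppet-compiler-v4-tools", "puppet-compiler-v4-other",
    "puppet-compiler-other", "puppet-compiler-tools", "toolsbeta-puppetdb-01",
}

def count_coverage(instances):
    masters = {(proj, name) for proj, name in instances
               if "puppet" in name and name not in NOT_PUPPETMASTERS}
    projects_with_master = {proj for proj, _ in masters}
    covered = sum(1 for proj, _ in instances if proj in projects_with_master)
    total = len(instances)
    return {
        "puppetmasters": len(masters),              # 22 above
        "covered_instances": covered,               # 329 above
        "central_instances": total - covered,       # 399 above
        "central_share": (total - covered) / total, # ~0.55 above
    }
```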

aborrero added a subscriber: aborrero.

I will try to discuss this in our next team meeting.

Noob question: I understand that cloudinfra-puppetmaster-01 is a puppetmaster just for the cloudinfra project, right?
Are we talking about another server, let's say cloudvps-puppetmaster-01, also in the cloudinfra project?

JFTR, I don't know what cloudinfra-puppetmaster-01 is. Maybe @Krenair or someone else set that up?

More broadly, the concept behind this task is to set up "default" puppetmasters for all of the projects, i.e. all instances that don't use some special kind of puppetmaster. Whether those default/central/last-resort puppetmasters should be hosted within the cloudinfra project or some other WMCS project… is an implementation detail I think.

JFTR, I don't know what cloudinfra-puppetmaster-01 is. Maybe @Krenair or someone else set that up?

I don't have access to do that. I assume this is a project puppetmaster for either the MX-out or NTP servers that exist in that project.

herron added a subscriber: herron. · Nov 19 2018, 3:05 PM

JFTR, I don't know what cloudinfra-puppetmaster-01 is. Maybe @Krenair or someone else set that up?

I don't have access to do that. I assume this is a project puppetmaster for either the MX-out or NTP servers that exist in that project.

According to Horizon, cloudinfra-puppetmaster-01 was created by Andrew in September. It has no signed puppet certs currently, so it's safe to say nothing is using it as of yet. @Andrew is that something we still need/plan to use, or could we turn down the instance?

As arturo suggests, cloudinfra-puppetmaster-01 is meant to be the puppetmaster for things inside the cloudinfra project. I anticipated us needing that for project-local secrets -- I'm surprised that e.g. mx-out01 doesn't need it... it certainly will if we add DKIM keys.

So, we can delete the VM if it's confusing people, but I'll just need to rebuild it sometime soon :)

Nothing has been done regarding the actual topic of this bug. It's a perfectly reasonable idea but not on the top of the priority list and work there is pending some decisions in other areas.

Ok, clarified then:

  • cloudinfra-puppetmaster-01 is a puppetmaster server just for the cloudinfra project. Still not in use though.
  • we will have to discuss whether we create a cloudvps-wide puppetmaster inside cloudvps (i.e., a VM).

#2 is almost certainly the way to go, as it avoids the weird chicken-and-egg issue of "we need a labs puppetmaster to build a labs puppetmaster" -- currently I can't even log into a new VM /at all/ until it's properly puppetized. So to move forward on this we would need some way of accessing an unpuppetized VM.

For the record -- T215211 is largely resolved, and with that I'm no longer nearly as worried about worst-case 'we locked ourselves out of everything' scenarios.

Krenair claimed this task. · Mar 26 2019, 5:45 PM

I'm planning to have a go at this soon.

Change 501581 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] openstack::puppet::master::encapi: work on stretch with python3.5

https://gerrit.wikimedia.org/r/501581

Change 501581 merged by Andrew Bogott:
[operations/puppet@production] openstack::puppet::master::encapi: work on stretch with python3.5

https://gerrit.wikimedia.org/r/501581

Change 501587 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] openstack::puppet::master::encapi: Avoid nginx-apache conflict

https://gerrit.wikimedia.org/r/501587

Krenair added a comment. · Edited · Apr 6 2019, 1:35 AM

I've got a puppetmaster set up on puppetmaster.cloudinfra.wmflabs.org now, hosted on cloud-puppetmaster-01 with a backend of cloud-puppetmaster-02. A test client on krenair-t171188-test.testlabs.eqiad.wmflabs is working.
Still a load of stuff to do though. It doesn't have a floating IP or any way for the OpenStack hosts to contact it yet. I had to do some manual work to avoid apache-nginx conflicts, deal with package problems around cergen's dependencies against the openstack-mitaka-jessie repo, and work around ferm's AAAA handling bugs, among other things.

Change 502235 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] profile::puppetmaster::frontend: Allow getting allow_from from hiera

https://gerrit.wikimedia.org/r/502235

Change 502235 merged by Alexandros Kosiaris:
[operations/puppet@production] profile::puppetmaster::frontend: Allow getting allow_from from hiera

https://gerrit.wikimedia.org/r/502235

Change 501587 merged by Andrew Bogott:
[operations/puppet@production] openstack::puppet::master::encapi: Avoid nginx-apache conflict

https://gerrit.wikimedia.org/r/501587

Krenair added a comment. · Edited · Apr 19 2019, 4:56 AM

The number of puppet.git cherry-picks on cloudinfra-internal-puppetmaster is now 0; there are just the two secret commits to labs/private, which are pretty much the purpose of that instance.

TODO:

And then finally, if everyone is happy to go ahead:

  • Import encapi data
  • Move infrastructure over to talking to new puppetmaster - e.g. horizon
  • Move instances over to using new puppetmaster
Andrew closed subtask Restricted Task as Resolved. · Tue, May 7, 8:25 PM

Change 509915 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Replace git-sync-upstream on labspuppetmasters, remove from puppet-merge

https://gerrit.wikimedia.org/r/509915

Change 509915 merged by Andrew Bogott:
[operations/puppet@production] Replace git-sync-upstream on labspuppetmasters, remove from puppet-merge

https://gerrit.wikimedia.org/r/509915

Looks like we regressed here while I was busy - I logged onto the new puppetmasters to find puppet has been broken for weeks. It seems to be related to the clientpackages changes.

Change 511875 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] openstack puppetmaster profiles: don't include clientpackages

https://gerrit.wikimedia.org/r/511875

Change 511877 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] openstack puppetmaster roles: duplicate for set of profiles to be used in labs

https://gerrit.wikimedia.org/r/511877

Change 511875 merged by Andrew Bogott:
[operations/puppet@production] openstack puppetmaster profiles: don't include clientpackages

https://gerrit.wikimedia.org/r/511875

Change 511877 merged by Andrew Bogott:
[operations/puppet@production] openstack puppetmaster roles: duplicate for set of profiles to be used in labs

https://gerrit.wikimedia.org/r/511877