Page MenuHomePhabricator

Migrate puppet merges to a cookbook
Open, MediumPublic

Description

Currently puppet changes are deployed via the puppet-merge shell script (which some help of a corresponding Python script). In addition this script also merges changes for the public "private" repository (which holds stub secrets). This script can only be run from puppetmaster1001 to ensure locking consistency.

For Puppet changes it compares the repository state between what is found in git head and what is currently deployed to the Puppet servers. The user is then prompted for verifying these changes and the selected sha1 of the repository state is then written to config-master (and all other puppet servers sync to that sha1 of the repository).

Since we have have locking in Spicerack we can rely on it for merging Puppet changes and move Puppet merges to a cookbook.

Initially we could start with just moving the labsprivate functionality to the cookbook and then puppet.git in a second step.

As a future buildout step this cookbook could also adopt the sync of the private repository as well. It currently syncs via git hooks, which is relatively error-prone.

Event Timeline

Change #1050607 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::puppetserver: skip puppet-merge

https://gerrit.wikimedia.org/r/1050607

Reporting some thoughts from IRC:

10:48  <elukey> Generic question about the future of puppet-merge, I'll write some stuff as brainbounce
10:49  <elukey> Moritz opened https://phabricator.wikimedia.org/T366355, since the long term idea is to move away from puppet-merge and use a cookbook
10:50  <elukey> the main idea, IIUC, is that we want something more structured/integrated/stable than puppet-merge, but also that the script has never been tested on puppetserver nodes
10:50  <elukey> (running the puppet 7 stack)
10:51  <elukey> The other bit is that we'd need to move the puppet private repo commits from puppetmaster1001 to a puppetserver, as part of https://phabricator.wikimedia.org/T368023
10:51  <elukey> so overall the main idea is to move away from the puppet 5 infra, and start migrating people to puppetserver1001
10:51  <elukey> I have some doubts/concerns:
10:53  <elukey> 1) if we move puppet private to puppetserver1001 only, I can already see the use case of SREs trying to also puppet-merge on it. I'd do it, maybe because of a quick read of the long email 
                sent to explain, or just because I am used to commit private and public in the same place
10:54  <elukey> 2) In theory puppet-merge could work on puppetserver, but it was never tested. IIUC John suggested against it, but I don't have any more context if there were big blockers or if it was more 
                to prefer a cookbook. To be clear, I like the cookbook idea, but it may take a while so having puppet-merge usable on puppetserver would (imho) make people less confused.
10:55  <elukey> (so moving both private and public merges away from puppetmaster1001)
10:56  <elukey> To answer to v*olans, I think that we could do private only first, then move puppet-merge to a cookbook, and deprecate puppet-merge (script) completely
10:57  <elukey> if we do it, SREs will have to deal with the fact that we puppet-merge on puppetmaster1001 (old Puppet 5 infra) and private-commit on puppetserver1001 (new infra) for a while
10:57  <elukey> at least, this is my understanding :)
10:57  <elukey> Lemme know if I misunderstood something, and if you have preferences :)
10:59  <elukey> ah I forgot something - the above plan would include https://gerrit.wikimedia.org/r/c/operations/puppet/+/1050607, so we remove puppet-merge from the new puppet 7 infra :D
10:59  <elukey> no accidental mistakes
11:00  <elukey> (or we could replace the script with an echo "please use puppetmaster instead")

Proposed plan:

  • In T368023 we move the private repo to puppetserver1001, and we add a git pre-commit hook config to the repo that warns users about committing on that node. Not 100% sure if the pre-commit hook is the right/best choice, but it could help preventing SREs to keep using puppetmaster1001 for private.
  • We remove puppet-merge from puppetserver1001 (https://gerrit.wikimedia.org/r/1050607), so that it will be impossible for an SRE to make the mistake of executing it on puppetserver1001.
  • We then work on the the new puppet-merge cookbook, and eventually deprecate any use of puppetmaster1001.

Getting back to this after having read more puppet code/configs, I think I have a clearer idea now :)

We can easily split the puppet private vs public tasks, I thought that SREs may have preferred to have everything on one node (private commit and puppet-merge) but I think it was just a wrong assumption. On the safeguard fences, we have:

elukey@puppetserver1001:~$ sudo -i puppet-merge
To ensure consistent locking please run puppet-merge from: puppetmaster1001.eqiad.wment

It is already true that we can merge only on puppetmaster1001 (the designed CA_SERVER node), with the option of puppetmaster2001 if 1001 is down (previous change of config for CA_SERVER) so we shouldn't need to remove puppet-merge from puppetserver nodes. SREs will be protected no matter what, so less concerns on this side.

I've also checked what puppet-merge does behind the scenes, and the gist of it is that puppet-merge.py (the companion Python script) is responsible to update the localhost puppet git repo, and it gets called via ssh forced command from puppetmaster1001 on all nodes. Beside some file locking and other misc things, I don't see anything special on puppetmaster1001 that couldn't run on puppetserver1001. I am not saying that we shouldn't do the cookbook, but that we can probably move forward with removing the puppet-merge use case from the old Puppet 5 infra in parallel and as separate task (so we don't rush the new code, that is not tight to a deadline to deprecate Puppet 5).

Change #1050607 abandoned by Elukey:

[operations/puppet@production] role::puppetserver: skip puppet-merge

Reason:

Abandoning this change for the moment

https://gerrit.wikimedia.org/r/1050607

Change #1052260 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] merge_cli: fix a puppet-merge.sh comment

https://gerrit.wikimedia.org/r/1052260

I agree that decoupling makes sense and that it is worth the effort to try and run the current script on the puppetserver nodes.

I've also checked what puppet-merge does behind the scenes, and the gist of it is that puppet-merge.py (the companion Python script) is responsible to update the localhost puppet git repo, and it gets called via ssh forced command from puppetmaster1001 on all nodes. Beside some file locking and other misc things, I don't see anything special on puppetmaster1001 that couldn't run on puppetserver1001.

IIRC, originally there was no restriction and you could edit /srv/private from any of the puppet masters. I think we only changed this once we established the puppet-merge locking mechanism (using flock on local files, of course, and then wanted to truly centralize that).

Change #1052260 merged by Elukey:

[operations/puppet@production] merge_cli: fix a puppet-merge.sh comment

https://gerrit.wikimedia.org/r/1052260

Change #1071620 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::puppetserver: add profile::configmaster

https://gerrit.wikimedia.org/r/1071620

Change #1071620 abandoned by Elukey:

[operations/puppet@production] role::puppetserver: add profile::configmaster

https://gerrit.wikimedia.org/r/1071620

Change #1071834 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::puppetserver: add TLS+HTTP stack to publish SHA1 values

https://gerrit.wikimedia.org/r/1071834

Change #1071885 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Puppet agent: Remove obsolete manage_puppet_ca_file code

https://gerrit.wikimedia.org/r/1071885

Change #1071885 merged by Muehlenhoff:

[operations/puppet@production] Puppet agent: Remove obsolete manage_puppet_ca_file code

https://gerrit.wikimedia.org/r/1071885

Change #1072108 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Puppet frontends: Remove obsolete manage_puppet_ca_file code

https://gerrit.wikimedia.org/r/1072108

Change #1072137 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove obsolete geoip templates

https://gerrit.wikimedia.org/r/1072137

Change #1072142 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] config_master: Explicitly configure the server from which Puppet changes are merged

https://gerrit.wikimedia.org/r/1072142

Change #1072147 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] pontoon: Remove Puppet 5 specific settings no longer relevant

https://gerrit.wikimedia.org/r/1072147

Change #1072137 merged by Muehlenhoff:

[operations/puppet@production] Remove obsolete geoip templates

https://gerrit.wikimedia.org/r/1072137

Change #1072142 merged by Muehlenhoff:

[operations/puppet@production] config_master: Explicitly configure the server for Puppet merges

https://gerrit.wikimedia.org/r/1072142

Change #1072147 merged by Muehlenhoff:

[operations/puppet@production] pontoon: Remove Puppet 5 specific settings no longer relevant

https://gerrit.wikimedia.org/r/1072147

Change #1072171 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add an explicit Hiera variable to determine the active swift ring server

https://gerrit.wikimedia.org/r/1072171

Change #1072108 merged by Muehlenhoff:

[operations/puppet@production] Puppet frontends: Remove obsolete manage_puppet_ca_file code

https://gerrit.wikimedia.org/r/1072108

Change #1072171 merged by Muehlenhoff:

[operations/puppet@production] Add an explicit Hiera variable to determine the active swift ring server

https://gerrit.wikimedia.org/r/1072171

We have migrated puppet merges to puppetserver1001, so this is not a blocker anymore to the shutdown of the Puppet 5 servers. It's still a sensible improvement we should implement at some time, but I'm removing it from the dependency tree of the Puppet 5 server shutdown task.