
Prevent catalog breakage on cloud instances by decoupling core cloud puppetmaster from custom puppetmasters
Open, Medium, Public

Description

A straw-man proposal:

Each VM definitely knows about one puppet 'core' server which only serves up the base config.

  • no custom classes
  • never a project-local master, only ever a central/standard master that manages all VMs
  • reads user-defined custom hiera, but sanitizes it for a whitelist of settings which can't break the catalog
    • For starters, only settings allowed are: custom_puppet_master, lvs_mount things, cert-switching
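The sanitization step above could be sketched as a simple whitelist filter. This is illustrative only: apart from custom_puppet_master, the key names are assumptions, and the "lvs_mount things" family is modeled here as a prefix check.

```python
# Hypothetical sketch of the core master's hiera sanitization: only a
# whitelist of settings that cannot break the catalog is passed through.
SAFE_KEYS = {"custom_puppet_master"}  # exact-match keys from the proposal

def is_safe(key):
    # "lvs_mount things" from the proposal, modeled as a prefix family
    return key in SAFE_KEYS or key.startswith("lvs_mount")

def sanitize_hiera(user_hiera):
    """Return only the whitelisted subset of user-defined hiera settings."""
    return {k: v for k, v in user_hiera.items() if is_safe(k)}

print(sanitize_hiera({"custom_puppet_master": "m1", "classes": ["role::x"]}))
# → {'custom_puppet_master': 'm1'}
```

Anything not on the whitelist (e.g. arbitrary class lists) never reaches the core catalog, so a bad hiera edit can't break the base run.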

Optionally, a given VM can also subscribe to a second 'custom' master, which might be project-local or might be an additional 'central' master.

  • this is almost entirely orthogonal to the 'core' puppet run. Different config dir, different certs, run triggered by a different cron, etc.
  • default VMs don't know about this server at all
  • /only/ serves classes specified by the custom enc, empty by default
  • able to read the entire custom hiera catalog (or possibly a separate, unrelated hiera catalog)
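The "different config dir, different certs" separation could be realized with a second puppet.conf. A minimal sketch only; the directory layout, server name, and paths here are assumptions, not anything decided in this task:

```ini
# /etc/puppetcore/puppet.conf (hypothetical second agent config)
# A separate vardir/ssldir keeps certs, cached catalogs, and state
# fully independent of the default /etc/puppet run.
[main]
server = core-puppetmaster.cloudinfra.example.org
vardir = /var/lib/puppetcore
ssldir = /var/lib/puppetcore/ssl
```

The second run would then be invoked with something like puppet agent -tv --config /etc/puppetcore/puppet.conf, so neither run can clobber the other's certificates or cached state.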

Q: what if a config state in the custom config contradicts a config state in the core config?

A: VM state will oscillate between the two depending on which puppet run happened most recently. This won't happen very often (there's minimal overlap in the scope of the two puppetmasters), but when it does it will be the user's problem. (NOTE: this is moot if the custom puppet catalog is always a superset of the core catalog.)

Q: what will the UI look like for this?

A: To be determined but not necessarily a lot different from the current UI. We'll still want a way to add project-wide custom puppet for some use cases. (Probably the whole puppet UI needs redesigning but most of that is decoupled from this issue)

Q: What about module versioning?

A: I'm pretty sure that versioning is a good idea, and also pretty sure that it's unrelated to the issue this proposal is meant to fix: even if the base module is rock solid, you can still break the catalog by adding literally anything else to it for a given puppet run.

Event Timeline

Andrew renamed this task from Decouple core cloud puppetmaster from custom puppetmasters to Prevent catalog breakage on cloud instances by decoupling core cloud puppetmaster from custom puppetmasters.Jul 1 2019, 8:01 PM
Andrew updated the task description. (Show Details)
  • What will this setup look like to the admins of a Cloud VPS instance? Is it all easy to find from /etc/puppet files or will there be instance based config put in some other location(s)?
  • Will puppet agent -tv run the "core" puppet manifest or the "custom" manifest?
  • Will we be able to reasonably document the files that the "core" manifest manages so that folks will know what to avoid changing in their "custom" manifest?
  • What will this setup look like to the admins of a Cloud VPS instance? Is it all easy to find from /etc/puppet files or will there be instance based config put in some other location(s)?

There will be two configs, e.g. /etc/puppet and /etc/puppetcore.

  • Will puppet agent -tv run the "core" puppet manifest or the "custom" manifest?

puppet agent -tv --config /etc/basepuppet/puppet.conf

and

puppet agent -tv --config /etc/corepuppet/puppet.conf

(Note that I don't have a strong opinion about whether 'core' should be the default or 'custom' should be default... I'm open to suggestions.)
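Since the description says each run is "triggered by a different cron", the two invocations above might be scheduled something like this. Intervals and staggering are assumptions, not decisions made in this task:

```shell
# Illustrative crontab entries: two fully independent agent runs,
# each with its own config dir and certs, staggered to avoid overlap.
15,45 * * * *  root  puppet agent --onetime --no-daemonize --config /etc/basepuppet/puppet.conf
0,30  * * * *  root  puppet agent --onetime --no-daemonize --config /etc/corepuppet/puppet.conf
```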

  • Will we be able to reasonably document the files that the "core" manifest manages so that folks will know what to avoid changing in their "custom" manifest?

Yes, for certain values of 'reasonably'. The parent task has a dump of all the things that would be included in core... we should probably try to prune that down a bit, and then we can add inline comments to the puppet repo, make a wiki page, etc.

puppet agent -tv --config /etc/basepuppet/puppet.conf

and

puppet agent -tv --config /etc/corepuppet/puppet.conf

(Note that I don't have a strong opinion about whether 'core' should be the default or 'custom' should be default... I'm open to suggestions.)

I think I would expect that making 'custom' the default would be the least surprising thing for instance maintainers. The muscle memory for puppet agent -tv is pretty strong for those of us who are used to using Puppet, and it sounds like all the interesting things for an instance would actually be orchestrated using the custom puppetmaster.

Thinking about this a bit today, I'm no longer sure that the two puppet catalogs need to be disjoint. If the 'custom' catalog contains the core catalog + the custom bits then we'll avoid some potential oscillation between the two runs and also support having custom classes depend on core classes.

The other advantage of this plan is that it means the 'custom' puppetmaster is now == the existing puppetmaster.

Tentative transition plan A:

  • Move all VMs to the in-cloud puppetmasters (T171188)
  • Create a new set of 'core' puppetmasters, also in the cloud-infra project
  • Add all VMs to the core puppetmasters as well (at which point all VMs are using both masters)
  • Write a horizon UI to turn 'custom puppet' on/off per VM (with it activated for all VMs)
  • Use some kind of scripted logic to uncheck the above box (and detach from the custom master) all default-state VMs

Tentative transition plan B:

  • Create a new set of 'core' puppetmasters, also in the cloud-infra project
  • Add all VMs to the core puppetmasters
  • Write a horizon UI to turn 'custom puppet' on/off per VM (with it activated for all VMs)
  • Check the above box as appropriate
  • For VMs with the box checked, move to in-cloud puppetmasters as appropriate; for those which don't need it, detach from the in-prod puppetmaster

I'm pretty sure that plan A is better, even though it involves doing and then undoing some things. Among other things it lets us close T171188 in the meantime.

For this plan to work at all, we'd have to ensure that there's nothing in the 'core' catalog that actively purges files. Otherwise we'll get oscillating states regardless as the core catalog deletes things and the custom catalog creates things. So we'd need some way of actively auditing the core catalog to prevent disaster.

We need to be very careful about file resources with purge => true appearing in the core catalog; these can be highly destructive unless puppet takes into account resources managed by the custom catalog (e.g. ferm, security::access, apt, exim4, etc.).
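One way to audit for this would be to scan a compiled catalog for purging resources. A sketch, assuming the JSON shape produced by puppet's catalog output (a top-level "resources" list with per-resource "parameters"); the example catalog inline is illustrative:

```python
def find_purging_resources(catalog):
    """Return (type, title) for every resource in a compiled catalog
    that sets purge => true."""
    hits = []
    for res in catalog.get("resources", []):
        params = res.get("parameters", {})
        if params.get("purge") in (True, "true"):
            hits.append((res["type"], res["title"]))
    return hits

# Illustrative catalog fragment (not a real dump):
catalog = {
    "resources": [
        {"type": "File", "title": "/etc/ferm/conf.d",
         "parameters": {"ensure": "directory", "purge": True, "recurse": True}},
        {"type": "File", "title": "/etc/motd",
         "parameters": {"ensure": "file"}},
    ]
}
print(find_purging_resources(catalog))  # → [('File', '/etc/ferm/conf.d')]
```

A check like this could run against the core catalog on every change, failing the build if any purging resource sneaks in.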