
Sort out only one ideal hiera mechanism for Cloud VPS
Open, Low, Public

Description

(I said I was going to file this task a few days ago and forgot, sorry)

Right now we have several different methods for this (wikitech, horizon, and the puppet.git repo), each with its own cross-project as well as per-instance type definitions. Ideally there would be only one.

Here's an extract from a long discussion in -cloud-admin a few days ago:

Nov 28 16:01:12 <Krenair>	we should determine what our ideal method for managing per-project/instance hiera data for Cloud VPS stuff is
Nov 28 16:01:30 <Krenair>	that may not be something existing, it may be something new, it may be an improved version of an existing thing
Nov 28 16:02:05 <Krenair>	if there's an existing one that's close to what we pick, migrate away from that to the new thing and see how we get on
Nov 28 16:02:18 <Krenair>	then see about migrating the other existing systems into it
Nov 28 16:03:08 <gtirloni>	seems like a sane approach. first step is crucial :)
Nov 28 16:03:14 <bd808>	Krenair: agreed. this needs thought before just making another random change
Nov 28 16:03:29 <Krenair>	right now it sounds to me like the two contenders are:
Nov 28 16:03:32 <Krenair>	a) the current horizon system with some fixes for performance and version control added in
Nov 28 16:03:50 <Krenair>	b) new (sort of): per-project gerrit repositories with something syncing keystone role membership into gerrit ACLs

With the per-project gerrit repositories, the existing mechanism we'd be replacing is the hieradata/labs directory in operations/puppet.git. If we went down that route, we'd probably then look at merging the horizon and wikitech data into there and shutting down those mechanisms too.

There are probably several related tasks here; please list them as appropriate.

Event Timeline

Each of the existing 3 mechanisms has problems that keep it from being ideal today:

  • Wikitech's Hiera: namespace depends on MediaWiki-extensions-OpenStackManager, which has been proposed to be undeployed (T161553).
  • operations/puppet.git requires a +2 and manual merge step by a production root user to make a change live (unless the project is using a project local puppetmaster and maintaining local commits). This does not scale well and can take days/weeks/months to occur.
  • The Horizon puppet dashboard does not provide any audit logs about who made changes and when.

Of the two alternate proposals offered so far by @Krenair, proposal (a) of adding some kind of version control/logging to the Horizon system is probably the least technically challenging. I believe it would be possible to use django-reversion to add history to the existing system. Performance could be greatly improved at the cost of some user friendliness by removing the existing system of showing Puppet modules and instead only exposing the raw YAML editing experience. This is a regression from the existing functionality, but it matches the user experience of the other existing competing systems (git and wikitech).
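
To make the "history for raw YAML" idea a bit more concrete, here is a minimal sketch in plain Python. It is not django-reversion and not Horizon code; the `Revision` and `HieraHistory` names, the usernames, and the example hiera key are all hypothetical. It only illustrates the shape of an append-only log that records who changed a project's YAML and when.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One saved version of a project's raw hiera YAML (hypothetical model)."""
    author: str
    timestamp: datetime
    yaml_text: str


@dataclass
class HieraHistory:
    """Append-only history for a single project's hiera blob."""
    revisions: list = field(default_factory=list)

    def save(self, author, yaml_text):
        # Skip no-op saves so the log only records real changes.
        if self.revisions and self.revisions[-1].yaml_text == yaml_text:
            return self.revisions[-1]
        rev = Revision(author, datetime.now(timezone.utc), yaml_text)
        self.revisions.append(rev)
        return rev

    def current(self):
        # The newest revision is the live configuration.
        return self.revisions[-1].yaml_text if self.revisions else ""

    def revert(self, author, index):
        # A revert is stored as a new revision, so history is never rewritten.
        return self.save(author, self.revisions[index].yaml_text)


# Made-up users and a made-up hiera key, purely for illustration.
history = HieraHistory()
history.save("alice", "profile::example::enabled: true\n")
history.save("bob", "profile::example::enabled: false\n")
history.revert("alice", 0)
audit_log = [(r.author, r.yaml_text) for r in history.revisions]
```

Storing reverts as new revisions keeps the log usable as an audit trail: nothing is ever deleted, so "who made changes and when" can always be answered from `history.revisions`.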

Proposal (b) would be more technically complex to implement, as it would require changes in the Cloud-wide Puppetmasters to provision and update the various git repos created. It would also require similar changes to all project-local Puppetmasters. Finally, it would require a new integration with Gerrit that ideally would use some sort of reconciliation loop to fix broken permissions when real-time project adminship change events were missed or mishandled. Conversely, work towards per-project hiera configuration via git would likely also move us closer to per-project Puppet manifest management. That could be a very useful feature for Cloud VPS projects generally, as managing provisioning of unique services using the shared operations/puppet.git repository suffers from the same scaling problems as managing hiera settings there.
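
A rough sketch of what one pass of such a reconciliation loop might look like, in Python. Everything here is an assumption for illustration: the two dicts are in-memory stand-ins for "project admins according to Keystone" and "members of the matching Gerrit group", and no real Keystone or Gerrit API is called. The project and user names are made up.

```python
def plan_sync(keystone_admins, gerrit_members):
    """Return (to_add, to_remove) so the Gerrit group matches Keystone."""
    desired = set(keystone_admins)
    actual = set(gerrit_members)
    return sorted(desired - actual), sorted(actual - desired)


def reconcile(keystone, gerrit):
    """Run one pass over all projects known to Keystone.

    Both arguments are hypothetical stand-ins: dicts mapping a project
    name to a set of usernames. Keystone is treated as the source of
    truth; `gerrit` is updated in place. Projects that exist only in
    `gerrit` are left alone in this sketch.
    """
    changes = {}
    for project, admins in keystone.items():
        to_add, to_remove = plan_sync(admins, gerrit.get(project, ()))
        if to_add or to_remove:
            gerrit[project] = set(admins)
            changes[project] = {"add": to_add, "remove": to_remove}
    return changes


# Made-up state: an event adding bob and removing mallory was missed.
keystone = {"project-a": {"alice", "bob"}, "project-b": {"carol"}}
gerrit = {"project-a": {"alice", "mallory"}, "project-b": {"carol"}}

first_pass = reconcile(keystone, gerrit)   # repairs project-a
second_pass = reconcile(keystone, gerrit)  # nothing left to do
```

Because each pass compares full desired state against full actual state, it does not matter which individual change events were dropped; running it periodically converges the ACLs, and a pass with nothing to fix is a no-op.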

> The Horizon puppet dashboard does not provide any audit logs about who made changes and when.

Also the performance of the UI.

> Of the two alternate proposals offered so far by @Krenair, proposal (a) of adding some kind of version control/logging to the Horizon system is probably the least technically challenging. I believe it would be possible to use django-reversion to add history to the existing system. Performance could be greatly improved at the cost of some user friendliness by removing the existing system of showing Puppet modules and instead only exposing the raw YAML editing experience. This is a regression from the existing functionality, but it matches the user experience of the other existing competing systems (git and wikitech).

Yeah, I'd like to avoid maintaining our own system on top of horizon. git handles versioning for us and is one of our main development/operations tools, so going through gerrit would be a good idea. It may take a bit of automatic integration work though. I don't care much for the Puppet ENC thing; it'd be easier to find the class I want through grep and add it via hiera, setting the parameters in hiera.

> Proposal (b) would be more technically complex to implement, as it would require changes in the Cloud-wide Puppetmasters to provision and update the various git repos created. It would also require similar changes to all project-local Puppetmasters. Finally, it would require a new integration with Gerrit that ideally would use some sort of reconciliation loop to fix broken permissions when real-time project adminship change events were missed or mishandled. Conversely, work towards per-project hiera configuration via git would likely also move us closer to per-project Puppet manifest management. That could be a very useful feature for Cloud VPS projects generally, as managing provisioning of unique services using the shared operations/puppet.git repository suffers from the same scaling problems as managing hiera settings there.

Yes. I don't want to give the impression that labs projects would ever cease use of the operations/puppet.git repository's production branch, but some of them may want to build on top of it by adding their own hieradata (and potentially manifests later, let's not worry about that for the moment) in their own repository through their own project members' +2s, and these would not be pulled into prod (without someone going through the process of getting the code merged into the production repository with the existing review process that requires). I believe the existing codesearch tools would help people trying to locate hiera stuff used across all projects (?), without requiring anything additional to make it talk to the backend system currently in use by horizon (could also set up a gerrit automatic submodule thing a la mediawiki/extensions.git).
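
As a toy illustration of "building on top of" the shared repository, here is a Python sketch of a layered lookup. It assumes that a project's own repository would be consulted before the shared operations/puppet.git data, which is a design choice, not something settled in this task. The keys, values, and layer names are invented, and real Hiera lookups have more options than this first-match rule.

```python
def lookup(key, layers, default=None):
    """Return (value, layer_name) from the first layer that defines key.

    `layers` is an ordered list of (name, dict) pairs, most specific first.
    """
    for name, data in layers:
        if key in data:
            return data[key], name
    return default, None


# Invented example data for the shared repo and one project repo.
shared = {"ntp::servers": ["ntp.example.org"], "profile::web::workers": 2}
project = {"profile::web::workers": 8}

# Assumed precedence: project-level data wins over shared data.
layers = [("project-repo", project), ("operations/puppet", shared)]

workers = lookup("profile::web::workers", layers)
ntp = lookup("ntp::servers", layers)
missing = lookup("no::such::key", layers)
```

Returning the layer name alongside the value is useful when debugging: it shows whether a setting came from the project's repository or was inherited from the shared one.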

Git is good at version control and auditable change logs. Gerrit is ... the VCS and code review solution that Wikimedia has on premise today; that's about the nicest thing I can say about it. There is cognitive complexity in moving instance configuration to a different system and interface than instance provisioning. Having both in Horizon is, in my opinion, nicer for the general user's experience than having to explain how and when to move from Horizon to Gerrit and back again. This is a subjective thing for sure and will be very different for users who are more used to the Foundation's production Puppet usage.

Creation and control of stuff outside the scope of instances themselves (e.g. creating new instances, powering them off/on, managing wmflabs.org DNS, security groups, and so on) happens in Horizon; users familiar with other cloud systems or even more traditional VPS hosts should be familiar with this. Instance-internal configuration is done with Puppet, which we manage separately. Sounds simple enough to me, though I might not be the average Cloud VPS user. :)

We got one step closer to this as part of T161553: Remove OpenStackManager from Wikitech. The Hiera namespace has been removed from Wikitech and all data that was there migrated to the Horizon interface for managing hiera settings.

Collapsing Horizon's settings and ops/puppet.git settings into one place can really only go in the git->Horizon direction, as recognized in "option a)" of @Krenair's original IRC thoughts. I guess the open question is: do we actually want to do that?