Re-think puppet management for deployment-prep
Open, Medium, Public

Description

Deployment-prep is thought of as a "staging" area for production, so it includes a lot of the production puppet roles.

This means that anyone wanting to change something in puppet (even simple refactors) will face the following issues:

  • There is no greppable way to find out where that role is applied within deployment-prep
  • Hiera is distributed between on-disk hieradata, the horizon UI (in multiple places), and wikitech.

This results in a lot of time lost chasing things around all those places, and a general difficulty in finding out which classes are actually applied there, and how their data is organized (in a different model than in production).

So my proposal would be to make deployment-prep work as similarly to production as possible, while allowing changes to deployment-prep to be made by non-opsens (i.e. people who don't have +2 on ops/puppet).

My idea is as follows (10k ft view):

  • on the deployment-prep puppetmaster, configure a 'staging' environment for puppet, with its own site.pp that will include deployment-prep node declarations instead of production ones. Those will come from a separate repository with its own merge/commit rights, separate from ops/puppet
  • on the deployment-prep puppetmaster, define a disk-based hiera hierarchy to mimic 1:1 what we have in production, as far as the hierarchy goes. This hiera hierarchy will also be in the new repository, thus freed from the ops +2 requirement (a sketch of both points follows this list)
  • We might want to have all nodes in this environment derive from a base node that includes all the labs boilerplate that is needed for openstack/ldap/basic system.
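A minimal sketch of what the first two points could look like on the puppetmaster; all paths, file names, and hierarchy levels below are illustrative assumptions, not decisions:

/etc/puppet/environments/staging/environment.conf
# Hypothetical directory environment: modules still come from ops/puppet,
# while site.pp and hieradata come from the new, separately-ACLed repository.
modulepath = /etc/puppet/environments/staging/modules:/etc/puppet/modules
manifest = /etc/puppet/environments/staging/manifests/site.pp

/etc/puppet/environments/staging/hiera.yaml
# Hypothetical per-environment hierarchy (Hiera 5 syntax, so this assumes a
# recent enough puppet) mirroring the *shape* of production's hierarchy.
---
version: 5
defaults:
  datadir: hieradata
  data_hash: yaml_data
hierarchy:
  - name: "Per host"
    path: "hosts/%{trusted.certname}.yaml"
  - name: "Per role"
    # assumes a top-scope $role variable is set for each node
    path: "role/%{::role}.yaml"
  - name: "Common"
    path: "common.yaml"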

This would allow me, as an opsen working on continuous refactors of our production code, to quickly know where to modify deployment-prep in order not to break it, and would bring greater logic and order to the hiera definitions for deployment-prep, which are currently out of control.

I think this would be a mid-level effort (the largest task being reorganizing that unruly mess of hiera I just described) with real benefits in both the short and the long term.

Event Timeline

Joe added a subscriber: BBlack.

We might want to have all nodes in this environment derive from a base node that includes all the labs boilerplate that is needed for openstack/ldap/basic system.

That part is covered in site.pp by having labs instances include role::labs::instance:

manifests/site.pp
node default {
    if $::realm == 'production' {
        include ::standard
    } else {
        # Require instead of include so we get NFS and other
        # base things setup properly
        require ::role::labs::instance
    }
}
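Under the proposal, a node declaration in the staging site.pp could then build on that same boilerplate; a minimal sketch, with a hypothetical hostname and an example production role:

staging manifests/site.pp
node 'deployment-mediawiki01.deployment-prep.eqiad.wmflabs' {
    # labs boilerplate (NFS, LDAP, base system), as in the default node above
    require ::role::labs::instance
    # the production role this instance is staging
    include ::role::mediawiki::appserver
}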

I do think the multiple Hiera sources are confusing. Originally OpenStackManager had an interface to let one define classes and variables, then apply a role/variables to a node.

The Hiera namespace on wikitech was introduced to offer self-service for non-ops, or to save one the trouble of crafting a puppet patch and cherry-picking it on a standalone puppet master. That is now better served by the Horizon UI.

An immediate low-hanging fruit would probably be to move all the content from https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep to puppet.git /hieradata/labs/deployment-prep/common.yaml and leave a comment about using Horizon from now on.

In Horizon, we should probably make people aware that changes there should be transient and eventually land in puppet.git /hieradata, similar to how puppetmaster cherry-picks eventually get merged into puppet.git once they are ready.

Puppet environments in labs would be quite a great thing. I can imagine project admins having commit rights on something like labs/puppet/deployment-prep.git, which would be a standalone puppet hierarchy fetched on the puppetmaster under /environments.
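For example (sketch only: the repository does not exist yet, and the target path is illustrative), the fetch on the puppetmaster could be as simple as:

# hypothetical repo, illustrative target path
git clone https://gerrit.wikimedia.org/r/labs/puppet/deployment-prep \
    /etc/puppet/environments/deployment-prep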

@hashar the point is to have something that resembles production, including the role-based hiera lookup.

It is also pretty important for the role/profile paradigm we're trying to enforce.

At the moment the only convenient way to do that is the puppet-prefix function of Horizon.

Honestly, this ticket is not about low-hanging fruit, but about an idea to make that environment manageable without draining people's energy keeping it unbroken during puppet refactors/changes aimed at production.

Honestly, if avoiding breaking beta could be done with one commit shadowing the hieradata commit in production, plus a couple of greps, it would be easier to get people to do it, versus the current process, which is, more or less:

  • Go to Horizon and cycle through all the VMs in deployment-prep that, according to your memory/experience, can have the class applied
  • Cycle through all the puppet prefixes defined there too
  • Search for hiera data in the four places I named above

which takes an experienced opsen around 2-3 hours even for a pretty simple refactoring, and of course most people don't even get to the next objective, which would be:

  • remove from hiera all variables that are now unneeded

so, all in all, the deployment-prep hiera is thousands of lines of sometimes-repeated variables.

So yeah, removing one source of hiera would help a bit, but not that much, to be honest. We need a radically different approach: accept that deployment-prep is qualitatively different from every other labs project, and treat it accordingly.

on the deployment-prep puppetmaster, define a disk-based hiera hierarchy to mimic 1:1 what we have in production

I think this would need to be a git repo that is tracked in gerrit or diffusion primarily because we can't guarantee that any particular OpenStack VM will be durable. If something happens to the disk image itself or to the labvirt host that currently holds the image we would need to be able to rebuild the puppetmaster and put all the custom config back.


At a high level I like the idea. The use of a project-local puppetmaster gives deployment-prep some flexibility that would be difficult to replicate across all of the OpenStack projects via our central puppetmaster. Ideally this 'prod-like' management setup would become a Puppet role/profile that any project could use if its use case was sufficiently production-like to warrant the additional complexity of managing a project puppetmaster and a dedicated hiera config repo.

Live hacking deployment-prep to implement the concept is likely to be frustrating for both the users of the beta cluster and the engineers working on fine tuning the implementation. I think I would suggest making a new project request for a place to develop the proof of concept for this setup. Once the big kinks are worked out and the parts that can be upstreamed to ops/puppet.git have been merged it would be easier to convert deployment-prep in place.

on the deployment-prep puppetmaster, define a disk-based hiera hierarchy to mimic 1:1 what we have in production

I think this would need to be a git repo that is tracked in gerrit or diffusion primarily because we can't guarantee that any particular OpenStack VM will be durable. If something happens to the disk image itself or to the labvirt host that currently holds the image we would need to be able to rebuild the puppetmaster and put all the custom config back.

That is exactly the plan. Deployment-prep would have its own puppet repo with site.pp and hieradata.
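A possible layout for that repository (purely illustrative):

site.pp            # deployment-prep node declarations
hieradata/
    common.yaml    # project-wide defaults, incl. content migrated from wikitech
    hosts/         # per-instance overrides
    role/          # per-role data, mirroring the shape of production's hierarchy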

  • on the deployment-prep puppetmaster, configure a 'staging' environment for puppet, with its own site.pp that will include deployment-prep node declarations instead of production ones. Those will come from a separate repository with its own merge/commit rights, separate from ops/puppet

+1 — finding node definitions has become tricky since they are no longer stored in LDAP, and I just spent a bit of time this morning digging for this exact information in various dashboards.

Will this repo just be a different site.pp for beta node definitions + hieradata? Or was something broader envisioned?

  • on the deployment-prep puppetmaster, define a disk-based hiera hierarchy to mimic 1:1 what we have in production, as far as the hierarchy goes. This hiera hierarchy will also be in the new repository, thus freed from the ops +2 requirement

+1 — continually re-familiarizing myself with the hiera hierarchy is no fun. I often answer questions about why adding a value to hieradata/role/common/deployment/server.yaml doesn't affect deployment-tin.

We discussed the role hierarchy on T120165: Implement role based hiera lookups for labs. I'm not clear on whether the objections there are still valid, or ever were valid for just beta (labs labs labs).

For clarification's sake, it sounds like this hierarchy would be a repo to which admins of the beta cluster would have +2, and which would be completely separate and mostly divergent from operations/puppet. It also sounds like this proposal would necessitate that we no longer use horizon or wikitech for hieradata lookups.

I'm good with all of this. The divergence of hierarchies coupled with the use of online + on-disk hieradata sources is confusing.

  • We might want to have all nodes in this environment derive from a base node that includes all the labs boilerplate that is needed for openstack/ldap/basic system.

Agree with @hashar -- I think we're OK here.

One thing this proposal does not address (which I assume means that we'd be maintaining the status quo) is deployment-prep puppetmaster cherry-picks. I took a stab at this problem in T135427: Beta puppetmaster cherry-pick process, but I back-burnered that project long ago.

I think I would suggest making a new project request for a place to develop the proof of concept for this setup. Once the big kinks are worked out and the parts that can be upstreamed to ops/puppet.git have been merged it would be easier to convert deployment-prep in place.

I think that if we use a new project for this work we may end up with a weird new third thing that has different problems from both beta and operations puppet but is incompatible with both. The temptation to start with a clean slate is strong, but divergent efforts in this area rarely seem to converge; examples are the staging project from two years ago and mediawiki-vagrant, which has become its own thing.

I think I would suggest making a new project request for a place to develop the proof of concept for this setup. Once the big kinks are worked out and the parts that can be upstreamed to ops/puppet.git have been merged it would be easier to convert deployment-prep in place.

I think that if we use a new project for this work we may end up with a weird new third thing that has different problems from both beta and operations puppet but is incompatible with both. The temptation to start with a clean slate is strong, but divergent efforts in this area rarely seem to converge; examples are the staging project from two years ago and mediawiki-vagrant, which has become its own thing.

I only meant it for figuring out how the puppetmaster changes work and puppetizing that upstream; I'm not suggesting building a full environment at all. As you say, that way lies madness.

<threadjack>mediawiki-vagrant's puppet was always meant to be its own thing. The use cases for provisioning a full-stack development environment have very little overlap with production operations. It would be nice to reuse more common bits from ops/puppet.git, but currently there is no appetite from the maintainers of that repo to produce content that is shareable in any way other than point-in-time copies. I totally understand that, as making a modular and reusable collection of Puppet resources is a non-trivial undertaking and would return very little, if any, benefit to production operations.</threadjack>

Will this repo just be a different site.pp for beta node definitions + hieradata? Or was something broader envisioned?

Mostly, yes.

  • on the deployment-prep puppetmaster, define a disk-based hiera hierarchy to mimic 1:1 what we have in production, as far as the hierarchy goes. This hiera hierarchy will also be in the new repository, thus freed from the ops +2 requirement

+1 — continually re-familiarizing myself with the hiera hierarchy is no fun. I often answer questions about why adding a value to hieradata/role/common/deployment/server.yaml doesn't affect deployment-tin.

Well, that's expected: that is the production hierarchy, and I don't think we should use the same files in prod and labs.

We discussed the role hierarchy on T120165: Implement role based hiera lookups for labs. I'm not clear on if objections there are still valid or ever were valid for just beta (labs labs labs).

Those objections are still valid if we want to use an ENC. I'm proposing to completely switch the model here.

One thing this proposal does not address (which I assume means that we'd be maintaining the status quo) is deployment-prep puppetmaster cherry-picks. I took a stab at this problem in T135427: Beta puppetmaster cherry-pick process, but I back-burnered that project long ago.

Yeah, I wouldn't try to solve that here either; environments are not a silver bullet for it anyway.

I think I would suggest making a new project request for a place to develop the proof of concept for this setup. Once the big kinks are worked out and the parts that can be upstreamed to ops/puppet.git have been merged it would be easier to convert deployment-prep in place.

I think that if we use a new project for this work we may end up with a weird new third thing that has different problems from both beta and operations puppet but is incompatible with both. The temptation to start with a clean slate is strong, but divergent efforts in this area rarely seem to converge; examples are the staging project from two years ago and mediawiki-vagrant, which has become its own thing.

I think Bryan is proposing to test just the puppetmaster setup in a different project, then move those changes to deployment-prep.

I agree with that idea.

Also, just to clarify: this is me expressing a real, everyday problem and a proposal for a possible solution.

I don't think it is my responsibility to work on such a project. I am willing to help whoever is up to the task, though.

chasemp subscribed.

@Joe I support the idea of not sharing role-related hieradata between prod and deployment-prep, if it means making everything a lot simpler and more centralised. However, we still need a strategy for things that can and should be shared between prod and deployment-prep.

As a starting point and example use case, I would like us to figure out how and where to define the scap::sources variable. This is a simple map of repo names that is stateless and not in any way specific to production or an individual cluster. It should apply everywhere by default, without the need to duplicate it. We may want to allow overriding or merging in some way at run-time (like we do with various HHVM/PHP setting arrays), but the bulk of it should be shared by default.
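For illustration, the shared definition could then live in common hieradata along these lines (the repo names and the exact value format are assumptions to be checked against the scap module):

hieradata/common.yaml
# sketch: keys are repo names, values hold optional per-repo settings
scap::sources:
  mediawiki/core: {}
  mediawiki/vendor: {}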

Change 436581 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] Move scap::sources from role::deployment_server to common

https://gerrit.wikimedia.org/r/436581

Change 436581 abandoned by Krinkle:
Move scap::sources from role::deployment_server to common

https://gerrit.wikimedia.org/r/436581

Intermediate proposal: can we give +2 rights on labs/private to everyone with root in deployment-prep?

Waiting for ops to +2 labs/private changes adds unnecessary friction to testing Puppet changes. The alternatives (putting secrets in horizon and/or local-only patches on the puppetmaster) add to the mess.

Intermediate proposal: can we give +2 rights on labs/private to everyone with root in deployment-prep?

For anyone wondering who this is, see the "Administrators" section of https://openstack-browser.toolforge.org/project/deployment-prep.

Waiting for ops to +2 labs/private changes adds unnecessary friction to testing Puppet changes. The alternatives (putting secrets in horizon and/or local-only patches on the puppetmaster) add to the mess.

The third option is posting the change to Gerrit and then cherry-picking it locally on the deployment-prep puppetmaster. This is not the most beautiful workflow, but it worked pretty well back when I was doing lots and lots of work in deployment-prep. https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#Cherry-picking_a_patch_from_gerrit
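The workflow on that page boils down to fetching the change from Gerrit and cherry-picking it locally; roughly (the checkout path is an assumption, and the change ref just reuses the change number mentioned above as an example):

# on the deployment-prep puppetmaster
cd /var/lib/git/operations/puppet
git fetch https://gerrit.wikimedia.org/r/operations/puppet refs/changes/81/436581/1
git cherry-pick FETCH_HEAD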

For anyone wondering who this is, see the "Administrators" section of https://openstack-browser.toolforge.org/project/deployment-prep.

Is there a way for Gerrit to query this via LDAP? (If not, I guess we document adding someone in both places as part of the procedure when adding them in deployment-prep.)

The third option is posting the change to Gerrit and then cherry-picking it locally on the deployment-prep puppetmaster. This is not the most beautiful workflow, but it worked pretty well back when I was doing lots and lots of work in deployment-prep. https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#Cherry-picking_a_patch_from_gerrit

+1 to this being a nicer option.

For anyone wondering who this is, see the "Administrators" section of https://openstack-browser.toolforge.org/project/deployment-prep.

Is there a way for Gerrit to query this via LDAP? (If not, I guess we document adding someone in both places as part of the procedure when adding them in deployment-prep.)

Not today, no. We do mirror project-level membership into LDAP (cn=project-deployment-prep,ou=groups,dc=wikimedia,dc=org), but not the roles that are held by individual accounts. That state data is only available in the OpenStack Keystone database.
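The mirrored group itself is queryable, though; for example (assuming anonymous read access and groupOfNames-style entries, both assumptions):

ldapsearch -x -b 'ou=groups,dc=wikimedia,dc=org' \
    '(cn=project-deployment-prep)' member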

Intermediate proposal: can we give +2 rights on labs/private to everyone with root in deployment-prep?

Changes to labs/private need to be puppet-merge'd on the production puppetmaster, which is why (AIUI) it's limited to just "ops". (I don't know why this is the case, just that it's what we currently do.)

Intermediate proposal: can we give +2 rights on labs/private to everyone with root in deployment-prep?

Changes to labs/private need to be puppet-merge'd on the production puppetmaster, which is why (AIUI) it's limited to just "ops". (I don't know why this is the case, just that it's what we currently do.)

It was made automatic behavior by T228443: Help people remember to merge labs/private git, which was a step toward T227029: Prevent catalog breakage on cloud instances by decoupling core cloud puppetmaster from custom puppetmasters; that effort stalled out as likely infeasible. Prior to that it was a separate manual step on the prod puppetmasters. I think @Andrew may be able to better explain the whys, and whether this is a decision that can be reexamined today.

Deployment-prep-specific Hiera keys are also kept in the production operations/puppet repository (T277680) for some reason unknown to me.

Intermediate proposal: can we give +2 rights on labs/private to everyone with root in deployment-prep?

For anyone wondering who this is, see the "Administrators" section of https://openstack-browser.toolforge.org/project/deployment-prep.

Nope, in T71269 every member of the project was granted full sudo access.