Page MenuHomePhabricator

RFC: New puppet code organization paradigm/coding standards
Closed, ResolvedPublic

Description

This proposal is partly the result of the discussions we had at the 2016 TechOps offsite, and of my readings before and after it.

The goal of this proposal is to make an incremental shift in how we organize our puppet code in the tree, to better standardize the code structure and make it simpler to read and to debug. It draws ideas from most puppet best practices and should be immediately applicable to new code we write from now on, without disrupting the existing code.

General principles

  1. The code should be organized in roles, profiles, and modules. Let's see in detail what each of these things will mean.
  2. Modules should be basic units of functionality (e.g. "set up, configure and run HHVM") and must not use classes from other modules, and mostly avoid, wherever possible, to use defines from other modules as well. No hiera call, explicit or implicit, should happen within a module. These rules will ensure the amount of WMF-specific code that makes it into modules is minimal, and improve debugging/refactoring/testing of modules as they don't really depend on each other. Keeping up with the HHVM module example, the base class hhvm is a good example of what should be in a module, while hhvm::admin is a good example of what should not be in a module; surely not in this one: it is a class to configure apache to forward requests to HHVM, depends mostly on another module (apache, in fact) and also adds ferm rules, which of course requires the WMF-specific network::constants class.
  3. Profiles are collections of resources in modules that represent an high-level functionality ("a webserver able to serve mediawiki"), should only have parameters that default to explicit hiera calls. This means that a typical profile will either need all of its hiera variables declared, or to be included by an ENC (like horizon) with its parameters declared. Unless specifically needed, no resource should be added to a profile using the include class method, but with explicit class instantiations. If a profile need another one as a precondition, it must be listed with a require ::profile::foo at the start of the class, but this should be mostly avoided. Most of our current "roles" should be profiles. Following our example, an apache webserver that proxies to a local HHVM installation should be configured via a 'profile::hhvm::web' class; a mediawiki installation served through such a webserver should be configured via a 'profile::mediawiki::web' class.
  4. Roles should only include profiles; A role can include more than one profile, but no conditionals, hiera calls, etc are allowed. So following our example, we should have a 'role::mediawiki::web' that just includes 'profile::mediawiki::web' and 'profile::hhvm::web'. Inheritance can be used between roles, but it is strongly discouraged: for instance, it should be remembered that inheritance will not work in hiera.
  5. Any node in site.pp must include exactly one role, with the "role()" function. No exceptions allowed.
  6. Almost all hiera definitions should go in the "role" heirarchy. Only exceptions can be shared data structures that can be used by many profiles or to feed different modules with their data. Those should go in the common/$site global hierarchy. A good example is, in our codebase, ganglia_clusters, or any list of servers (the memcached hosts for mediawiki, for example).
  7. Per-host hiera should only be used to allow tweaking some knobs for testing or to maybe declare canaries in a cluster. It should not be used to add/subtract functionality. If you need to do that, add a new role and configure it accordingly, within its own hiera namespace.
  8. hiera keys SHOULD try to reflect the most specific common shared namespace of the puppet classes trying to look them up (implicitly or explicitly). This should allow easier grepping and avoid conflicts. Global variables SHOULD BE avoided as the profile will by definition be the least specific common namespace.

Expected advantages
This will have a series of advantages, namely:

  • Make it easier to write general-purpose, non-wmf-specific modules
  • Ease of debug: it will be easy to see where something happens (typically, the profiles that are part of your role), hiera calls will all be explicit in the code and mostly contained into a single file (the one for the role declared in site.pp). It will thus be pretty easy to track down a variable lookup
  • Using configurable profiles will allow easier reuse of the same code when we have more than one cluster, see e.g. the use of our current role::elasticsearch in different practical roles
  • Labs and production roles might differ without harm; so for instance a production role might include standard and base::firewall, while a labs role might include some different classes.

On the other hand, some drawbacks have to be expected:

  • a stricter code structure will sometimes present its hurdles and make it harder to solve some problems
  • given the limits we just imposed, sometimes it will be hard to abstract some common code to a separate class and this might cause some more repetitions.
  • The distinctions between what is a profile and what is not can be ambiguous and create some controversy.

A practical example

If you look at the role::elasticsearch::* roles, and you rename role::elasticsearch::common to profile::elasticsearch::common, you'll have a pretty good idea of what to expect. Also, the old-style role::etcd can be compared to the newer role::etcd::{common,networking,kubernetes} that were written following more or less the role/profile pattern. It's easy to see how the old-fashioned role::etcd can be molded easily in deriving from the "profile" (role::etcd::common), while accomodating all the use-cases with the original role::etcd would have required adding more and more parameters, and using regexes to distinguish between roles.

Also, since swift has always been considered a benchmark for the complexity of our manifests, an example change molding the swift roles into using the proposed standard has been posted to gerrit and refers to this task.

Event Timeline

Change 314829 had a related patch set (by Giuseppe Lavagetto) published:
swift: refactor to role/profile pattern, part 1

https://gerrit.wikimedia.org/r/314829

  • Does this mean that a single role can no longer be used by both labs and production?
  • Modules 'must not use classes from other modules': There are exceptions to this right? stdlib?
  • I'm not sure I like the idea of adding another WMF 'special' module prefix profile. Can we consider just using multiple module paths? Then everything is a module, but the distinction between the types of modules is taken care of by the module path.
  • Does this mean that a single role can no longer be used by both labs and production?

No, but it means it will be easy to distinguish the two in case of need without much code rot.

  • Modules 'must not use classes from other modules': There are exceptions to this right? stdlib?

Sorry, my first response here misread the text above: I think classes must NOT be used. Defines and parser functions, on the other hand, are pretty ok, although it's better to limit their usage.

  • I'm not sure I like the idea of adding another WMF 'special' module prefix profile. Can we consider just using multiple module paths? Then everything is a module, but the distinction between the types of modules is taken care of by the module path.

That would be nice, until that introduces a nasty naming clash pretty easily. Also, having modules/profiles is what everyone is doing.

Also, having modules/profiles is what everyone is doing.

Ah interesting, I thought we were making this stuff up. Just read those articles you linked to in the ops email, I understand a little more now.

Correct me if I'm wrong, but it seems like so far our 'roles' are more like what these articles call 'profiles'. Or, at least, that's how I've been using them. I've been using roles to pull together modules and sometimes resources to do a specific function, and then include that role on the nodes that should have that. Example: eventbus and Kafka main cluster brokers run on the same nodes. There is a role::eventbus::eventbus and a kafka::main::broker, and these are both included (via the role function) on nodes in site.pp. In this new profiles world, it sounds like those would both be profile::eventbus and profile::kafka::main::broker (or something like that)...and then a role that includes both of those?

Any node in site.pp must include exactly one role, with the "role()" function. No exceptions allowed.

I'm not so sure about this one. In my example, there is absolutely no requirement that eventbus and kafka main broker run on the same node. They are only included on the same node because it was convenient to colocate them on the same hardware. In fact, these nodes also include role::kafka::main::mirror, which also does not need to be colocated on the same node either. It seems a little silly to make a role that would do nothing but include the profiles for eventbus, kafka main broker, and kafka main mirror, just because we happened to decide to colocate them for convenience.

Whatever the top level abstraction is, it should be a functional unit that is separate and composable. Nodes are a bad top level abstraction, because a single node might run multiple functional units. Making a rule that a node must only include one role effectively makes role == node.

I can currently build profiles (previously known to me as roles) that I can use to mix and match and colocate services in different combinations on different nodes. I do this in labs a lot during development. I often spawn up only a couple of nodes in labs, and include many different profiles on them in a way that I wouldn't do in production. I'm not sure how I would be able to continue to do that if we had a 1:1 node to role rule.

Also, having modules/profiles is what everyone is doing.

Ah interesting, I thought we were making this stuff up. Just read those articles you linked to in the ops email, I understand a little more now.

Correct me if I'm wrong, but it seems like so far our 'roles' are more like what these articles call 'profiles'. Or, at least, that's how I've been using them. I've been using roles to pull together modules and sometimes resources to do a specific function, and then include that role on the nodes that should have that. Example: eventbus and Kafka main cluster brokers run on the same nodes. There is a role::eventbus::eventbus and a kafka::main::broker, and these are both included (via the role function) on nodes in site.pp. In this new profiles world, it sounds like those would both be profile::eventbus and profile::kafka::main::broker (or something like that)...and then a role that includes both of those?

Yes. 1 role: 1 type of node declaration in site.pp.

Any node in site.pp must include exactly one role, with the "role()" function. No exceptions allowed.

I'm not so sure about this one. In my example, there is absolutely no requirement that eventbus and kafka main broker run on the same node. They are only included on the same node because it was convenient to colocate them on the same hardware. In fact, these nodes also include role::kafka::main::mirror, which also does not need to be colocated on the same node either. It seems a little silly to make a role that would do nothing but include the profiles for eventbus, kafka main broker, and kafka main mirror, just because we happened to decide to colocate them for convenience.

Whatever the top level abstraction is, it should be a functional unit that is separate and composable. Nodes are a bad top level abstraction, because a single node might run multiple functional units. Making a rule that a node must only include one role effectively makes role == node.

Yes, that's for two reasons:

  1. site.pp is a nightmare of composite things where edge-cases get easily missed because of if etc. It's best to abstract that function out of site.pp and make it our very dumb node classifier, nothing more
  2. Having 1 node: 1 role allows us to know with absolute certainty where to look for hiera defines; makes the life of the developer actually easier, at the cost of needing to repeat some data in different places in hiera.

I can currently build profiles (previously known to me as roles) that I can use to mix and match and colocate services in different combinations on different nodes. I do this in labs a lot during development. I often spawn up only a couple of nodes in labs, and include many different profiles on them in a way that I wouldn't do in production. I'm not sure how I would be able to continue to do that if we had a 1:1 node to role rule.

In labs we should allow to use multiple roles in the UI; my proposal was just for production where we're not using an ENC.

Hello! thanks for writing up the detailed proposal. There are several concerns / points of confusion for me, so I'll try to lay those out. I think a major point of confusion is perhaps two opposing conclusions, one from a puppet related breakout group I was in (but @Joe wasn't) and vice versa.

To quote the labs puppet breakout group's summary (from https://etherpad.wikimedia.org/p/ops-offsite-2016-labspuppet):

Previous puppet style guide opposed role params -- that's an artifact of the pre-hiera, pre-param Labs config system. Now that we support params in labs config, we'd like the style guide to change to prefer role config via parameters. Those parameters should always be set up with good defaults if undefined. Parameter name and defaults will be used by the Labs puppet UI to discover config options for a given role.

This had fairly decent agreement among those present, and I think @Andrew was going to propose amending this in the guidelines at some point.

However,

Roles should only include profiles; A role can include more than one profile, but no conditionals, hiera calls, etc are allowed.

seems to indicate the exact opposite - assuming the 'hiera calls' includes implicit ones class parameters.

Here are the reasons for wanting to it that way (with params to roles) - from that discussion:

  1. Very easy namespacing - very clear which param is being used by which role. These work the same way as class params, and are namespaced the same - so we can just consider roles to be patterned classes, and be done.
  2. The puppetmaster API allows us to query these in structured fashion - this is what Labs' Horizon does. It queries for list of roles, and allows setting parameters on them via the GUI. This is pretty great for discoverability - and not something that's easy with explicit hiera() calls.
  3. Labs has no role based hiera, so if we want to pass stuff through to classes, it's mostly done via params to roles. We've set sane defaults that work for prod, and allow overrides via hiera for labs. This has worked well for us so far.

So the outcomes of these two seem to be in conflict with each other, and need to be resolved in some form.

Any node in site.pp must include exactly one role, with the "role()" function. No exceptions allowed.

Does this include labs too? We've favored composing at the node level in labs over creating extra roles, since that retains flexibility - especially for users who don't have +2 in operations/puppet.

Almost all hiera definitions should go in the "role" heirarchy. Only exceptions can be shared data structures that can be used by many profiles or to feed different modules with their data. Those should go in the common/$site global hierarchy. A good example is, in our codebase, ganglia_clusters, or any list of servers (the memcached hosts for mediawiki, for example).

Labs doesn't really have a role hierarchy at all, so this doesn't work there. So if all profiles are configured via explicit hiera calls that must be set in hiera if we want to change them, no parameters (+ implicit lookup in roles), this makes life in labs land quite cumbersome.

Labs and production roles might differ without harm; so for instance a production role might include standard and base::firewall, while a labs role might include some different classes.

I spent a lot of time about a year ago killing any ::labs roles - those usually end up drifting over time and accumulating cruft, and becoming copy pasta things that end up causing a lot of trouble. In an ideal world, this doesn't happen - but in ours, it definitely does. The thing that's worked well for this is two fold:

  1. Feature flags - gate very specific, meaningful things. The puppetmaster module is fairly full of them, and this is the method I've been using to get rid of the copy pasta puppet module (and role::puppet::self)
  2. Composing - the ::labs roles for parsoid, etc were initially created to add some CI specific classes to them, and eventually 'drifted'. We fixed these by composing together, on the node, the two roles (role::parsoid or whatever, and role::ci::slaver or whatever).

So I *really* don't want us to have ::labs roles for things that are not core labs infrastructure that are running in the prod realm (such are mariadb, the puppetmaster, etc) - and even those should be role::labs::<something> and not role::something::labs.

So @yuvi first things to make clear:

  • our current roles are mostly profiles in this definition.
  • The one role: one machine rule is basically the same thing you do in labs composing profiles right now. I think for labs it's completely acceptable to compose profiles at will if needed. Beta (deployment-prep) being probably the only exception.
elukey triaged this task as Medium priority.Oct 18 2016, 12:57 PM

Just to confirm my understanding... the proposal is basically to rename everything currently called a role (and the role:: class hierarchy) to profile (and profile::) and then add a new layer of aggregation above 'profile' called 'role'. Am I right so far?

If so, that sounds annoying but fine with me (and probably better than introducing or abusing terminology in a way that diverges from the rest of the world.)

But, to echo @yuvi, I still don't understand why this 'profiles must have no params and only do hiera lookups' idea keeps cropping up. In addition to the bad-for-labs reasons that Yuvi mentions, it just seems like bad software design. Imagine a C framework where every function only took (void) arguments and instead looked up arguments in a big global struct -- you could do that, but why?

But, to echo @yuvi, I still don't understand why this 'profiles must have no params and only do hiera lookups' idea keeps cropping up. In addition to the bad-for-labs reasons that Yuvi mentions, it just seems like bad software design. Imagine a C framework where every function only took (void) arguments and instead looked up arguments in a big global struct -- you could do that, but why?

I think this is basically the same sentiment I expressed on the ops-l mailing list. As I understand it, @Joe still wants to use hiera lookups for parameters to configure classes and defines, but he wants those lookups to be expressed inline in the profile/*.pp class bodies rather than in the explicit definition of the classes themselves. This method of configuration is essentially optimizing for human readability over machine parsability. As such it is going to be more difficult to build systems such as the horizon Puppet configuration interface as there will be less metadata to use to inform the end-user of the possible knobs to turn for a given role (which we will now call a profile).

FWIW, even though some aspects seem painful, I think the general direction of the proposal seems better than where we're at today. I'm especially *not* fond of the following things that we have today, but would be disallowed under the new guidelines:

  1. Class parameters which are set directly from hiera (both in modules and in roles), as opposed to using hiera() calls. These are spooky actions at a distance, and you end up with too many levels of defaulting in too many places. Explicit hiera() calls inside a class seem much cleaner for human understanding, and easier to grep for, etc.
  2. hiera() lookups inside of non-role/profile classes/modules.

To re-state the short and sweet version of the parts of the proposal I like and how they look on these fronts (please correct me if I'm significantly wrong on any of the below, but it's how I'm interpreting things currently):

  1. What we're doing in role:: these days is now "profiles". Fine, it's a syntax alignment with upstream and the rest of the world, but not functionally much different.
  2. The new "roles" that build on profiles serve two functional purposes. First, they serve as the aggregation point for multi-profile hosts. Secondarily and perhaps most-importantly, they serve as the most common and important key for hiera() lookups inside of profiles.
  3. Regular module code (not roles, not profiles) can have class or define parameters, but never looks at hiera directly, via any mechanism.
  4. site.pp only references role classes, and one machine has exactly one role class. Gets rid of lots of edge cases and odd combinations that are hard to refactor around and deal with. Shouldn't have any other data. All stanzas are simply node /blahblah[0-3]/ { role(foo) }.

Putting these things together gets you this kind of pseudo-code, which I think is much more ideal than where we are today:

modules/foo/manifests/foobar.pp:
  define foo::foobar ($param1, $param2) { ...no hiera... }
modules/profile/manifests/baz.pp:
  class profile::baz {
    foo::foobar {
      param1 => hiera('baz_param1', false);
      param2 => hiera('baz_param2', true);
    }
  }
modules/role/manifests/quux.pp:
   class role::quux { include profile::baz; }
modules/role/manifests/quux.pp:
   class role::blah { include profile::baz; include profile::whateverelse; }
manifests/site.pp:
   node /wtf[0-9]/ { role(quux) }
   node /omg[0-9]/ { role(blah) }
hieradata/role/common/quux.yaml:
   baz_param1: true
hieradata/role/common/blah.yaml:
   baz_param2: false

The nits beyond that, to me, are these:

  1. Should we commonly include defaults in the hiera() calls in profiles? I think there's two cases here: it's a good idea to use a default here when it's a true default that usually makes global sense (as in, the default of a boolean is to enable a common feature). For values that will always be role/host -specific (like, say, an IP address parameter), there probably should be no default, so that compilation fails if an applicable role/host lacks the necessary specific hieradata definition.
  2. How does labs work with the above? I imagine we don't use labs-specific roles, but we do have different hieradata for the labs realm as a whole. In other words, the labs realm has a different data source for the hieradata that's set for roles quux and blah, where it can set labs-realm-specific values for those roles. The relevant instances themselves will still be defined in terms of a singular, parameter-less role in the equivalent of site.pp.

Putting aside how labs will work with this, I am trying to understand why explicit hiera lookups with arbitrary, non puppet enforced names are better than having them be parameters to profiles. @BBlack I've taken your example and slightly modified it, and I find this far easier to understand, IMO.

modules/foo/manifests/foobar.pp:
  define foo::foobar ($param1, $param2) { ...no hiera... }
modules/profile/manifests/baz.pp:
  class profile::baz(
   $param1 = false,
   $param2 = true
  ) {
    foo::foobar {
      param1 => $param1,
      param2 => $param2
    }
  }
modules/role/manifests/quux.pp:
   class role::quux { include profile::baz; }
modules/role/manifests/quux.pp:
   class role::blah { include profile::baz; include profile::whateverelse; }
manifests/site.pp:
   node /wtf[0-9]/ { role(quux) }
   node /omg[0-9]/ { role(blah) }
hieradata/role/common/quux.yaml:
  profile::baz::param1: true
hieradata/role/common/blah.yaml:
   profile::baz::param2: false

Why not this?

I understand that modules / profiles / roles is better conceptually than what we have now, but don't understand the explicit hiera calls.

Lack of namespace enforcement is an interesting issue (both in terms of collisions on popular names, and also in terms of organizing the hieradata without creating very large files at points like hieradata/common.yaml and hieradata/eqiad.yaml), but it's hard to say how real it is without working through a number of real examples.

Many of the hieradata settings I can think of really aren't properties of a profile, they are attributes of the role (or the host). You can think of it like the facts collection of a host. The hieradata is basically additional administrator-supplied "facts" at the host and the role level. The fact $ipaddress_eth0 doesn't really belong in the namespace of any profile, it's a fact about a host that may be re-used in the parameters of many profiles. Similarly, nameservers is a fact-like thing that should essentially be a per-datacenter hieradata global (defined in e.g. hieradata/eqiad.yaml), and may be consumed (ultimately for the parameterization of abstract functional modules) by many profiles.

An array of kafka broker hostnames might be similar in nature. It's going to be consumed by several distinct profiles (profile::kafka::broker and profile::cache::text and whoever else needs the list), so it doesn't really belong to a specific profile.

What are good examples of data that naturally belongs in a profile:: namespace?

I think even if there's a category of good ones, there's clearly a large category of them that don't logically belong in the profile:: namespace, and thus wouldn't fit a model where we trade all the explicit heira() calls for profile classparams. We probably don't want to mix the two models, as it's confusing then (we'd have hieradata in the same role files like quux.yaml, some of which is namespaced to be profile classparams, and some of which isn't).

Perhaps labs discovery of relevant hiera variables could be done via good class docs?

/**
 *
 * [*nonya_param1*]
 *   ...

class profile:nonya {
    $param1 = hiera('nonya_param1', true)
...

?

Just to confirm my understanding... the proposal is basically to rename everything currently called a role (and the role:: class hierarchy) to profile (and profile::) and then add a new layer of aggregation above 'profile' called 'role'. Am I right so far?

Some of our roles are a mix of the two things. But broadly speaking you're correct.

If so, that sounds annoying but fine with me (and probably better than introducing or abusing terminology in a way that diverges from the rest of the world.)

But, to echo @yuvi, I still don't understand why this 'profiles must have no params and only do hiera lookups' idea keeps cropping up. In addition to the bad-for-labs reasons that Yuvi mentions, it just seems like bad software design. Imagine a C framework where every function only took (void) arguments and instead looked up arguments in a big global struct -- you could do that, but why?

So the "no params, only hiera lookups" thing, I am not particularly attached to it, although it's both something we said during the plenary on puppet and has its merits, as being explicit instead of implicit.

I would be ok with the following amendments to the rules I just stated:

  • Profiles have parameters, those will be looked up on hiera
  • No autolookup is allowed for hiera variables defined anywhere but profiles.

But this exposes us to much more abuses, IMHO, and also creates the issue @BBlack was referring to:

say you have a role:

role foo {
   include profile::kafka
   include bar
}

and say both profiles need to access that list; in your hypothesis we'll have

class profile::kafka::client ($kafka_servers) {
...
}

class profile::bar ($kafka_servers) {
...
}

this means that you'll have to declare that list twice in hiera, once for profile::kafka::client::kafka_servers and profile::bar::kafka_servers, while with the explicit model you can do

class profile::kafka::client {
  $kafka_servers = hiera('kafka_servers')
...
}

class profile::bar ($kafka_servers) {
  $kafka_servers = hiera('kafka_servers')
...
}

because you're looking up a variable that is not limited to the namespace of that profile.

It will also be easier to grep for it in the code, of course, which is a nice plus IMHO.

But I am ok with dropping this suggestion.

[CUT]

  1. How does labs work with the above? I imagine we don't use labs-specific roles, but we do have different hieradata for the labs realm as a whole. In other words, the labs realm has a different data source for the hieradata that's set for roles quux and blah, where it can set labs-realm-specific values for those roles. The relevant instances themselves will still be defined in terms of a singular, parameter-less role in the equivalent of site.pp.

For labs, I tested today the horizon UI: basically you can use Horizon to declare puppet roles and hiera for sets of hosts based on the prefix of the hostname (basically, what site.pp does, but more ordered). If you tie a role (say, role::etcd::common) to a prefix. then you have basically recreated the mechanism, as long as you respect the prefix rule in naming your hosts. So for example now in deployment-prep any host named deployment-conf.* will have role::etcd::common and the corresponding hiera rules.

I think it works nicely.

What are good examples of data that naturally belongs in a profile:: namespace?

Answering my own question... the obvious case would be module-specific feature flags and settings, I think?

For instance, modules/tlsproxy (whole other debate whether that's an appropriate module, since it's wrapping other modules...) might have a class or define parameter that's a functional feature-flag we might want to only enable for certain profiles' uses, such as $do_ocsp, or $enable_tfo. Enabling TFO or OCSP for tlsproxy is something that probably would be local to a profile, and could logically be e.g. a classparam of profile::cache::text which it uses to parameterize its invocation of some modules/tlsproxy class or define.

What are good examples of data that naturally belongs in a profile:: namespace?

Answering my own question... the obvious case would be module-specific feature flags and settings, I think?

For instance, modules/tlsproxy (whole other debate whether that's an appropriate module, since it's wrapping other modules...) might have a class or define parameter that's a functional feature-flag we might want to only enable for certain profiles' uses, such as $do_ocsp, or $enable_tfo. Enabling TFO or OCSP for tlsproxy is something that probably would be local to a profile, and could logically be e.g. a classparam of profile::cache::text which it uses to parameterize its invocation of some modules/tlsproxy class or define.

I agree, basically:

  • whatever is a feature flag of the profile should be in the profile:: namespace
  • whatever is a general feature of production that can reasonably be referenced by more than one profile (like the lvs config, or the hierarchy of redis cross-dc replica, or the replica tree of puppetmasters, or cache nodes) should NOT be namespaced with profile:: but with a more generic label, like puppetmaster::servers.
  • General feature flags like has_ganglia or has_lvs should be un-namespaced top-scope variables.

I am not suggesting this should be a strict rule in our new standards, but more of an advice.

I don't disagree with the above, but I do still think we run the risk of namespace confusion and clutter issues on the latter two points (the ones not in the profile:: space).

The colons in puppetmaster::servers are really just abitrary syntax unrelated to all class/module namespaces. But seeing any foo::bar anywhere *looks* a lot like classnames, and thus it's confusing if it isn't (especially when it happens to overlap a real classname).

It might be less-confusing (especially during the transition when there will be so many legacy counter-examples of hieradata classparams for non-profile classes) to not give them names that look like they're classy when they're not.

There's really not much functional difference between puppetmaster::servers and puppetmaster_servers anyways, but that latter will never be confused for a classparam.

  • whatever is a general feature of production that can reasonably be referenced by more than one profile (like the lvs config, or the hierarchy of redis cross-dc replica, or the replica tree of puppetmasters, or cache nodes) should NOT be namespaced with profile:: but with a more generic label, like puppetmaster::servers.
  • General feature flags like has_ganglia or has_lvs should be un-namespaced top-scope variables.

Can we recommend/require some distinct prefix (wmf::, global::, shared::) for these "generic" settings that are functionally global variables within hiera? The arguments that are being made for easier grepping break down somewhat when you simultaneously recommend expanding the use of arbitrary global variable names. In current usage this had lead to confusing things like the hardcoding of hiera('mediawiki::redis_servers::eqiad') in the middle of ::role::memcached.

I *think* I'm ok with the proposal as stated with the amendments @Joe
mentioned :)

  • General feature flags like has_ganglia or has_lvs should be un-namespaced top-scope variables.

Can we recommend/require some distinct prefix (wmf::, global::, shared::) for these "generic" settings that are functionally global variables within hiera? The arguments that are being made for easier grepping break down somewhat when you simultaneously recommend expanding the use of arbitrary global variable names. In current usage this had lead to confusing things like the hardcoding of hiera('mediawiki::redis_servers::eqiad') in the middle of ::role::memcached.

Ignoring the other reasons that specific hiera() call might be questionable, how is it not greppable?

bblack@alaxel:~/repos/puppet$ git grep mediawiki::redis_servers::eqiad
hieradata/common/mediawiki/redis_servers.yaml:mediawiki::redis_servers::eqiad:
hieradata/labs/deployment-prep/common.yaml:"mediawiki::redis_servers::eqiad":
manifests/role/memcached.pp:        'eqiad' => hiera('mediawiki::redis_servers::eqiad'),
modules/mediawiki/manifests/nutcracker.pp:            server_map           => hiera('mediawiki::redis_servers::eqiad'),

My issues with the existing name there and any other prefix is that they're all confusing. There is no class named mediawiki::redis_servers, and there is no class named shared or global or wmf, either (and if there were any of those things, it would be even worse). Don't use naming that looks suspiciously like classnames for something that's not at all part of the class namespace.

@_joe_, can you please amend (or provide a new version) of the RFC incorporating the agreements here? I've re-read this thread but am no longer really sure what we are or aren't agreeing on :)

@Andrew will do.

Just to be sure, what's the way you extract your parameters for roles in the horizon UI?

Do you have a pre-baked parser or you had to write your own?

@Joe, I use the puppetmaster API:

https://labcontrol1001.wikimedia.org:8140/puppet/resource_types/role

The /role part is arbitrary, I should be able to gather up profiles and params the same way with /profile

There is one issue I 'd like to (re?)touch on. Whether explicit hiera() lookups in profiles should have defaults or not (I am assuming for some reason they get used). My question is basically a dumbed down version of the "should profile classes should have parameter (with defaults)?".

There is one very big "no no" coming from puppetlabs here: http://garylarizza.com/blog/2014/02/17/puppet-workflow-part-2/. And it's quite well argumented. I 've been in the state of trying to figure out what has happened due to some default too many times already.

But disallowing defaults breaks a lot the simple workflow of "I want to apply this on a Labs VM not care for anything else" that many users (including myself occasionally) have. Which is clearly one of the reasons Labs wants class params in profiles, with sane defaults.

And this pauses the big meta question of who our puppet tree is for.

  • "production" users ?
  • Casual labs users ?
  • Non-casual labs users (probably an superset of the 1st bullet) who are fine with setting a few toggles?
  • All of the above ?
  • All of the above plus the kitchen sink ?
  • None of the above ?

Aside from the 2 last bullets where I try to be humorous, I 'd say the link I 've pasted above almost definitely means the first bullet. But we clearly need something different. Which is probably the "All of the above", but I am not sure how to reach that yet given bullets 1 and 2 have different requirements as a conclusion.

A few things about http://garylarizza.com/blog/2014/02/17/puppet-workflow-part-2/:

  1. That argument is premised on a given user having access to the hiera data without access to the puppet code, or vice versa. That is not our use case.
  1. It doesn't seems to not distinguish between good and bad defaults. Obviously we shouldn't set a default to something that will cause bad behavior. Typically a 'default' value means "Most users will want X." If there's no such thing as a good default then, of course, there should not be a default.
  1. In cases where we don't have good default values, params are better than hiera lookups -- you can look at the class definition and see a list of things that need to be set for the class to compile.

A few things about http://garylarizza.com/blog/2014/02/17/puppet-workflow-part-2/:

  1. That argument is premised on a given user having access to the hiera data without access to the puppet code, or vice versa. That is not our use case.

Which use case of all ? As I said above we got more than 1.

  1. It doesn't seems to not distinguish between good and bad defaults. Obviously we shouldn't set a default to something that will cause bad behavior. Typically a 'default' value means "Most users will want X." If there's no such thing as a good default then, of course, there should not be a default.

I 'll assume we are always talking about "profile" classes here.

I think that article makes the point the whatever the default is ("good or bad"), someone (probably the power user) will end up not getting what they wanted and chasing stuff around the code+hiera tree to figure it out what they need to tweak. Which given it's written for "production/power users" (bullet point 1 from above), makes sense. Iit's worse to confuse a power user than asking them forcibly to set a value in hiera.

  1. In cases where we don't have good default values, params are better than hiera lookups -- you can look at the class definition and see a list of things that need to be set for the class to compile.

Same assumption about "profile" classes as above.

params will cause implicit hiera lookups, which will end up being used and will end up causing confusion/pain. That's the argument that article makes against params for profile classes. It's the "be as explicit as possible" mantra. Which is not without its shortcomings of course.

I really don't know how to engage when you assert that you are unable to understand how implicit lookups work. They're unfamiliar and, as long as we don't use them, will remain unfamiliar. Am I really the only one out here in favor of simply using the language /as it is designed/?

Am I really the only one out here in favor of simply using the language /as it is designed/?

Given its designers themselves declared this (amongst a lot of other things they were able to deprecate during the years) was a very, very bad idea.

To be clear: implicit lookup only at the profile level could be acceptable, but as soon as you start using it a lot, your hiera/code combination will be hard to understand. And that's an accepted reality of puppet coding.

I 've been thinking about this RFC over the weekend so I 'll try and give my POV on the points there in a semi-structured way

For the rest of the points made in this RFC on my part I have:

  1. +1 (obviously)
  2. +1 as is. The avoid implicit hiera lookups will need some work to very clearly communicate to avoid accidental use.
  3. +1 on the "profile" structure itself, still struggling a bit on the "explicit hiera" calls and "no parameters". I am inclined to say "let's leave parameters (and implicit hiera calls as a consequence) fly" and reevaluate later on. There is enough work to be done and enough to be gained by the rest of the changes anyway. There is also the issue I raised with defaults for these. Assuming we do want to have a as streamlined as possible experience for users in labs, I am inclined towards "the let defaults be and choose sane ones". I am not fully sold on this, but I am seeing the advantages for labs so I am willing to let it be.
  4. +1 with the added exception of "try to avoid inheritance however as much as possible". It's been known to cause problems and its usage should be very well justified.
  5. +1, with the extra that as some point in the future I would like us to ditch the role() keyword. For now it obviously needs to stick around.
  6. +1, I am a bit afraid about what the exceptions will be, but I 'll assume good sense.
  7. +1 as is.

A question emerging from https://gerrit.wikimedia.org/r/#/c/320690/:

Should explicit hiera keys that are looked up from a role (aka profile) class use puppet class param like namespacing, or should we avoid that? I remember a discussion at the ops offsite about this, and had thought that we wanted to avoid using puppet like namespacing for hiera variables that were not class params. E.g. role::eventstreams::log_level vs. eventstreams_log_level.

As far as I am concerned, hiera keys should be namespaced according to the class that tries to look them up. That is if class profile::foo::bar has a hiera() call to figure out key then the hiera call should be hiera('profile::foo::bar::key'). There will of course be exceptions to this rule. Hiera keys that are looked up from more than one class. Our ganglia_clusters might be a good example. Or perhaps 2 closely related profile classes. Like profile::foo::bar1 and profile::foo::bar2, both wanting to lookup key. I suggest we try to go for the most common denominator in that case. In the example I just gave, the key would probably be profile::foo::key.

Pros

  • That should make the key easier to grep and more unique. If we don't do that and we keep all keys to the global namespace, I am pretty sure we will end up doing other namespacing tricks like foo_bar_key in order to resolve the various conflicts that will definitely arise in the end.
  • It is exactly the key that an implicit lookup would generate if key was a parameter for profile::foo::bar making it easier to switch back and forth and adopt either paradigm (implicit lookups vs explicit lookups). We 've already changed our mind once on this one, and it seems to be a very discussed topic so let's keep as many options open as possible.

Cons

  • There is one issue I am seeing with this approach, which is basically the profile:: prefix being essentially the same to all lookups. Given that we are saying that lookups should only happen from profile classes and not modules or roles, we could possibly ditch it, but that would break the implicit lookups compatibility I 've noted above.
  • Using keys like that implicitly supports the parameters in profiles and implicit lookups pattern, which I am as I 've said above OK with, but no consensus has yet been reached.

What do other people think ?

Hm, maybe I'm getting confused. Isn't the point of hiera to be able to look up the same variable from different scopes and get different values? If so, why would we choose to use hiera and prefix all variable names for a specific scope? Don't we lose the main value of using hiera in the first place?

The point of hiera is to separate the code from the (configuration) code from the (configuration) data, minimizing code duplication, increasing reusability and thus decreasing complexity (I know that sounds kind of an oxymoron, given hiera actually adds some complexity, it's the end sum that matters).

But what you describe, is possible with prefixed variable names. And not only that, but actually the goal. The role() keyword is the key here. The role (which I remind that, in the proposed scheme, is practically a collection of profiles), instructs hiera where to look up its data.

As an example, imagine the following:

  • 2 boxes having 2 different roles, roleA and roleB. I remind that, in the proposed in this RFC scheme, it's 1 role per box strictly.
  • Both roles include a profile. Say profile::foo::bar
  • The profile looks up (explicitly or implicity, it's unimportant for this example), the hiera key profile::foo::bar:my_var

Now the role() keyword, instructs hiera to use the role hiera backend, which would end up looking up the variable under (this is not a complete list) hieradata/role/eqiad/roleA.yaml, hieradata/role/common/roleA.yaml, ... and hieradata/role/eqiad/roleB.yaml, hieradata/role/common/roleB.yaml, ... respectively, with the key profile::foo::bar:my_var

So the same hiera key gets 2 different lookups depending on the including role class. Which is exactly what you described.

So, given we need to close this discussion once and for all, I came up with a change of wording that would have us write the profile classes as follows:

Example of a profile class header:

profile::redis::server( 
  $instances = hiera('profile::redis::instances'),  # local variable without defaults
  $ipsec_secret = hiera('ipsec_secret'), # global variable, without defaults
  $replication_map = hiera('profile::redis::replication_map', {}), #local variable with defaults
){ 
...

this is more complex than the standard profile definition with no parameters that the general puppet best practices advice to use, but at least doesn't lose the explicitness of the lookups, while allowing easy discovery for the UI in labs.

Let me stress again I think it is extremely unfortunate that we choose to have a worse version of a coding standard (and one that deviates significantly from the general community consensus) to make our code discoverable easily in a GUI. Think if we decided to do something like this wil PEP8 in python. Everyone would be throwing chairs around.

Which makes me think again that labs VMs should have their own puppet repository distinct from the production one (as they have different needs and a completely different configuration and tooling), and add environments support to allow people to develop their own puppet. Or it will be mostly unused.

Anyways, I consider this current wording of the proposal to be mostly uncontroversial, as I included the major (only?) pain point presented here.

If there are no large objections, I will re-write our puppet coding standards page in the upcoming weeks to reflect this proposal.

Which makes me think again that labs VMs should have their own puppet repository distinct from the production one (as they have different needs and a completely different configuration and tooling), and add environments support to allow people to develop their own puppet. Or it will be mostly unused.

<threadjack>
Forking ops/puppet.git is exactly the opposite of the opinions that I have heard on the topic in the past which were most recently brought to my attention by @faidon.

I do however agree that we could do a much better job of allowing an easy and maintainable solution for augmenting the ops/puppet.git base with project specific classes and configuration overrides for Labs users.
</threadjack>

I 'd like to add a hiera key formation clause please.

  1. hiera keys SHOULD try to reflect the most specific common shared namespace of the puppet classes trying to look them up (implicitly or explicitly). This should allow easier grepping and avoid conflicts. Global variables SHOULD BE avoided as the profile will by definition be the least specific common namespace.

Examples:

  1. class profile::foo::bar SHOULD try to look up variables with keys prefixed with profile::foo::bar:: e.g. profile::foo::bar::var1, profile::foo::bar::var2 etc.
  2. classes profile::foo::bar1 and profile::foo::bar2 SHOULD try to lookup SHARED (and only those) variables in profile::foo::. e.g. profile::foo::var1, profile::foo::var2.
  3. classes profile::foo1::bar1 and profile::foo2::bar2 SHOULD try to lookup SHARED (and only those) variables in profile::. e.g. profile::var1, profile::var2.

@akosiaris LGTM! There are a bunch of variables that are going to be global anyways, but I agree with your comment.

Thank you for all the rewrites -- I'm happy with the proposal as it stands today. If you have time to add snippets of sample code to the general principles, we'll be pretty close to having some guidelines that we can copy over to wikitech.

Thank you for all the rewrites -- I'm happy with the proposal as it stands today. If you have time to add snippets of sample code to the general principles, we'll be pretty close to having some guidelines that we can copy over to wikitech.

I plan to take all of this to wikitech next week if no one has any outstanding objections. That version will include code snippets/examples.

Joe claimed this task.

Change 342248 had a related patch set uploaded (by Gehel):
[operations/puppet] elasticsearch - move role::elasticsearch::common to a profile

https://gerrit.wikimedia.org/r/342248

Change 342248 merged by Gehel:
[operations/puppet@production] elasticsearch - move role::elasticsearch::common to a profile

https://gerrit.wikimedia.org/r/342248

I have a question about the new profile guidelines:

Profile classes should only have parameters that default to an explicit hiera calls with no fallback value.

Why no fallback defaults?

Consider my current use case: a new profile::kafka::broker. I presently need this to test deployment and migration of a new Kafka version and cluster for T152015, and I'll need to to this a bunch in labs. If I can't use hiera fallback defaults, that means that every single parameter for the confluent::kafka::broker module class will need to be specified in hiera. Often, many of the parameters will be the same for both labs and production, so it would make sense to put the defaults as usable for a simple Kafka deployment in labs in the top level common.yaml file.

But, this is likely true for many other profiles as well. Are we going to start putting every profile parameter default in common.yaml? common.yaml is gonna get pretty long...

I don't understand the advantage of not using profile parameter hiera fallback defaults. If a value is not defined in hiera anywhere, it will just use the hiera fallback default listed in the profile class parameter. It seems easy to reason about what that value will be in any context.

Also, if configuration of profiles can only be done via hiera, doesn't that mean any module parameter that we may want to override needs to be specified as a profile parameter? confluent::kafka::broker has a lot of parameters that might need tweaked. Should I really specify all that do as a profile parameter that defaults to a hiera lookup with no default?

Also, if configuration of profiles can only be done via hiera, doesn't that mean any module parameter that we may want to override needs to be specified as a profile parameter?

Any parameter that you /need/ to override.

To be very clear, I'd expect you to create a different profile for clusters with radically different configurations.

confluent::kafka::broker has a lot of parameters that might need tweaked. Should I really specify all that do as a profile parameter that defaults to a hiera lookup with no default?

If the profile you're building is ok with the default class parameter, just don't specify the value when declaring the class.

I have a question about the new profile guidelines:

Profile classes should only have parameters that default to an explicit hiera calls with no fallback value.

Why no fallback defaults?

Because we prefer explicit declaration over implicit.

Consider my current use case: a new profile::kafka::broker. I presently need this to test deployment and migration of a new Kafka version and cluster for T152015, and I'll need to to this a bunch in labs. If I can't use hiera fallback defaults, that means that every single parameter for the confluent::kafka::broker module class will need to be specified in hiera.

Nope, you can just declare the parameters you want to override, not every one of them.

Often, many of the parameters will be the same for both labs and production, so it would make sense to put the defaults as usable for a simple Kafka deployment in labs in the top level common.yaml file.

What about you just specify a fixed value if this is the case? Do you really need a parameter for something that has the same value everywhere?

But, this is likely true for many other profiles as well. Are we going to start putting every profile parameter default in common.yaml? common.yaml is gonna get pretty long...

DO NOT do that. It's against the whole idea of what we want to do. There are exceptions, but in general we *do not* want to define global defaults in hiera.

I don't understand the advantage of not using profile parameter hiera fallback defaults. If a value is not defined in hiera anywhere, it will just use the hiera fallback default listed in the profile class parameter. It seems easy to reason about what that value will be in any context.

The advantage is you have to design your profiles carefully and not just blindly declare 500 parameters. Just declare as parameters the things that will really change, and for the rest, just pass the value that's always valid to the module class you're calling.

Finally: this RFC is closed; any further discussion should go to a new ticket or (better) on the puppet coding rules talk page on wikitech.

any further discussion should go to a new ticket or (better) on the puppet coding rules talk page on wikitech.

Ah ok, I went back and forth on those two, and decided here. I'll ask one more question here just for continuity.

I'd expect you to create a different profile for clusters with radically different configurations.

Ok, but what about labs? Does this mean I should create labs versions of the different kafka cluster configurations? I'm not sure yet, but this means I'd have profile::kafka::main::broker, profile::kafka::aggregate::broker (name TBD), profile::kafka::main_labs::broker, profile::kafka::aggregate_labs::broker. I suppose I could put a $::realm conditional in the profile to avoid the extra labs classes, but that kinda sucks too.

Also, in labs, it's nice to be able to test changes to settings by modifying hiera in horizon, rather than making commits to puppet. With these rules, any parameter we might want test changing in labs first needs a puppet commit.

Change 314829 abandoned by Giuseppe Lavagetto:
swift: refactor to role/profile pattern, part 1

Reason:
Was done as an example, not needed anymore.

https://gerrit.wikimedia.org/r/314829