(suggestions for a better title most welcome)
Currently the operations puppet repository is used by both cloud services and the production network. However, the two environments use puppet in subtly different ways; the main differences are the ENC and the hiera structure. Over the years this has caused a number of issues where, for instance, a member of the SRE team (most often the OP) introduces a change which is tested with production in mind and is green in CI and PCC. However, when the change is merged and deployed it causes issues in cloud services, often due to missing hiera defaults but sometimes triggered by one of the other subtle differences.
Other than day-to-day issues, the use of a shared repo also means that the velocity of change is pinned to the slowest-moving part. Cloud services has to deal with far more nuanced installations with many varying factors. This means that certain things, such as dropping support for an old version of puppet, are often more difficult and slower to achieve, as they require more coordination and communication with a wider audience.
Over the years there have been discussions on how we could ease this pain, with many solutions proposed, and although some small changes have been achieved the underlying issues still persist. This task is an effort to reinvigorate those discussions and try to resolve these issues once and for all. At the very least I think it would be useful to link all of the various efforts that have been proposed over time and to document the differences and nuances between the cloud services and production puppet environments.
Divergent
Hiera structure
One of the main differences between production and the cloud environment is the structure used by hiera. This has been documented in T255787, but the key points are repeated here.
Production
- makes use of the wmflib::expand_paths for common and site expansion
- makes use of a $_role variable created with the role function (see below)
Cloud
- makes use of cloudlib::httpyaml to fetch data from openstack.
- has some additional parts to the hierarchy, e.g.
  - "cloud/%{::wmcs_deployment}/%{::labsproject}/hosts/%{::hostname}.yaml"
  - "cloud/%{::wmcs_deployment}/%{::labsproject}/common.yaml"
  - "cloud/%{::wmcs_deployment}.yaml"
  - "cloud.yaml"
- can also use a secret repo outside of git (on the puppetmaster FS)
  - /etc/puppet/secret/hieradata/%{::labsproject}.yaml
- uses a different hierarchy in the private repo
  - "labs/%{::labsproject}/common.yaml"
  - "%{::labsproject}.yaml"
  - "labs.yaml"
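To make the cloud hierarchy above more concrete, here is a rough Python sketch of how the %{::variable} tokens in those paths could expand for a single node. It is only an illustration: the project name and hostname are invented, and real hiera interpolation has more rules than this (for example, this sketch raises an error on an unknown variable rather than substituting an empty string).

```python
import re

# Cloud hierarchy paths copied from the list above, highest priority first.
CLOUD_HIERARCHY = [
    "cloud/%{::wmcs_deployment}/%{::labsproject}/hosts/%{::hostname}.yaml",
    "cloud/%{::wmcs_deployment}/%{::labsproject}/common.yaml",
    "cloud/%{::wmcs_deployment}.yaml",
    "cloud.yaml",
]

# Matches tokens of the form %{::name} and captures the variable name.
_TOKEN = re.compile(r"%\{::(\w+)\}")


def expand_hierarchy(paths, variables):
    """Replace each %{::name} token with the matching top-scope variable."""
    return [_TOKEN.sub(lambda m: variables[m.group(1)], path) for path in paths]


# Hypothetical node: the project and hostname are made up for illustration.
example = expand_hierarchy(
    CLOUD_HIERARCHY,
    {"wmcs_deployment": "eqiad1", "labsproject": "exampleproject", "hostname": "web01"},
)
for path in example:
    print(path)
```

The point of the sketch is that every cloud-specific level depends on variables that simply do not exist on a production node, which is why a key defined only in the production hierarchy is invisible to cloud hosts.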
Node classifier
The node classifier is essentially used to provide a list of classes that should be applied to a node, as well as some additional parameters (aka hiera keys).
Production
In the production environment we use the site.pp manifest along with a custom role function. When called with e.g. role(foo::bar) it does two things:
- loads the class role::foo::bar
- injects a global variable (node parameter) $_role = foo/bar into the manifest. The main use case for this is to look up role-specific parameters in hiera (as noted above)
Recently there has been some effort to add the role variable to the cloud node classifier; however, it's currently stalled (see comments on change).
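The two effects of role(foo::bar) described above can be modelled in a few lines of Python. This is not the real function (which lives in the puppet repo), just a sketch of the name mapping it performs; the helper name and return shape are my own.

```python
def role(name):
    """Model a role(...) call: return the class to load and the injected variable."""
    # role(foo::bar) loads the class role::foo::bar ...
    class_name = f"role::{name}"
    # ... and sets $_role to the same name with '::' turned into '/'.
    role_variable = name.replace("::", "/")
    return class_name, {"_role": role_variable}


class_name, params = role("foo::bar")
print(class_name)  # role::foo::bar
print(params)      # {'_role': 'foo/bar'}
```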
Cloud
The cloud environment uses a custom script which queries the openstack API to produce a list of classes and additional hiera keys to apply to a node. This functionality enables community members to easily test out classes from the puppet repo, swapping hiera values and pairing different profile classes, without the need to make a commit to the puppet repository.
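As a rough illustration of what such a classifier produces, the Python sketch below turns per-node data into a document with the usual ENC keys (classes and parameters). Everything specific here is an assumption: the input dictionary stands in for whatever the openstack API actually returns, the class names and hiera key are invented, and the output is printed as JSON only to avoid a third-party YAML dependency.

```python
import json

# Hypothetical stand-in for the per-node data the script would fetch from the
# OpenStack API; the class names and hiera key are invented for illustration.
FAKE_API_RESPONSE = {
    "classes": ["role::example::web", "profile::example::extra"],
    "hiera": {"profile::example::extra::enabled": True},
}


def classify(node_config):
    """Build an ENC-style document: classes to apply plus extra parameters."""
    return {
        "classes": sorted(node_config.get("classes", [])),
        "parameters": dict(node_config.get("hiera", {})),
    }


enc_document = classify(FAKE_API_RESPONSE)
print(json.dumps(enc_document, indent=2))
```

Because the classes and keys come from per-node data rather than from site.pp, changing what a cloud node runs needs no commit to the puppet repository, which is exactly the flexibility described above.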
PuppetDB
The cloud environment doesn't have a puppetdb installation (although some individual projects may). This means that any use of exported resources, or of functions which rely on puppetdb (e.g. puppetdb_query), won't work on an arbitrary cloud instance. The lack of puppetdb also means that the cumin puppetdb backend does not function in the cloud environment; however, I think that issue is out of scope for this task.
Global Variables
The puppet repo configures a number of global variables via the realm.pp manifest. Some of these variables are the same in both environments, some are different, and some only exist in the cloud environment. The two variables which differ are:
- $realm: points to either 'labs' or 'production', depending on which DNS domain a node is in.
- $nameservers: in production this points to the production anycast service. The cloud environment sets this to the cloud services DNS servers, which have logic to auto-populate entries for nodes created in openstack.
The $realm variable is also used extensively in the puppet policy.
Only in cloud
Most of these variables are used to provide additional lookup paths in hiera (see above)
- $labsproject: points to the openstack/horizon project of the node
- $wmcs_deployment: indicates the openstack deployment, which today is either the cloud production (eqiad1) or development environment (codfw1dev)
- $projectgroup: equal to "project-${labsproject}" (not sure of the use case; hoping cloud services can clarify)
In order to produce the variables above some temporary variables were also used. As they are defined in realm.pp they are also injected into node scope, so we list them here to be explicit:
- $pieces: equals $_trusted_certname.split('[.]')
- $dnsconfig: used to populate the $nameservers variable
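For readers less familiar with puppet's split, here is a small Python sketch of how these values relate to each other. The pattern '[.]' is a regular expression matching a literal dot, so $pieces is simply the certname broken into its labels. Note the loud assumption: realm.pp is not quoted in this task, so which element of $pieces ends up as $labsproject is a guess on my part (the second label here), and the certname is invented.

```python
import re


def derive_cloud_variables(trusted_certname):
    """Sketch of the temporary and derived variables for a cloud node."""
    # Equivalent of $_trusted_certname.split('[.]'): '[.]' is a regex
    # character class that matches a literal dot.
    pieces = re.split(r"[.]", trusted_certname)
    # ASSUMPTION: the project is taken to be the second label of the certname.
    # The real logic lives in realm.pp and may differ.
    labsproject = pieces[1]
    # From the list above: $projectgroup is "project-${labsproject}".
    projectgroup = f"project-{labsproject}"
    return {"pieces": pieces, "labsproject": labsproject, "projectgroup": projectgroup}


# Hypothetical certname, for illustration only.
result = derive_cloud_variables("web01.exampleproject.eqiad1.wikimedia.cloud")
print(result["pieces"])
print(result["projectgroup"])
```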
Possible ways forward
- It should be possible to completely drop the realm variable and rely instead on hiera to control the different logic paths. This will likely require a lot of refactoring; however, it should reduce the number of code paths which differ, moving the majority of the problem to hiera
- Add puppetdb to cloud services. I suspect this has been investigated many times and is likely difficult to support, for instance for projects with their own puppet masters
  - (@bd808) puppetdb is not multi-tenant safe/aware, which is the blocker to attaching a puppetdb instance to the shared puppetmaster used by the majority of Cloud VPS projects
- Explore the possibility of adding wmflib::expand_path to the cloud services hiera. This could be an additional level with the lowest hiera priority. This is one of the areas that causes the most day-to-day pain and feels like it could be a quick win
- Inject role and use the role variable in the cloud services hiera structure (680266)
- Separate puppet repos (in some form). This is something that has been discussed a few times with many different proposals; I think it would be useful to try and resurrect some of those discussions/ideas
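To show why the wmflib::expand_path option above looks like a quick win, here is a toy Python model of a first-match hiera lookup. It is a sketch under assumptions: the key name, values and level labels are invented, and real hiera supports merge strategies that this ignores. It only demonstrates that a production default becomes visible to cloud hosts once it is appended as the lowest-priority level, while cloud-specific values still take precedence.

```python
KEY = "profile::example::port"  # hypothetical hiera key

# Cloud levels, highest priority first; neither defines KEY.
cloud_levels = [
    ("cloud/eqiad1/exampleproject/common.yaml", {}),
    ("cloud.yaml", {}),
]
# Stand-in for the production defaults served via wmflib::expand_path.
production_common = ("common (via wmflib::expand_path)", {KEY: 8080})


def lookup(key, levels):
    """Return (level, value) from the first level defining key, else None."""
    for name, data in levels:
        if key in data:
            return name, data[key]
    return None


# Today: the production default is invisible to cloud hosts.
before = lookup(KEY, cloud_levels)
# Proposed: production defaults appended as the lowest-priority level.
after = lookup(KEY, cloud_levels + [production_common])
# A cloud-level value still wins over the new fallback.
overridden = lookup(KEY, [("cloud.yaml", {KEY: 9090}), production_common])

print(before, after, overridden)
```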