Page MenuHomePhabricator

Where to Put Community Modules?
Closed, ResolvedPublicBUG REPORT

Description

Where to Put Community Modules?

Backstory

Starting with Puppet 6.0 some formally core puppet types have been moved into community maintained modules, namely nagios_core & mailalias_core, both of which we use. I opened a patch to propose adding them to our repository. @jbond & @MoritzMuehlenhoff raised important concerns about how we want to handle community modules.

Question

We have two primary types of puppet modules in our repository, Community developed modules and In House developed modules. Community modules are those that we import into or repository, but are developed by a community outside of the foundation, e.g. stdlib, concat, lvm, etc. In House developed modules are those used solely by the foundation and developed in our puppet repository, e.g. ganeti, zookeeper, etc. At preset we co-mingle both types of modules into our single puppet repository under the ./modules directory. Should we continue to do so?

Options

Vendor modules in our repository under ./modules

This is what we presently do.

Advantages
  • No packaging required, easy to add new modules
  • Easy to patch
  • Easy to find any module
  • Easy git grepping
  • Supports a masterless mode of applying
Disadvantages
  • Hard to distinguish between Community & In House modules
  • Hard to see git history of Community modules
  • Patching a Community module makes it harder to import newer upstream versions later.

Vendor modules in our repository under ./vendor/modules

Similar to what we presently do, but uses a directory to separate the two types.

Advantages
  • Easy to distinguish between Community & In House modules
  • No packaging required, easy to add new modules
  • Easy to patch
  • Easy git grepping
  • Supports a masterless mode of applying
Disadvantages
  • Hard to see git history of Community modules
  • Slightly harder to find where a module lives
  • Patching a Community module makes it harder to import newer upstream versions later.

Use git submodules under ./vendor/modules

Rather than dumping code from Community modules, use a git submodule to add the Community repository inside our repository.

Advantages
  • Easy to see git history of Community modules
  • Easy to distinguish between Community & In House modules
  • No packaging required, easy to add new modules
  • Easy to fork
  • Supports a masterless mode of applying
Disadvantages
  • Slightly harder git grepping, git grep --recurse-submodules 'butter'
  • Slightly harder to find where a module lives

Package Community modules as Debian packages

Along with the puppet agent we would package Community modules as Debian Packages and install them along with the puppet agent on a host.

Advantages
  • Easy to distinguish between Community & In House modules
Disadvantages
  • Hard git grepping
  • Hard to see git history of Community modules
  • Hard to patch, would require a fork, submit pull request, get approval, repackage.
  • Does not support a masterless mode of applying
  • Additional maintenance burden of maintaining Debian packages
  • Hard to find any module

Use a Puppetfile, with R10k

Instead of storing modules in our repository we would have a separate "control repo" which defines all our modules in a Puppetfile.

Advantages
  • Easy to distinguish between Community & In House modules
  • Easy to fork
Disadvantages
  • Would require a significant time investment to migrate all of modules into separate repositories.
  • Hard git grepping
  • Hard to see git history of Community modules
  • Does not support a masterless mode of applying

Option X?

Event Timeline

Feel free to edit the description and add advantages and disadvantages.

To start with I would just like to add a bit of info that we have a history of using git submodules inside the puppet repo and not liking them and then moving away from them again, which was kind of a bigger deal. So maybe not that one.

To start with I would just like to add a bit of info that we have a history of using git submodules inside the puppet repo and not liking them and then moving away from them again, which was kind of a bigger deal. So maybe not that one.

thanks, @Dzahn feel free to add an item under disadvantages, with perhaps a little info as to why they were not liked?

@jhathaway thanks for writing this up just a few quick comments.

Before commenting i would say that in my mind we have four types types of modules

  • in house modules
  • third-party modules
  • built in types which are all the types included in the puppet source tree.
  • core types which are types that puppet labs packages with the puppet-agent and labeled puppetlabs-core-$foo on puppet forge

We can ignore the built in types as they will be shipped with the puppet agent code

The puppet core types are modules that use to be part of the puppet source tree but got split out when puppet 6 was released. puppetlabs upstream package theses modules are part of the puppet-agent package, however it seems that Debian will split theses out as separate packages. Even though they are separate repos in puppet labs the fact that puppet labs have decided to bundle them with the puppet agent makes me think we should follow suite and package theses possibly even try to keep parity with the versions that puppetlabs ship in there puppet-agent packages. Further i have never needed to submit a patch to theses core types and would even argue that any such changes should be scrutinised by both us and upstream and we shouldn't allow users to so easily change something that is stable. To keep context currently this set would only include i think cron and mailaises

For other third party modules I'm a bit more on the fence currently stdlib and concat are already packaged so going with packages would a simple route, however as would be expected the versions in Debian are a bit dated so we would need to do some work there. However the current method also requires a bit of work to stay compatible (see here and here).

My personal view on these would be to go for r10kas that also works nicely with environments (something that would be nice to see one day) and also gives us a path to move some of our own stable modules to separate repos so the community can benefit from them.

In general i think that the foundation has always been reluctant to use third-party modules for reasons others can better expand on, but some of the reasons would be:

  • third-party modules tend to include a lot of cruft which we don't need
  • the quality of third-party modules varies massively
  • code runs as root so we need to be very careful when reviewing there introduction.

Going forward i don't think we not should change our stance here, we should remain cautious about introducing third party modules and only introduce ones that are of a very high code level and offer a clear benefit.

Vendor modules in our repository under ./modules

Easy to patch

Its worth pointing out that currently we have a policy that we don't touch vendor modules we only ever copy them completely into out mono repo. Previously people hacking about in thirdparty modules has made it difficult for us to update to the most recent upstream version. As such we should always take the view that we should "fork, submit pull request, get approval, repackage." any changes to third party modules regardless of the method.

Use git submodules under

My preference would be to veto this options straight way. i have used it i previous jobs and found that it is a recipe for disaster with people constantly bumping heads against git, making mistakes wit the version they checkout, make it harder to review as you have to resolve a hash to a version tag, and other issues which escape my mind currently. its also worth mentioning that this is something the foundations also started out with and ultimately rolled back due to similar issues

Use a Puppetfile, with R10k

Would require a significant time investment to migrate all of modules into separate repositories.

We should only manage third part modules using r10k to avoid this example Puppetfile. We also allready have some support for r10k in the puppet repo (tested in cloud)

Hard git grepping

I'm guessing this assumes we move away from a mono repo for in house modules however see above this is not necessary

Does not support a masterless mode of applying

I suspect you mean doesn't work with bolt? however even in this case i think we could find a solution. That said i would point out that our puppet repo is not compatible with masterless and its not something that we officially support.

cc @Joe @akosiaris @Volans who will have some additional history and comment

To start with I would just like to add a bit of info that we have a history of using git submodules inside the puppet repo and not liking them and then moving away from them again, which was kind of a bigger deal. So maybe not that one.

thanks, @Dzahn feel free to add an item under disadvantages, with perhaps a little info as to why they were not liked?

@jhathaway Everything that John said above and there are a bunch of tickets around them causing problems of one type or another. I just remember it as a general PITA https://phabricator.wikimedia.org/search/query/sAFFjigwUTVs/ So i would be favor of that veto as well.

re: "Easy to distinguish between Community & In House modules" I am wondering whether it really matters to distinguish between them.. AFTER we have made a decision to import a third party module, which we should stay reluctant / careful about. Also here John has already found better words for it above.

Before commenting i would say that in my mind we have four types types of modules

  • in house modules
  • third-party modules
  • built in types which are all the types included in the puppet source tree.
  • core types which are types that puppet labs packages with the puppet-agent and labeled puppetlabs-core-$foo on puppet forge

We can ignore the built in types as they will be shipped with the puppet agent code

The puppet core types are modules that use to be part of the puppet source tree but got split out when puppet 6 was released. puppetlabs upstream package theses modules are part of the puppet-agent package, however it seems that Debian will split theses out as separate packages. Even though they are separate repos in puppet labs the fact that puppet labs have decided to bundle them with the puppet agent makes me think we should follow suite and package theses possibly even try to keep parity with the versions that puppetlabs ship in there puppet-agent packages. Further i have never needed to submit a patch to theses core types and would even argue that any such changes should be scrutinised by both us and upstream and we shouldn't allow users to so easily change something that is stable. To keep context currently this set would only include i think cron and mailaises

According to puppet's docs and my own inspection of Puppet's 6.26 agent package the two modules in the original patch mailalias and nagios_core are not shipped as part of the puppet agent. As a result of that change by Puppet in 6.0 I think those modules should be viewed as Community or 3rd party modules and we should treat them no different than a module like stdlib. If you agree with that opinion then I would like to focus this conversation on those modules in comparison to say the cron module which is still shipped by Puppet as part of their agent package. How the cron module should be packaged in Debian is an interesting question, but in my eyes that is separate from how we utilize Community or 3rd party modules.

According to puppet's docs and my own inspection of Puppet's 6.26 agent package the two modules in the original patch mailalias and nagios_core are not shipped as part of the puppet agent. As a result of that change by Puppet in 6.0 I think those modules should be viewed as Community or 3rd party modules and we should treat them no different than a module like stdlib. If you agree with that opinion then I would like to focus this conversation on those modules

Agreed

in comparison to say the cron module which is still shipped by Puppet as part of their agent package. How the cron module should be packaged in Debian is an interesting question, but in my eyes that is separate from how we utilize Community or 3rd party modules.

I was indeed thinking more about theses modules

in comparison to say the cron module which is still shipped by Puppet as part of their agent package. How the cron module should be packaged in Debian is an interesting question, but in my eyes that is separate from how we utilize Community or 3rd party modules.

I was indeed thinking more about theses modules

I created T302481 for theses resources

Have you also considered git subtree instead of git submodules?

@CDanis I had not, here is my attempt at a comparison between git submodules and subtrees

Git subtrees

With the git-subtree(1) command you merge another repository into yours, optionally squashing its history when you merge. In our case it is a merge plus a git mv because we would want to put the module under ./modules/.

I found the mental model very confusing, example usage:

#!/bin/bash

set -o errexit

# Create repo
mkdir -p puppet-with-subtrees/modules
pushd puppet-with-subtrees >/dev/null
git init
echo 'subtrees!' >README.md
git add README.md
git commit -m readme README.md

# Add stdlib
git subtree add --prefix=modules/stdlib git@github.com:puppetlabs/puppetlabs-stdlib.git v8.0.0
git tag stdlib-v8.0.0 -m 'add stdlib@v8.0.0'

# View stdlib log, https://stackoverflow.com/a/14448022
git checkout stdlib-v8.0.0
PAGER='cat' git show HEAD^2

# Update version
git subtree pull --prefix=modules/stdlib -m 'stdlib v8.1.0' git@github.com:puppetlabs/puppetlabs-stdlib.git v8.1.0
git tag stdlib-v8.1.0 -m 'add stdlib@v8.1.0'

# Checkout old rev
## You have to checkout the old tag, because commits are merged, so all the
## commits between v8.0.0 and v8.1.0 are part the repositories log
git checkout stdlib-v8.0.0
PAGER='cat' git show
git checkout main
popd

# Clone
git clone puppet-with-subtrees subtree-clone-test

Git submodules

With gitsubmodules(7) you store the history of another repository separately in $GIT_DIR/modules/ then git creates a link to the submodule. This means once you enter the submodule’s directory all the git commands work as normal.

There are definitely some nettlesome edge cases when adding or updating modules, but I find the mental model much easier to grok, example usage:

#!/bin/bash

set -o errexit

# Create repo
mkdir -p puppet-with-submodules/modules
pushd puppet-with-submodules >/dev/null
git init

# Add stdlib
git submodule add git@github.com:puppetlabs/puppetlabs-stdlib.git modules/stdlib
git -C modules/stdlib checkout v8.0.0
git add modules/stdlib
git commit -m 'add submodule stdlib@v8.0.0'

# View stdlib log
PAGER='cat' git -C modules/stdlib show
## or
pushd modules/stdlib >/dev/null
PAGER='cat' git show
popd

# Update version
git -C modules/stdlib checkout v8.1.0
git add modules/stdlib
git commit -m 'update stdlib to v8.1.0'

# Checkout old rev
git config submodule.recurse true
git checkout HEAD~1
PAGER='cat' git show
git checkout main
popd

# Clone
git clone --recurse-submodules puppet-with-submodules submodules-clone-test

@jhathaway thanks for writing this up just a few quick comments.

In general i think that the foundation has always been reluctant to use third-party modules for reasons others can better expand on, but some of the reasons would be:

  • third-party modules tend to include a lot of cruft which we don't need
  • the quality of third-party modules varies massively
  • code runs as root so we need to be very careful when reviewing there introduction.

Going forward i don't think we not should change our stance here, we should remain cautious about introducing third party modules and only introduce ones that are of a very high code level and offer a clear benefit.

+100 to that. The reasons above are why we aren't more reliant on 3rd party/Community modules and as a consequence why we haven't really invested much in how to manage them in a better way, despite the existence of tooling to do so (r10k, g10k, etc)

To start with I would just like to add a bit of info that we have a history of using git submodules inside the puppet repo and not liking them and then moving away from them again, which was kind of a bigger deal. So maybe not that one.

thanks, @Dzahn feel free to add an item under disadvantages, with perhaps a little info as to why they were not liked?

Since I was summoned, here are my 2 cents. The biggest problem with the git submodule approach was a result of us tried to adopt the pattern widely. That meant that we tried to apply the pattern to also in-house built modules (we altered puppet-merge to support that pattern). That is, modules that were created to serve the specific needs of the foundation and were not fit for releasing as an artifcact. Modules with no release process, no versioning, minimal to no testing and no stability guarantees. That made breakages all too often. The big ones were easy to handle but usually it was something slightly more subtle that would break puppet on infrastructure that was not immediately managed by the person coding the submodule. So, while in some cases a simple revert of the submodule bump could fix it, usually that was not the case. The required steps then to do an investigation and fix were many and all manual. Given that experience, we initiated the move away from the submodule pattern and back to the monorepo. With high quality and widely used Community/3rd party modules the experience would probably not be so bad. However, high quality 3rd party modules like e.g. stdlib are not the norm.

So, I would immediately veto having our own modules released and used as git submodules. I would be more receptive of Community/3rd party modules as git submodules, provided they are of high quality (and thus their number kept small). However, even if I try to put advantages/disadvantages of Community modules in a monorepo vs in git submodules on a scale, I am not sure it would tilt to the git submodules side. r10k/g10k seems more interesting to me tbh. It's more declarative and easier to reason about.

On a side note, I see there is a proposal of using /vendor/modules. It seems interesting and I 've never tried it, I am wondering what technical hurdles we 'd meet. Any ideas?

On a side note, I see there is a proposal of using /vendor/modules. It seems interesting and I 've never tried it, I am wondering what technical hurdles we 'd meet. Any ideas?

Using /vendor/modules is fairly straight forward, you just need to tell puppet to include an additional module path. It offers the benefit of separating Community and In House modules, which is nice, but its benefit in my experience is small.

Based on the discussion so far my inclination is that we stick with our current method of vendoring Community modules in ./modules. Though not a perfect solution, it seems to have worked well for us and the downsides are small. My personal experience with a similar setup mirrors the foundation's experience. As @akosiaris mentioned git submodules for just Community modules could be an interesting route as well, but given the scars of the last submodules experience I don't think the upsides are worth exploring at this time.

I would frankly either keep third-party modules under /modules or move them to /vendor/modules.

While I do love r10k as an idea and I even considered it as an option for puppet for cloud VPS (but for everything, meaning all of our modules would need to be properly versioned for use by cloud as a third-party) I think it would just be an added layer of pain for production.

Thanks for opening this task, having this discussion in seachable, open medium is very useful!

Based on the discussion so far my inclination is that we stick with our current method of vendoring Community modules in ./modules. Though not a perfect solution, it seems to have worked well for us and the downsides are small. My personal experience with a similar setup mirrors the foundation's experience.

Agreed. Currently we don't have any community modules AFAIK, but when we need to import some for Puppet 6 separating them under vendor/modules seems like a good idea. It causes no real code churn, but easily gives people a good hint that if they are about to edit such a module it should ideally be done upstream as well (I fear that otherwise people might make repo-wide changes and not really realise that they are changing a third party module).

moving third party modules to /vendor/modules also makes it a bit easier to exclude theses modules from CI which is a nice minor benefit

On a side note, I see there is a proposal of using /vendor/modules. It seems interesting and I 've never tried it, I am wondering what technical hurdles we 'd meet. Any ideas?

Using /vendor/modules is fairly straight forward, you just need to tell puppet to include an additional module path. It offers the benefit of separating Community and In House modules, which is nice, but its benefit in my experience is small.

That matches my expectations. Thanks for the info!

Change 770099 had a related patch set uploaded (by JHathaway; author: JHathaway):

[operations/puppet@production] Move vendored modules to vendor_modules

https://gerrit.wikimedia.org/r/770099

There seems to be some coalescing around moving vendored modules into their own directory, here is a patch that does just that, feedback very much appreciated, https://gerrit.wikimedia.org/r/770099

Change 770960 had a related patch set uploaded (by JHathaway; author: JHathaway):

[operations/puppet@production] Move vendored modules to vendor_modules

https://gerrit.wikimedia.org/r/770960

Change 770969 had a related patch set uploaded (by JHathaway; author: JHathaway):

[operations/puppet@production] rsync: fix rubocop style violations

https://gerrit.wikimedia.org/r/770969

Change 770099 merged by JHathaway:

[operations/puppet@production] Prepare to move vendored modules to vendor_modules

https://gerrit.wikimedia.org/r/770099

Change 770969 merged by JHathaway:

[operations/puppet@production] rsync: fix rubocop style violations

https://gerrit.wikimedia.org/r/770969

Change 770960 merged by JHathaway:

[operations/puppet@production] Move vendored modules to vendor_modules

https://gerrit.wikimedia.org/r/770960

Community modules have now been moved to vendor_modules, thanks everyone for the discussion & feedback.