Page MenuHomePhabricator

Ensure Puppet checks types as part of the build
Open, MediumPublic

Description

We have started adding types to our puppet files, and can manually check them using Jenkins (example).

However, when Jenkins runs the main build, type errors are not reported. See for example this test change, where a deliberate typo was introduced, but the build still passed. We should incorporate type checking into this build to catch type errors early and derive the most benefit from the types that we add.

Event Timeline

@razzi the puppet catalog compiler reports the error though:

https://puppet-compiler.wmflabs.org/compiler1002/24823/an-worker1078.eqiad.wmnet/index.html

The build fails (see title) and also if you inspect the change errors it lists specifically what's wrong:

https://puppet-compiler.wmflabs.org/compiler1002/24823/an-worker1078.eqiad.wmnet/change.an-worker1078.eqiad.wmnet.err

Error: Evaluation Error: Resource type not found: BooleanXXXXX (file: /srv/jenkins-workspace/puppet-compiler/24823/change/src/modules/profile/manifests/analytics/cluster/packages/common.pp, line: 10, column: 5) on node an-worker1078.eqiad.wmnet

Ya, PCC is good, but Razzi created this ticket hoping that an obvious incorrect type like this could be caught by the Jenkins tests that run with every patch, rather than having to check manually.

Ya, PCC is good, but Razzi created this ticket hoping that an obvious incorrect type like this could be caught by the Jenkins tests that run with every patch, rather than having to check manually.

ah okok makes sense, from the description I thought it was only for PCC. From the Rakefile, I see this comment:

16 # Based on the contents of the change, this rakefile will define and run$
17 # all or just some of the following tests:$
18 #$
19 # * puppet_lint - runs puppet lint on the changed puppet files$
20 # * typos - checks the changed files against a predefined list of typos defined$
21 #            in ./typos$
22 # * syntax - run syntax checks for puppet files, hiera files, and templates$
23 #            changed in the current changeset$
24 # * rubocop - run rubocop style checks on ruby files changed in this changeset$
25 # * spec - run the spec tests on the modules where files are changed, or whose$
26 #           tests depend on modules that have been modified.$
27 # * tox - run the tox tests if needed.

Is the idea to figure out if/why a puppet type typo/error is not raised when rake runs? (I am trying to understand the goal to help :)

Yes, in my opinion there should be a nonzero exit code of the type checker, and that should propagate to the Jenkins job.

fdans triaged this task as High priority.Oct 5 2020, 4:44 PM
fdans edited projects, added Analytics-Clusters; removed Analytics.

@razzi is this something that can be closed in favor of T166066? If not, let's add proper tags to this task so it is visible to others (for example, puppet-complier, operations, etc..)

Ottomata lowered the priority of this task from High to Medium.Nov 30 2020, 4:49 PM
Ottomata edited projects, added SRE, Puppet, puppet-compiler; removed Analytics-Clusters.

Change 644312 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] P:analytics::cluster::packages::common: demo spec test

https://gerrit.wikimedia.org/r/644312

The main issue here is that both puppet-lint and puppet syntax validate (run by the puppet-lint and syntax rake jobs above) are too light weight to pick up issues like this as they are designed to work on individual files and as BooleanXXXXX could be defined else where in the manifest (e.g type BooleanXXXXX = Boolean) its not obvious to theses tools if that there is an error. I would say the standard way to pick up issues like this is to write a spec test for the class, see the example here, which does pick up the issue. however note that that spec test is dependent on a general refactor of the profile spect test which is in relation chain

Change 644786 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] P:analytics::cluster::packages::common: demostrate failing spec test

https://gerrit.wikimedia.org/r/644786

however note that that spec test is dependent on a general refactor of the profile spect test which is in relation chain

The dependency has been merged i have added you all to the change that adds the spec test let me know if you want it merged (or merge it your self :))

Interesting, thank you! It's good to have that on the profile class.

I think we were hoping this could be done as a more general thing on all classes, but I guess that'd require compiling the entire catalog on every patchset, which isn't great.

Should we close this task, or leave it open in case there is some way to do this more generically in the future?

Interesting, thank you! It's good to have that on the profile class.

unfortunately i had to revert the other change in the chain :(

I think we were hoping this could be done as a more general thing on all classes, but I guess that'd require compiling the entire catalog on every patchset, which isn't great.

Ultimately the issues is how long are we prepared to wait for CI. The reason i had to revert the previous change is because it bumped a CI run for anything in the profile module to ~3mins. This is beacause it essentially compiles the entire repo multiple times with different configs. I think a more general case would be possible it would ultimately mean running something like the following

require_relative '../../../../rake_modules/spec_helper'
profiles.each do |profile|
  describe profile do
    on_supported_os(WMFConfig.test_on).each do |os, facts|
       context "on #{os}" do
        let(:facts) { facts }
        it { is_expected.to compile }
      end
    end
  end
end

However this depends heavily on the PS i had to revert as that takes care of dependencies configuring, hiera custom functions, the private repo and other things required to correctly compile the catalogue

Should we close this task, or leave it open in case there is some way to do this more generically in the future?

Happy to leave it open, i think the bigger issue is how long it will take for a general solution to run and if that is acceptable to the workflow of contributors?

I vote to close this in favor of T166066 as @elukey suggested. Writing spec tests for modules doesn't scale as a solution, and we don't want to slow down CI by running more puppet compilation than we need to.

However this depends heavily on the PS i had to revert as that takes care of dependencies configuring, hiera custom functions, the private repo and other things required to correctly compile the catalogue

Also worth noting that i did merge a similar change to this

Writing spec tests for modules doesn't scale as a solution,

Im not sure i agree with this, i think like most development its good practice to provide unit tests for your code and it is common practice with puppet modules in general (i.e. outside of the foundation)

we don't want to slow down CI by running more puppet compilation

This i do agree with however i think its worth noting that adding PCC as an automatic process in CI would increase CI time more then additional spec test

edit: no objections to closing this