Investigate failure of reimage cookbook on Puppet 5 instance (alert2001)
Closed, Resolved | Public | BUG REPORT

Description

While trying to reimage alert2001, a Debian Buster host on Puppet 5, to Debian Bookworm (still on Puppet 5), the reimage cookbook failed with the following error:

Exception raised while initializing the Cookbook sre.hosts.reimage:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 199, in run
    runner = self.instance.get_runner(args)
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/reimage.py", line 101, in get_runner
    return ReimageRunner(args, self.spicerack)
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/reimage.py", line 181, in __init__
    self.new_puppet_server = self._get_puppet_server()
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/reimage.py", line 227, in _get_puppet_server
    has_puppet7 = self.puppet_server.hiera_lookup(self.fqdn, "profile::puppet::agent::force_puppet7")
  File "/usr/lib/python3/dist-packages/spicerack/puppet.py", line 591, in hiera_lookup
    result = self.server_host.run_sync(command, is_safe=True, print_output=False, print_progress_bars=False)
  File "/usr/lib/python3/dist-packages/spicerack/remote.py", line 514, in run_sync
    return self._execute(
  File "/usr/lib/python3/dist-packages/spicerack/remote.py", line 720, in _execute
    raise RemoteExecutionError(ret, "Cumin execution failed", worker.get_results())
spicerack.remote.RemoteExecutionError: Cumin execution failed (exit_code=2)

For additional context, we plan to do the Debian version upgrade first and then proceed with the Puppet 5 to Puppet 7 upgrade, as suggested by the Foundations team.

Event Timeline

Moritz and Riccardo, could you please review this issue and share any insights or suggestions you might have regarding the failure of the reimage cookbook? Your expertise is greatly appreciated. :)

There is definitely a bug here in the way the Puppet version is selected: looking at the Spicerack logs on cumin2002, the failure ultimately comes from the fact that the "puppet lookup" command is run on puppetserver1001, i.e. a Puppet 7 server.

Looking at the logs on cumin2002, you used "--os bookworm -t T333615 alert2001 --new -p5". But the server is already up and running and can be found in PuppetDB, so --new isn't needed. Can you try running without it?

The fact that the lookup is run on puppetserver is by design: it is only there to detect whether the profile::puppet::agent::force_puppet7 hiera key is set or not.
In this case the lookup is failing with:

$ sudo puppet lookup --render-as s --compile --node alert2001.wikimedia.org profile::puppet::agent::force_puppet7
Warning: Undefined variable '::_role';
   (file & line not available)
Error: Could not run: Evaluation Error: Error while evaluating a Function Call, Failed to execute generator /usr/local/bin/naggen2: Execution of '/usr/local/bin/naggen2 --type hosts' returned 30:  (file: /srv/puppet_code/environments/production/modules/icinga/manifests/naggen.pp, line: 12, column: 18)
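To illustrate why this surfaces as a hard failure, here is a minimal Python sketch of running the same lookup while keeping "the lookup itself failed" separate from "the key is not set". It is not Spicerack's implementation: the function names, the injectable `run` parameter, and the assumption that a set boolean key prints `true` are all hypothetical and only for illustration.

```python
import subprocess
from typing import Callable, Optional

HIERA_KEY = "profile::puppet::agent::force_puppet7"


def lookup_command(fqdn: str, key: str = HIERA_KEY) -> list:
    """Build the same 'puppet lookup' invocation shown in the comment above."""
    return ["sudo", "puppet", "lookup", "--render-as", "s", "--compile", "--node", fqdn, key]


def force_puppet7(fqdn: str, run: Callable = subprocess.run) -> Optional[bool]:
    """Return True/False when the lookup succeeds, None when it fails.

    Assumption: a successful lookup of a set boolean key prints 'true'.
    Any non-zero exit (for example a catalog compilation error such as the
    naggen2 failure above) is reported as None, so the caller can decide
    how to proceed instead of raising during initialization.
    """
    result = run(lookup_command(fqdn), capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None
    return result.stdout.strip().lower() == "true"
```

With a tri-state result like this, a caller could fall back to an explicit command-line choice of Puppet version when the lookup returns None, rather than aborting.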

The cookbook was run multiple times with different options, so the --new option is not the culprit here:

2024-02-20 15:32:29,692 denisse 1972086 [DEBUG _cookbook.py:511 in main] Executing cookbook sre.hosts.reimage with args: ['--os', 'bookworm', '-t', 'T333615', 'alert2001']
2024-02-20 15:32:57,943 denisse 1972376 [DEBUG _cookbook.py:511 in main] Executing cookbook sre.hosts.reimage with args: ['--os', 'bookworm', '-t', 'T333615', 'alert2001*']
2024-02-20 15:33:00,812 denisse 1972413 [DEBUG _cookbook.py:511 in main] Executing cookbook sre.hosts.reimage with args: ['--os', 'bookworm', '-t', 'T333615', 'alert2001']
2024-02-20 15:36:44,308 denisse 1974748 [DEBUG _cookbook.py:511 in main] Executing cookbook sre.hosts.reimage with args: ['--os', 'bookworm', '-t', 'T333615', 'alert2001', '-p5']
2024-02-20 15:37:05,105 denisse 1974997 [DEBUG _cookbook.py:511 in main] Executing cookbook sre.hosts.reimage with args: ['--os', 'bookworm', '-t', 'T333615', 'alert2001', '--new', '-p5']
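For anyone who wants to compare runs like these programmatically, the following is a small sketch that extracts the argument list from each "Executing cookbook" line. The regular expression is inferred only from the five lines quoted above, so treat the format as an assumption; `parse_invocation` is a hypothetical helper, not part of Spicerack.

```python
import ast
import re

# Pattern inferred from the log lines quoted above (an assumption, not a spec).
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r"(?P<user>\S+) (?P<pid>\d+) \[.*?\] "
    r"Executing cookbook (?P<cookbook>\S+) with args: (?P<args>\[.*\])$"
)


def parse_invocation(line):
    """Return a dict describing one cookbook invocation, or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    # The args are logged as a Python list literal, so literal_eval is enough.
    info["args"] = ast.literal_eval(info["args"])
    return info


if __name__ == "__main__":
    sample = (
        "2024-02-20 15:37:05,105 denisse 1974997 [DEBUG _cookbook.py:511 in main] "
        "Executing cookbook sre.hosts.reimage with args: "
        "['--os', 'bookworm', '-t', 'T333615', 'alert2001', '--new', '-p5']"
    )
    parsed = parse_invocation(sample)
    print(parsed["timestamp"], "--new" in parsed["args"])
```

Filtering the parsed entries on `"--new" in info["args"]` makes it easy to confirm that the failure occurred both with and without that flag.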

The quickest fix (required anyway to migrate to Puppet 7) would be to fix the catalog compilation in Puppet 7. If that's not possible for some reason, then we can find a different solution on the cookbook side if we think this might affect other hosts or, if this is a unicorn host, fix it with a quick hack.
If you go the quick hack way, I think that:

  • disabling puppet on alert2001
  • removing alert2001 from puppetdb
  • starting the reimage with --new

should be enough to bypass the issue (not tested; this is the first time this has happened).
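As a rough sketch of the last step, the helper below assembles the reimage command line from the options that appear in the logs above. The first two steps are left as manual prerequisites, since the exact commands for them are not given in this task. The `sudo cookbook` prefix and the `reimage_argv` helper are assumptions for illustration only.

```python
import shlex

# Manual prerequisites from the list above; no commands are implied here.
MANUAL_STEPS = (
    "disable puppet on the host",
    "remove the host from puppetdb",
)


def reimage_argv(host, os_release, task, new=False, puppet5=False):
    """Build the argument list for the reimage cookbook.

    Assumption: the cookbook is started as 'sudo cookbook <name> ...'.
    The flags themselves are the ones seen in the logged invocations.
    """
    argv = ["sudo", "cookbook", "sre.hosts.reimage", "--os", os_release, "-t", task, host]
    if new:
        argv.append("--new")
    if puppet5:
        argv.append("-p5")
    return argv


if __name__ == "__main__":
    for step in MANUAL_STEPS:
        print("first:", step)
    print(shlex.join(reimage_argv("alert2001", "bookworm", "T333615", new=True, puppet5=True)))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; the reimage itself should only be started once both prerequisites are done.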

andrea.denisse claimed this task.

Thanks for taking a look at the issue. We were able to successfully proceed with the reimage using the fix suggested. :)

Fixing the catalogue to be compatible with Puppet 7 will still be needed after the update. Can you please split this out into a separate task?

Yes, I've created T358506 to track the catalogue update, thank you.