With alpha 2 released (https://lists.debian.org/debian-devel-announce/2023/02/msg00005.html), it's a good time to start preparing our autoinstall environment for Bookworm.
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bookworm OS)
- The reimage failed, see the cookbook logs for the details
Change 908789 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Use signed-by notation for component/puppet5
Change 908789 merged by Muehlenhoff:
[operations/puppet@production] Use signed-by notation for component/puppet5
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bookworm OS)
- The reimage failed, see the cookbook logs for the details
Change 908799 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Pass -y --force-yes to puppet installation on bookworm
Change 908799 merged by Muehlenhoff:
[operations/puppet@production] Pass -y --force-yes to puppet installation on bookworm
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bookworm OS)
- The reimage failed, see the cookbook logs for the details
Change 908833 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Install ruby-sorted-set on Bookworm
Change 908833 merged by Muehlenhoff:
[operations/puppet@production] Install ruby-sorted-set on Bookworm
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304170713_jmm_720569_sretest1002.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details
late-setup.sh has been modified to integrate the Puppet 5.5 forward port during the Debian installer and that makes the installation work!
There's still the caveat that the reimage cookbook doesn't run through to the very end, since the Puppet run on the rebooted system fails due to the YAML issue mentioned in https://phabricator.wikimedia.org/T330495#8771692, but the installation itself (and the host cert renewal) are working fine now.
We can either leave it or add one of two patches on top of the forward port.
- Use YAML.unsafe_load_file, which shouldn't be any worse than our present YAML safety
- Backport Puppet's safe_load, which selectively unmarshals classes
Both patches are attached and were tested on puppetdb1003. I think the unsafe load is probably fine for our interim state?
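As a rough illustration of the trade-off between the two options, here is a minimal sketch using Ruby's standard YAML (Psych) library. It is not taken from the attached patches: the sample document, the helper names, and the list of permitted classes are assumptions made up for the example, and it loads from a string rather than a file.

```ruby
require 'yaml'
require 'date'

# Illustrative document only: symbol keys and a date, i.e. values that go
# beyond plain strings and numbers.
DOC = <<~EOS
  ---
  :host: sretest1002
  :last_run: 2023-04-17
EOS

# Analogue of the first option: deserialize whatever the document contains.
def load_unsafe(text)
  if YAML.respond_to?(:unsafe_load)
    YAML.unsafe_load(text)
  else
    YAML.load(text) # older Psych releases: plain load is unrestricted
  end
end

# Analogue of the second option: only an explicit list of classes may be
# instantiated (the list here is an assumption for this sample document).
def load_selective(text)
  YAML.safe_load(text, permitted_classes: [Symbol, Date])
end

# With no permitted classes, the same document is rejected; the exception
# is returned instead of raised so the caller can inspect it.
def load_strict(text)
  YAML.safe_load(text)
rescue Psych::DisallowedClass => e
  e
end
```

For this sample, load_unsafe and load_selective return the same hash, while load_strict returns a Psych::DisallowedClass error; the difference only matters once a file contains classes outside the permitted list.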
Excellent, thanks! I'll add the unsafe.patch version on top of the current component. As you said, it's no different from the status quo, and with the migration to Puppet 7 we'll move to the new unmarshalling anyway.
Mentioned in SAL (#wikimedia-operations) [2023-04-18T12:39:48Z] <moritzm> imported puppet 5.5.22-2+deb12u2 for bookworm-wikimedia T330495
Change 909663 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Update puppet version to be installed on bookworm
Change 909663 merged by Muehlenhoff:
[operations/puppet@production] Update puppet version to be installed on bookworm
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304190822_jmm_2721130_sretest1002.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details
Change 910438 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/cookbooks@master] sre.hosts.reimage/sre.ganeti.reimage: Delete Puppet state file before reimage
Change 910438 merged by Muehlenhoff:
[operations/cookbooks@master] sre.hosts.reimage/sre.ganeti.reimage: Delete Puppet state file before reimage
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Change 912232 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Temporarily stop using udebs from unstable
Change 912232 merged by Muehlenhoff:
[operations/puppet@production] Temporarily stop using udebs from unstable
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304260822_jmm_1328859_sretest1002.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Change 912311 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/cookbooks@master] Revert "sre.hosts.reimage/sre.ganeti.reimage: Delete Puppet state file before reimage"
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304261252_jmm_1515402_sretest1002.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details
Mentioned in SAL (#wikimedia-operations) [2023-04-27T07:23:38Z] <moritzm> uploaded debmonitor-client 0.3.2-1+deb12u1 to bookworm-wikimedia T330495
Change 912773 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Adapt sources.list for bookworm
On a fresh bookworm installation I'm seeing a few Puppet failures like these:
Error: Could not set 'link' on ensure: wrong number of arguments (given 3, expected 2)
After some digging I tend to believe this is caused by https://www.ruby-lang.org/en/news/2019/12/12/separation-of-positional-and-keyword-arguments-in-ruby-3-0/ and backporting the following patch should fix it?
https://github.com/puppetlabs/puppet/commit/6af09225b3b962547a
@jbond, @jhathaway Does that make sense to you? If so, I'd update our Puppet 5.5 agent backport for bookworm with that patch.
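To make the suspected cause concrete, here is a small self-contained sketch of the Ruby 3 behaviour. The method fake_symlink is a hypothetical stand-in that only mimics the shape of a method taking two positional arguments plus keywords; it is not FileUtils.symlink itself, and old_style/new_style are made-up wrapper names.

```ruby
# Hypothetical stand-in: two positional arguments plus keyword options.
def fake_symlink(src, dest, force: false, verbose: false)
  "#{src} -> #{dest} (force=#{force}, verbose=#{verbose})"
end

# Pre-Ruby-3 style: the options hash is passed as a third positional
# argument. Ruby 2.7 converts it to keywords with a deprecation warning;
# Ruby 3 no longer converts it and raises
# "wrong number of arguments (given 3, expected 2)".
def old_style(src, dest, options = {})
  fake_symlink(src, dest, options)
end

# Ruby 3 compatible style: the double splat turns the hash into keywords.
def new_style(src, dest, options = {})
  fake_symlink(src, dest, **options)
end

puts new_style('/etc/a', '/etc/b', { force: true })
begin
  puts old_style('/etc/a', '/etc/b')
rescue ArgumentError => e
  puts "old style failed: #{e.message}"
end
```

Note that the old style fails on Ruby 3 even when the options hash is empty, which would match errors appearing on ordinary symlink resources.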
Mentioned in SAL (#wikimedia-operations) [2023-04-27T09:06:15Z] <moritzm> uploaded debdeploy 0.0.99.13+deb12u1 to bookworm-wikimedia T330495
And one more question for @jbond and @jhathaway : We're installing the ruby-safe-yaml package via the monitoring profile, where it seems to be used in check_puppetrun.rb only. Can we fix this script to use different YAML parsing without ruby-safe-yaml? The reason I'm asking is that in bookworm ruby-safe-yaml no longer exists: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019665 and it is lined up for removal from the archive.
Yes that looks correct to me
Can we fix this script to use different YAML parsing without ruby-safe-yaml?
Yes, we should be able to use YAML.safe_load. I'll take a look today.
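For reference, a check of this kind can usually get by with the standard library alone when the file holds only plain strings, numbers, and nested hashes. The following is a minimal sketch under that assumption; the summary layout, the helper name puppet_run_ok?, and the thresholds are illustrative and are not the actual check_puppetrun.rb logic.

```ruby
require 'yaml'

# Illustrative stand-in for a Puppet run summary; keys and numbers are
# made up for the example.
SUMMARY = <<~EOS
  ---
  resources:
    failed: 0
    total: 512
  time:
    last_run: 1682582400
EOS

# Hypothetical helper: true when no resources failed and the last run is
# recent enough. Only core YAML types are needed, so YAML.safe_load works
# without permitted_classes and without the ruby-safe-yaml gem.
def puppet_run_ok?(yaml_text, now:, max_age: 3600)
  data = YAML.safe_load(yaml_text) || {}
  failed = data.dig('resources', 'failed')
  last_run = data.dig('time', 'last_run')
  return false if failed.nil? || last_run.nil?

  failed.zero? && (now - last_run) <= max_age
end

puts puppet_run_ok?(SUMMARY, now: 1682582400 + 60)
```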
Mentioned in SAL (#wikimedia-operations) [2023-04-27T09:29:02Z] <moritzm> imported wmf-certificates to bookworm-wikimedia T330495
Mentioned in SAL (#wikimedia-operations) [2023-04-27T09:29:57Z] <moritzm> imported prometheus-rsyslog-exporter to bookworm-wikimedia T330495
Change 912798 had a related patch set uploaded (by Jbond; author: jbond):
[operations/puppet@production] check_puppetrun: Drop safe_yaml
Change 912773 merged by Muehlenhoff:
[operations/puppet@production] Adapt sources.list for bookworm
I can confirm that this works. With the following patch applied, setting symlinks works again on Bookworm with Puppet 5.5 and Ruby 3.1. I'm going to roll this into an updated puppet package and upload it to apt.wikimedia.org:
Partial Backport of the following upstream commit:

From 6af09225b3b962547a2564a9d34ccd6832e60558 Mon Sep 17 00:00:00 2001
From: Melissa Stone <melissa@puppet.com>
Date: Wed, 27 May 2020 12:24:40 -0700
Subject: [PATCH] (PUP-10537) Ruby 2.7 warning: keyword param as last arg

`warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call`

In preparation for Ruby 3, Ruby 2.7 has added warnings around how keyword and positional arguments are used. Details can be found at: https://www.ruby-lang.org/en/news/2019/12/12/separation-of-positional-and-keyword-arguments-in-ruby-3-0/
---
 lib/puppet/file_system/file_impl.rb | 2 +-

--- puppet-5.5.22.orig/lib/puppet/file_system/file_impl.rb
+++ puppet-5.5.22/lib/puppet/file_system/file_impl.rb
@@ -121,7 +121,7 @@ class Puppet::FileSystem::FileImpl
   end

   def symlink(path, dest, options = {})
-    FileUtils.symlink(path, dest, options)
+    FileUtils.symlink(path, dest, **options)
   end

   def symlink?(path)
Mentioned in SAL (#wikimedia-operations) [2023-04-27T12:12:15Z] <moritzm> imported puppet 5.5.22-2+deb12u3 to bookworm-wikimedia T330495
Change 912840 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Install 5.5.22-2+deb12u3 in late-setup.sh
Change 912840 merged by Muehlenhoff:
[operations/puppet@production] Install 5.5.22-2+deb12u3 in late-setup.sh
Change 912798 merged by Jbond:
[operations/puppet@production] check_puppetrun: Drop safe_yaml
Change 912864 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Stop installing ruby-safe-yaml
Change 912864 merged by Muehlenhoff:
[operations/puppet@production] Stop installing ruby-safe-yaml
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors:
- sretest1002 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304280730_jmm_3276541_sretest1002.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details
Change 913121 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Use signed-by to include the Wikimedia repo starting with Bookworm
Change 913132 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Use signed-by to in apt::package_from_component on Bookworm
Change 913132 merged by Muehlenhoff:
[operations/puppet@production] Use signed-by to in apt::package_from_component on Bookworm
Change 913121 merged by Muehlenhoff:
[operations/puppet@production] Use signed-by to include the Wikimedia repo starting with Bookworm
Change 912311 merged by Muehlenhoff:
[operations/cookbooks@master] Revert "sre.hosts.reimage/sre.ganeti.reimage: Delete Puppet state file before reimage"
Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1002.eqiad.wmnet with OS bookworm completed:
- sretest1002 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202305040551_jmm_2214008_sretest1002.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> active
- The sre.puppet.sync-netbox-hiera cookbook was run successfully
Change 916434 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Move duplicity check for apt keyrings to !defined
Change 916434 merged by Muehlenhoff:
[operations/puppet@production] Move duplicity check for apt keyrings to !defined
Mentioned in SAL (#wikimedia-operations) [2023-05-08T07:19:11Z] <moritzm> updated bookworm installer to RC2 T330495
Mentioned in SAL (#wikimedia-operations) [2023-05-09T13:27:06Z] <moritzm> updated bookworm d-i image to 2023-05-09 daily build T330495
The installer is working fine for baremetal and VM installations, but there will be a few more RC releases before the final release, so I'm keeping the task open for now.
One important note is that the memory requirements for the kernel have risen and installations with just 1G RAM will fail. I didn't track down the exact minimum, but 1.5G works perfectly fine.
Mentioned in SAL (#wikimedia-operations) [2023-05-16T11:00:03Z] <moritzm> updated bookworm image to RC3 T330495
Change 921174 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Drop Boost packages from legacy package removal list for Bookworm
Change 921174 merged by Muehlenhoff:
[operations/puppet@production] Drop Boost packages from legacy package removal list for Bookworm
Mentioned in SAL (#wikimedia-operations) [2023-05-30T07:16:33Z] <moritzm> update bookworm installer to rc4 T330495
Change 925862 had a related patch set uploaded (by JHathaway; author: JHathaway):
[operations/puppet@production] expand_path, regex_data: use yaml safe_load when available
Change 925878 had a related patch set uploaded (by JHathaway; author: JHathaway):
[operations/puppet@production] bookworm: Change to deb822 format for sources.list