Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | andrea.denisse | T324725 Observability Bookworm/Bullseye upgrades
Resolved | | herron | T333614 Upgrade mwlog hosts to Bullseye
Event Timeline
Change 920698 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] mwlog: keep/reuse /srv filesystem across reimages
Change 920698 merged by Herron:
[operations/puppet@production] mwlog: keep/reuse /srv filesystem across reimages
Cookbook cookbooks.sre.hosts.reimage was started by herron@cumin1001 for host mwlog2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by herron@cumin1001 for host mwlog2002.codfw.wmnet with OS bullseye executed with errors:
- mwlog2002 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details
mwlog2002 is throwing an error and dropping into the GRUB rescue prompt after a reimage with the reuse-partitions recipe. Going to try to troubleshoot the recipe:
error: symbol `grub_file_filters' not found.
I live-edited reuse-lvm-root-4dev.cfg, adding the line below to the bottom of the file. After another reimage, the host boots into the OS and is accessible from install_console.
d-i grub-installer/bootdev string /dev/sda /dev/sdb
Will follow up with a patch to persist this
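The fix is a single debconf preseed answer in the `<owner> <question> <type> <value>` layout: owner `d-i`, question `grub-installer/bootdev`, type `string`, and a value listing both `/dev/sda` and `/dev/sdb`. Our understanding is that grub-installer treats a space-separated value as a list and writes GRUB to each listed disk instead of only one. A plausible but unconfirmed reading of the `grub_file_filters` error is that the firmware was booting from a disk whose GRUB boot code no longer matched the newly installed modules. As an illustration only, here is a hypothetical Python helper (not part of operations/puppet) that parses such lines and extracts the boot devices. It is a simplified sketch that does not handle backslash line continuations.

```python
# Hypothetical helper, not part of operations/puppet: parse debconf preseed
# lines of the form "<owner> <question> <type> <value>".
from typing import List, NamedTuple, Optional


class PreseedEntry(NamedTuple):
    owner: str      # e.g. "d-i"
    question: str   # e.g. "grub-installer/bootdev"
    qtype: str      # e.g. "string" or "boolean"
    value: str      # rest of the line after the type


def parse_preseed_line(line: str) -> Optional[PreseedEntry]:
    """Return the parsed entry, or None for blank and comment lines.

    Simplified: backslash line continuations are not handled.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    parts = stripped.split(None, 3)
    if len(parts) < 3:
        raise ValueError(f"malformed preseed line: {line!r}")
    value = parts[3] if len(parts) == 4 else ""
    return PreseedEntry(parts[0], parts[1], parts[2], value)


def grub_boot_devices(preseed_text: str) -> List[str]:
    """Return the whitespace-separated devices given for grub-installer/bootdev."""
    for line in preseed_text.splitlines():
        entry = parse_preseed_line(line)
        if entry is not None and entry.question == "grub-installer/bootdev":
            return entry.value.split()
    return []
```

For example, `grub_boot_devices("d-i grub-installer/bootdev string /dev/sda /dev/sdb")` returns `['/dev/sda', '/dev/sdb']`.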
Change 924535 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] reuse-lvm-root-4dev: add grub-installer/bootdev config
Change 924535 merged by Herron:
[operations/puppet@production] reuse-lvm-root-4dev: add grub-installer/bootdev config
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog2002.codfw.wmnet with OS bullseye
Reimage completed:
- mwlog2002 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202305301449_herron_885577_mwlog2002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Change 924555 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] mwlog: add remove_python2_on_bullseye exemption
Change 924555 merged by Herron:
[operations/puppet@production] mwlog: add remove_python2_on_bullseye exemption
Change 924557 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] udp2log: dont use python symlink
Change 924557 merged by Herron:
[operations/puppet@production] udp2log: dont use python symlink
mwlog2002 is now up and running on Bullseye. I made a cursory attempt to use Python 3, but even after fixing the errors it threw and getting the daemons running under Python 3, it still wasn't writing logs to the filesystem.
I opted to stay on Python 2 to get the host back online sooner rather than later. We'll need to revisit porting this to Python 3, or transition to something else.
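The task does not record why logs stopped being written under Python 3. Two common Python 2 to 3 pitfalls for a log receiver of this kind are that socket reads return `bytes` instead of `str`, and that buffered file writes may not appear in the file until the buffer fills or the file is closed. The hypothetical sketch below is not udp2log's implementation, and its port and path are placeholders. It handles both pitfalls explicitly and is meant as a reference point for an eventual port, not as a diagnosis of what went wrong here.

```python
# Hypothetical sketch, not udp2log's actual code: receive log lines over UDP
# and append them to a file under Python 3. LISTEN_PORT and LOG_PATH are
# placeholders, not values taken from this task.
import socket
from typing import TextIO

LISTEN_PORT = 9999
LOG_PATH = "/tmp/udp-log-sketch.log"


def format_record(data: bytes) -> str:
    """Turn one datagram into a newline-terminated text record.

    recvfrom() returns bytes in Python 3 (it returned str in Python 2), so the
    payload is decoded explicitly; invalid UTF-8 becomes U+FFFD instead of
    raising and killing the receive loop.
    """
    text = data.decode("utf-8", errors="replace")
    return text if text.endswith("\n") else text + "\n"


def write_datagram(data: bytes, out: TextIO) -> None:
    out.write(format_record(data))
    # Flush Python's userspace buffer so each record reaches the OS promptly
    # instead of sitting in memory until the buffer fills or the file closes.
    out.flush()


def serve(port: int = LISTEN_PORT, path: str = LOG_PATH) -> None:
    """Receive datagrams forever, appending each one to `path`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(path, "a", encoding="utf-8") as out:
        sock.bind(("0.0.0.0", port))
        while True:
            data, _addr = sock.recvfrom(65535)
            write_datagram(data, out)
```

`format_record(b"hello")` returns `"hello\n"`, and calling `serve()` runs the receive loop until the process is interrupted.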
Change 925813 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] mwlog1002: add python exemption
Change 925813 merged by Herron:
[operations/puppet@production] mwlog1002: add python exemption
Is there a task about the udp2log porting work to Python 3, or will that be unnecessary due to T205856?
Yes, I think that's the best way to track it. I'm hesitant to create a task specifically for updating udp2log, since ideally we'll be transitioning away from it.