
Upgrade mwlog hosts to Bullseye
Closed, Resolved (Public)

Event Timeline

Change 920698 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] mwlog: keep/reuse /srv filesystem across reimages

https://gerrit.wikimedia.org/r/920698

Change 920698 merged by Herron:

[operations/puppet@production] mwlog: keep/reuse /srv filesystem across reimages

https://gerrit.wikimedia.org/r/920698

Cookbook cookbooks.sre.hosts.reimage was started by herron@cumin1001 for host mwlog2002.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by herron@cumin1001 for host mwlog2002.codfw.wmnet with OS bullseye executed with errors:

  • mwlog2002 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

mwlog2002 is throwing an error and dropping into grub rescue after being reimaged with the reuse-partitions recipe. I'm going to try to troubleshoot the recipe.

error: symbol `grub_file_filters' not found.

I live-edited reuse-lvm-root-4dev.cfg and added the following line to the bottom of the file. After another reimage, the host boots into the OS and is accessible from install_console.

d-i     grub-installer/bootdev  string  /dev/sda /dev/sdb

Will follow up with a patch to persist this
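Preseed directives use the debconf format `<owner> <question> <type> <value>`, where the value is the rest of the line, so the added line sets the `grub-installer/bootdev` question to the two disks `/dev/sda /dev/sdb`. A grub rescue "symbol ... not found" error commonly means the GRUB core image on the disk the BIOS boots from does not match the modules under /boot; listing both disks presumably installs a matching GRUB on each one. The Python sketch below parses such a line and extracts the disk list. The helper names are hypothetical and are not part of operations/puppet or the Debian installer.

```python
from typing import Iterable, List, NamedTuple, Optional


class PreseedEntry(NamedTuple):
    owner: str      # e.g. "d-i" for debian-installer questions
    question: str   # e.g. "grub-installer/bootdev"
    qtype: str      # debconf type: string, boolean, select, ...
    value: str      # rest of the line; may contain spaces or be empty


def parse_preseed_line(line: str) -> Optional[PreseedEntry]:
    """Parse one preseed directive; return None for blank and comment lines.

    Simplification: backslash line continuations are not handled.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # Split on runs of whitespace at most 3 times so the value keeps its
    # internal spaces (e.g. "/dev/sda /dev/sdb").
    parts = stripped.split(None, 3)
    if len(parts) < 3:
        raise ValueError(f"malformed preseed line: {line!r}")
    value = parts[3] if len(parts) == 4 else ""
    return PreseedEntry(parts[0], parts[1], parts[2], value)


def grub_boot_devices(lines: Iterable[str]) -> List[str]:
    """Return the disks listed in grub-installer/bootdev, or [] if unset."""
    devices: List[str] = []
    for line in lines:
        entry = parse_preseed_line(line)
        if entry is not None and entry.question == "grub-installer/bootdev":
            # Assumption: a later line overrides an earlier one.
            devices = entry.value.split()
    return devices
```

For example, `grub_boot_devices(["d-i     grub-installer/bootdev  string  /dev/sda /dev/sdb"])` returns `['/dev/sda', '/dev/sdb']`.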

Change 924535 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] reuse-lvm-root-4dev: add grub-installer/bootdev config

https://gerrit.wikimedia.org/r/924535

Change 924535 merged by Herron:

[operations/puppet@production] reuse-lvm-root-4dev: add grub-installer/bootdev config

https://gerrit.wikimedia.org/r/924535

END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog2002.codfw.wmnet with OS bullseye

Reimage completed:

  • mwlog2002 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202305301449_herron_885577_mwlog2002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 924555 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] mwlog: add remove_python2_on_bullseye exemption

https://gerrit.wikimedia.org/r/924555

Change 924555 merged by Herron:

[operations/puppet@production] mwlog: add remove_python2_on_bullseye exemption

https://gerrit.wikimedia.org/r/924555

Change 924557 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] udp2log: dont use python symlink

https://gerrit.wikimedia.org/r/924557

Change 924557 merged by Herron:

[operations/puppet@production] udp2log: dont use python symlink

https://gerrit.wikimedia.org/r/924557

mwlog2002 is now up and running on bullseye. I made a cursory attempt to use python3, but even after fixing the errors it threw and getting the daemons running under python3, it still wasn't writing logs to the filesystem.

I opted to go with python2 to get the host back online sooner rather than later. We'll need to revisit porting this to python3, or transition to something else.
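If the python3 port is revisited, two general Python 3 behaviours are worth ruling out first in any script that relays UDP log lines to disk: `socket.recvfrom()` returns `bytes` rather than `str`, and buffered writes may not appear on disk until the file is flushed. The sketch below is hypothetical: it is not udp2log's code, the helper names and the newline-delimited framing are assumptions, and it is not a diagnosis of the failure above.

```python
import socket
from typing import BinaryIO


def append_datagram(out: BinaryIO, data: bytes) -> None:
    """Append one datagram (assumed newline-delimited log lines) to `out`."""
    # recvfrom() yields bytes in Python 3; writing bytes to a text-mode file
    # raises TypeError, so `out` must be opened in binary mode ("ab").
    if not data.endswith(b"\n"):
        data += b"\n"
    out.write(data)
    # Flush so lines reach the file promptly instead of sitting in the
    # userspace buffer.
    out.flush()


def serve(host: str, port: int, path: str) -> None:
    """Receive UDP datagrams forever and append each one to `path`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        with open(path, "ab") as out:
            while True:
                data, _addr = sock.recvfrom(65535)
                append_datagram(out, data)
```

Flushing after every datagram trades some throughput for visibility; a busier relay might flush on a timer instead.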

Change 925813 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] mwlog1002: add python exemption

https://gerrit.wikimedia.org/r/925813

Change 925813 merged by Herron:

[operations/puppet@production] mwlog1002: add python exemption

https://gerrit.wikimedia.org/r/925813

herron claimed this task.

mwlog1002 has been upgraded to bullseye as well, resolving!

Is there a task about the udp2log porting work to Python 3, or will that be unnecessary due to T205856?


Yes, I think that's the best way to track it. I'm hesitant to create a task specifically about updating udp2log, since ideally we'll be transitioning away from it.