
Update remaining Ganeti servers in codfw to Bookworm
Closed, Resolved, Public

Description

Drain, reimage and re-add to the cluster:

  • ganeti2019
  • ganeti2020
  • ganeti2021
  • ganeti2022
  • ganeti2023
  • ganeti2024
  • ganeti2025
  • ganeti2026
  • ganeti2027
  • ganeti2028
  • ganeti2029
  • ganeti2030
  • ganeti2031
  • ganeti2032
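
The hosts listed above form a contiguous range, so the fully qualified names used throughout the timeline can be generated rather than typed by hand. A minimal sketch, assuming the `codfw.wmnet` suffix seen in the cookbook messages; the function name `ganeti_fqdns` is hypothetical and not part of any Wikimedia tooling.

```python
def ganeti_fqdns(first=2019, last=2032, domain="codfw.wmnet"):
    """Return the FQDNs for ganeti<first>..ganeti<last>, inclusive.

    The defaults mirror the range in the task description; the domain
    suffix is taken from the hostnames in the timeline.
    """
    return [f"ganeti{n}.{domain}" for n in range(first, last + 1)]


hosts = ganeti_fqdns()
for host in hosts:
    print(host)
```

The list can then serve as a checklist to tick off as each host is drained, reimaged and re-added.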

Event Timeline

There are a very large number of changes, so older changes are hidden.

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2024.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2024.codfw.wmnet with OS bookworm completed:

  • ganeti2024 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501210937_jmm_1163840_ganeti2024.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Draining ganeti2019.codfw.wmnet of running VMs

VM aux-k8s-etcd2004.codfw.wmnet switching disk type to drbd

Draining ganeti2019.codfw.wmnet of running VMs

VM aux-k8s-etcd2004.codfw.wmnet switching disk type to plain

Draining ganeti2019.codfw.wmnet of running VMs

Icinga downtime and Alertmanager silence (ID=05c11855-71d5-489c-8ed8-13baa1a2b7b9) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage

ganeti2019.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2019.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2019.codfw.wmnet with OS bookworm completed:

  • ganeti2019 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501211623_jmm_1228926_ganeti2019.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Draining ganeti2021.codfw.wmnet of running VMs

Draining ganeti2021.codfw.wmnet of running VMs

Icinga downtime and Alertmanager silence (ID=00d48b0c-86e6-471d-a6ad-c116ef597e9d) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage

ganeti2021.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2021.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2021.codfw.wmnet with OS bookworm completed:

  • ganeti2021 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501221341_jmm_1436192_ganeti2021.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Draining ganeti2032.codfw.wmnet of running VMs

VM aux-k8s-etcd2004.codfw.wmnet switching disk type to drbd

Draining ganeti2032.codfw.wmnet of running VMs

VM aux-k8s-etcd2004.codfw.wmnet switching disk type to plain

Draining ganeti2032.codfw.wmnet of running VMs

Icinga downtime and Alertmanager silence (ID=93df70a9-c65f-4aaf-8a3d-5ab698636ed0) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage

ganeti2032.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2032.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2032.codfw.wmnet with OS bookworm completed:

  • ganeti2032 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501231036_jmm_1649213_ganeti2032.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Draining ganeti2022.codfw.wmnet of running VMs

Draining ganeti2022.codfw.wmnet of running VMs

Icinga downtime and Alertmanager silence (ID=46a6b03e-0964-494b-92f3-40af6ca3beb9) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage

ganeti2022.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2022.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2022.codfw.wmnet with OS bookworm completed:

  • ganeti2022 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501240846_jmm_2347890_ganeti2022.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Draining ganeti2020.codfw.wmnet of running VMs

VM ml-etcd2001.codfw.wmnet switching disk type to drbd

Draining ganeti2020.codfw.wmnet of running VMs

VM ml-etcd2001.codfw.wmnet switching disk type to plain

Draining ganeti2020.codfw.wmnet of running VMs

Draining ganeti2025.codfw.wmnet of running VMs

VM kubestagemaster2003.codfw.wmnet switching disk type to drbd

Draining ganeti2025.codfw.wmnet of running VMs

VM kubestagemaster2003.codfw.wmnet switching disk type to plain

Draining ganeti2025.codfw.wmnet of running VMs

Icinga downtime and Alertmanager silence (ID=4302b551-98b7-475e-9fb4-959f5c56a6cc) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage

ganeti2025.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2025.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2025.codfw.wmnet with OS bookworm completed:

  • ganeti2025 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501271410_jmm_3359971_ganeti2025.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Icinga downtime and Alertmanager silence (ID=e9f62dcb-2ecf-4d32-84ca-34c181e86093) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage

ganeti2020.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2020.codfw.wmnet with OS bookworm

Draining ganeti2026.codfw.wmnet of running VMs

Draining ganeti2026.codfw.wmnet of running VMs

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2020.codfw.wmnet with OS bookworm completed:

  • ganeti2020 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501280824_jmm_3541209_ganeti2020.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Icinga downtime and Alertmanager silence (ID=bc2c7bb0-3133-43fd-9040-c01d53f22d8f) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage

ganeti2026.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2026.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2026.codfw.wmnet with OS bookworm executed with errors:

  • ganeti2026 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console ganeti2026.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2026.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2026.codfw.wmnet with OS bookworm completed:

  • ganeti2026 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501281327_jmm_3595550_ganeti2026.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Draining ganeti2028.codfw.wmnet of running VMs

VM aux-k8s-etcd2003.codfw.wmnet switching disk type to drbd

Draining ganeti2028.codfw.wmnet of running VMs

VM aux-k8s-etcd2003.codfw.wmnet switching disk type to plain

Draining ganeti2028.codfw.wmnet of running VMs

Icinga downtime and Alertmanager silence (ID=160bb060-4ed1-4784-9312-c60a5421c725) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage

ganeti2028.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2028.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2028.codfw.wmnet with OS bookworm executed with errors:

  • ganeti2028 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console ganeti2028.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2028.codfw.wmnet with OS bookworm

Draining ganeti2031.codfw.wmnet of running VMs

Draining ganeti2031.codfw.wmnet of running VMs

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2028.codfw.wmnet with OS bookworm completed:

  • ganeti2028 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501290816_jmm_3780450_ganeti2028.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Icinga downtime and Alertmanager silence (ID=7af53928-134c-4589-9808-e36a2bde4422) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage

ganeti2031.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2031.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2031.codfw.wmnet with OS bookworm executed with errors:

  • ganeti2031 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console ganeti2031.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2031.codfw.wmnet with OS bookworm

Draining ganeti2029.codfw.wmnet of running VMs

Draining ganeti2029.codfw.wmnet of running VMs

Draining ganeti2030.codfw.wmnet of running VMs

Draining ganeti2030.codfw.wmnet of running VMs

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2031.codfw.wmnet with OS bookworm completed:

  • ganeti2031 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501291413_jmm_3846181_ganeti2031.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Icinga downtime and Alertmanager silence (ID=83262e5b-e9b2-4d97-bd96-7e9d851edd21) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage

ganeti2030.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2030.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2030.codfw.wmnet with OS bookworm executed with errors:

  • ganeti2030 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console ganeti2030.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2030.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2030.codfw.wmnet with OS bookworm completed:

  • ganeti2030 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501300921_jmm_4049035_ganeti2030.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Draining ganeti2029.codfw.wmnet of running VMs

Draining ganeti2029.codfw.wmnet of running VMs

Icinga downtime and Alertmanager silence (ID=c86b38d9-e3a1-4cba-abc9-083df51a2d3e) set by jmm@cumin2002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: remove from cluster for reimage

ganeti2029.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2029.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2029.codfw.wmnet with OS bookworm completed:

  • ganeti2029 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501310756_jmm_163303_ganeti2029.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
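
Each cookbook report above opens with a host line ending in PASS, WARN or FAIL, and some hosts appear more than once because a failed run was retried. A minimal sketch for extracting the last reported status per host from pasted timeline text; the regular expression and the function name `final_results` are assumptions based only on the line shape seen in this task, not on any documented cookbook output format.

```python
import re

# Matches summary lines such as "  • ganeti2026 (PASS)".
# Step lines ("    • Disabled Puppet") do not match because they
# lack the "<host> (<STATUS>)" shape.
RESULT_RE = re.compile(r"^\s*•\s*(ganeti\d+)\s+\((PASS|WARN|FAIL)\)\s*$")


def final_results(timeline_text):
    """Map each host to the status of its most recent report.

    Later reports overwrite earlier ones, so a FAIL followed by a
    retried PASS is recorded as PASS.
    """
    results = {}
    for line in timeline_text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            results[match.group(1)] = match.group(2)
    return results


sample = """\
  • ganeti2026 (FAIL)
    • Downtimed on Icinga/Alertmanager
  • ganeti2026 (PASS)
  • ganeti2020 (WARN)
"""
summary = final_results(sample)
print(summary)
```

Hosts left at WARN or FAIL after this pass are the ones that still need a manual check, such as the Icinga downtime that was not removed for ganeti2020.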
MoritzMuehlenhoff claimed this task.
MoritzMuehlenhoff updated the task description.

All Ganeti nodes in codfw have been upgraded to Bookworm and, as part of the same work, migrated to nftables.

Mentioned in SAL (#wikimedia-operations) [2025-01-31T08:52:48Z] <moritzm> rebalance codfw/A following OS updates T382508

Mentioned in SAL (#wikimedia-operations) [2025-01-31T12:16:01Z] <moritzm> rebalance codfw/D following OS updates T382508

Mentioned in SAL (#wikimedia-operations) [2025-02-06T08:24:39Z] <moritzm> rebalance codfw/B following OS updates T382508