
Migrate wikikube-codfw to containerd
Closed, Resolved · Public

Description

This is to track the migration of all control planes and nodes of the wikikube-codfw cluster to containerd:

https://wikitech.wikimedia.org/wiki/Kubernetes/Administration/containerd_migration

Please take the chance to renumber (--move-vlan) where possible/required:
https://wikitech.wikimedia.org/wiki/Vlan_migration
To query nodes that need --move-vlan:

sudo cumin 'A:codfw and (A:wikikube-master or A:wikikube-worker) and P{F:fqdn ~ ".wmnet$"} and not A:vms and not P{F:netmask = "255.255.255.0"}'
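
As a reading aid, the query above can be rebuilt clause by clause. This is only a sketch: the function name build_move_vlan_query is hypothetical, and the clause descriptions assume standard Cumin grammar (A: for aliases, P{...} for the PuppetDB backend, F: for a fact filter). It only prints the query string and never contacts any host.

```shell
# Clauses, in order:
#   A:codfw                                   hosts in the codfw site alias
#   (A:wikikube-master or A:wikikube-worker)  control planes and workers
#   P{F:fqdn ~ ".wmnet$"}                     fqdn fact ends in .wmnet
#   not A:vms                                 exclude virtual machines
#   not P{F:netmask = "255.255.255.0"}        netmask fact is not 255.255.255.0
build_move_vlan_query() {
  query=''
  for clause in \
    'A:codfw' \
    '(A:wikikube-master or A:wikikube-worker)' \
    'P{F:fqdn ~ ".wmnet$"}' \
    'not A:vms' \
    'not P{F:netmask = "255.255.255.0"}'
  do
    # Join clauses with " and ", skipping the separator before the first one.
    query="${query:+$query and }$clause"
  done
  printf '%s\n' "$query"
}

build_move_vlan_query
```

The printed string is identical to the query given above, so it can be passed to cumin unchanged.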

Please take the chance to rename where possible/required:
T365571: Rename wikikube worker nodes during OS reimage

Please take the chance to deactivate hardware RAID controller configs:
T358489: mw2420-mw2451 do have unnecessary raid controllers (configured)

Upcoming refreshes and expansions that will/should use bookworm and containerd right away:
T376966: wikikube-worker21[56-70] implementation tracking
T377008: wikikube-worker21[28-35] implementation tracking

Cumin query for nodes that still need reimage:
sudo cumin 'A:wikikube-worker-codfw and not P{F:lsbdistcodename = bookworm}'

Details

Related Changes in Gerrit:
Repo               Branch      Lines +/-
operations/puppet  production  +1 -3
operations/puppet  production  +7 -17
operations/puppet  production  +10 -13
operations/puppet  production  +10 -13
operations/puppet  production  +11 -11
operations/puppet  production  +11 -14
operations/puppet  production  +11 -18
operations/puppet  production  +11 -11
operations/puppet  production  +10 -13
operations/puppet  production  +11 -11
operations/puppet  production  +5 -13
operations/puppet  production  +8 -8
operations/puppet  production  +14 -10
operations/puppet  production  +10 -10
operations/puppet  production  +8 -8
operations/puppet  production  +9 -9
operations/puppet  production  +9 -9
operations/puppet  production  +10 -10
operations/puppet  production  +10 -10
operations/puppet  production  +10 -16
operations/puppet  production  +9 -9
operations/puppet  production  +11 -11
operations/puppet  production  +6 -9
operations/puppet  production  +2 -2


Event Timeline


Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2228.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2228 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501160845_jelto_3450002_wikikube-worker2228.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
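
When cross-referencing reimage runs, the log file name in the "First Puppet run" step can be split into its parts. The sketch below assumes the name follows the pattern timestamp_user_id_host.out, as the entries in this timeline suggest; the meaning of the numeric third field is not stated here, so it is labeled run_id as a placeholder. The function name parse_reimage_log is hypothetical.

```shell
# Split a reimage log path into its underscore-separated fields.
# Assumption: <YYYYMMDDHHMM>_<user>_<numeric id>_<short hostname>.out
parse_reimage_log() {
  base=${1##*/}        # drop the directory part
  base=${base%.out}    # drop the .out extension
  stamp=${base%%_*}    # first field: timestamp
  rest=${base#*_}
  user=${rest%%_*}     # second field: user who started the cookbook
  rest=${rest#*_}
  run_id=${rest%%_*}   # third field: numeric id (meaning assumed, not confirmed)
  host=${rest#*_}      # remainder: short hostname
  printf '%s %s %s %s\n' "$stamp" "$user" "$run_id" "$host"
}

parse_reimage_log /var/log/spicerack/sre/hosts/reimage/202501160845_jelto_3450002_wikikube-worker2228.out
```

This relies on host names containing hyphens but no underscores, which holds for the hosts in this task.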

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2229.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2229 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501160853_jelto_3451218_wikikube-worker2229.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2230.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2230 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501160903_jelto_3453572_wikikube-worker2230.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2231.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2231 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501160906_jelto_3454190_wikikube-worker2231.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2025-01-16T09:26:49Z] <jelto> homer 'lsw1-c6-codfw*' commit 'T377877'

pool host wikikube-worker[2228-2231].codfw.wmnet by jelto@cumin1002 with reason: None

Cookbook cookbooks.sre.k8s.pool-depool-node started by jelto@cumin1002 pool for host wikikube-worker[2228-2231].codfw.wmnet completed:

  • wikikube-worker[2228-2231].codfw.wmnet (PASS)
    • Host wikikube-worker[2228-2231].codfw.wmnet pooled in wikikube-codfw
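
The bracketed host range used in the pool and depool entries stands for several hosts. A minimal sketch of expanding one such range into individual FQDNs follows; the function name expand_range is hypothetical, and it handles only a single prefix[first-last]suffix range without zero padding, not the full range syntax the tooling accepts.

```shell
# Expand prefix[first-last]suffix into one host name per line.
expand_range() {
  spec=$1
  prefix=${spec%%\[*}     # text before the opening bracket
  inner=${spec#*\[}       # text after the opening bracket
  suffix=${inner#*\]}     # text after the closing bracket
  inner=${inner%%\]*}     # the "first-last" part
  first=${inner%-*}
  last=${inner#*-}
  n=$first
  while [ "$n" -le "$last" ]; do
    printf '%s%s%s\n' "$prefix" "$n" "$suffix"
    n=$((n + 1))
  done
}

expand_range 'wikikube-worker[2228-2231].codfw.wmnet'
```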

Change #1111926 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] Rename mw235[0-3] to wikikube-worker223[2-5]

https://gerrit.wikimedia.org/r/1111926

depool host mw[2350-2353].codfw.wmnet by jelto@cumin1002 with reason: Renaming nodes

Cookbook cookbooks.sre.k8s.pool-depool-node started by jelto@cumin1002 depool for host mw[2350-2353].codfw.wmnet completed:

  • mw[2350-2353].codfw.wmnet (PASS)
    • Host mw[2350-2353].codfw.wmnet depooled from wikikube-codfw

Change #1111926 merged by Jelto:

[operations/puppet@production] Rename mw235[0-3] to wikikube-worker223[2-5]

https://gerrit.wikimedia.org/r/1111926
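
The rename in this change maps a contiguous block of old names onto a contiguous block of new ones. The sketch below lists the old/new pairs for such a batch so they can be checked before running the rename cookbook once per host. The function name rename_pairs and its arguments are hypothetical; it only prints names and changes nothing.

```shell
# Print "old new" pairs for a contiguous rename batch.
# Arguments: first old number, first new number, number of hosts.
rename_pairs() {
  old=$1
  new=$2
  count=$3
  i=0
  while [ "$i" -lt "$count" ]; do
    printf 'mw%d wikikube-worker%d\n' "$((old + i))" "$((new + i))"
    i=$((i + 1))
  done
}

# mw235[0-3] -> wikikube-worker223[2-5]
rename_pairs 2350 2232 4
```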

Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from mw2350 to wikikube-worker2232 completed:

  • mw2350 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from mw2351 to wikikube-worker2233 completed:

  • mw2351 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from mw2352 to wikikube-worker2234 completed:

  • mw2352 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from mw2353 to wikikube-worker2235 completed:

  • mw2353 (WARN)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ⚠️Rollback initiated but nothing to rollback (too soon or too late).⚠️

Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from mw2353 to wikikube-worker2235 completed:

  • mw2353 (WARN)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ⚠️Rollback initiated but nothing to rollback (too soon or too late).⚠️
    • ✔️ Netbox rolled back
    • ⚠️Renaming failed but rollback succeeded⚠️ Please check the logs for the reason and follow up with I/F if needed. Neither puppet nor alerting were re-enabled.

Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from mw2353 to wikikube-worker2235 completed:

  • mw2353 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.k8s.renumber-node was started by jelto@cumin1002 Renumbering for host wikikube-worker2232.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2232.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.k8s.renumber-node was started by jelto@cumin1002 Renumbering for host wikikube-worker2233.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2233.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.k8s.renumber-node was started by jelto@cumin1002 Renumbering for host wikikube-worker2234.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2234.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.k8s.renumber-node was started by jelto@cumin1002 Renumbering for host wikikube-worker2235.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2235.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2232.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2232.codfw.wmnet (PASS)
  • wikikube-worker2232 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161054_jelto_3477300_wikikube-worker2232.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.k8s.renumber-node started by jelto@cumin1002 Renumbering for host wikikube-worker2232.codfw.wmnet completed:

  • wikikube-worker2232.codfw.wmnet (FAIL)
    • Successfully reimaged node wikikube-worker2232.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Failed to confirm homer commands
  • wikikube-worker2232 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161054_jelto_3477300_wikikube-worker2232.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2232.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2233.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2233.codfw.wmnet (PASS)
  • wikikube-worker2233 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161101_jelto_3477906_wikikube-worker2233.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.k8s.renumber-node started by jelto@cumin1002 Renumbering for host wikikube-worker2233.codfw.wmnet completed:

  • wikikube-worker2233.codfw.wmnet (FAIL)
    • Successfully reimaged node wikikube-worker2233.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Failed to confirm homer commands
  • wikikube-worker2233 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161101_jelto_3477906_wikikube-worker2233.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2233.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2234.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2234.codfw.wmnet (PASS)
  • wikikube-worker2234 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161110_jelto_3478540_wikikube-worker2234.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.k8s.renumber-node started by jelto@cumin1002 Renumbering for host wikikube-worker2234.codfw.wmnet completed:

  • wikikube-worker2234.codfw.wmnet (FAIL)
    • Successfully reimaged node wikikube-worker2234.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Failed to confirm homer commands
  • wikikube-worker2234 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161110_jelto_3478540_wikikube-worker2234.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2234.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2235.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2235.codfw.wmnet (PASS)
  • wikikube-worker2235 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161117_jelto_3479188_wikikube-worker2235.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.k8s.renumber-node started by jelto@cumin1002 Renumbering for host wikikube-worker2235.codfw.wmnet completed:

  • wikikube-worker2235.codfw.wmnet (FAIL)
    • Successfully reimaged node wikikube-worker2235.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Failed to confirm homer commands
  • wikikube-worker2235 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161117_jelto_3479188_wikikube-worker2235.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2235.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2232.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2232 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161142_jelto_3487229_wikikube-worker2232.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2233.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2233 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161145_jelto_3488906_wikikube-worker2233.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2234.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2234 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161153_jelto_3491381_wikikube-worker2234.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2235.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2235 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161202_jelto_3491993_wikikube-worker2235.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2025-01-16T12:23:38Z] <jelto> homer 'lsw1-c6-codfw*' commit 'T377877'

pool host wikikube-worker[2232-2235].codfw.wmnet by jelto@cumin1002 with reason: None

Cookbook cookbooks.sre.k8s.pool-depool-node started by jelto@cumin1002 pool for host wikikube-worker[2232-2235].codfw.wmnet completed:

  • wikikube-worker[2232-2235].codfw.wmnet (PASS)
    • Host wikikube-worker[2232-2235].codfw.wmnet pooled in wikikube-codfw

Change #1111965 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] Rename mw233[5-8] to wikikube-worker223[6-9]

https://gerrit.wikimedia.org/r/1111965

depool host mw[2335-2338].codfw.wmnet by jelto@cumin1002 with reason: Renaming nodes

Cookbook cookbooks.sre.k8s.pool-depool-node started by jelto@cumin1002 depool for host mw[2335-2338].codfw.wmnet completed:

  • mw[2335-2338].codfw.wmnet (PASS)
    • Host mw[2335-2338].codfw.wmnet depooled from wikikube-codfw

Change #1111965 merged by Jelto:

[operations/puppet@production] Rename mw233[5-8] to wikikube-worker223[6-9]

https://gerrit.wikimedia.org/r/1111965

Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from mw2335 to wikikube-worker2236 completed:

  • mw2335 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from mw2336 to wikikube-worker2237 completed:

  • mw2336 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from mw2337 to wikikube-worker2238 completed:

  • mw2337 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from mw2338 to wikikube-worker2239 completed:

  • mw2338 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2236.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2237.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2238.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2239.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2236.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2236 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161406_jelto_3516356_wikikube-worker2236.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2237.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2237 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161414_jelto_3517131_wikikube-worker2237.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2238.codfw.wmnet with OS bookworm executed with errors:

  • wikikube-worker2238 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2238.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2238.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2239.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2239 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161439_jelto_3523349_wikikube-worker2239.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2238.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2238 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501161509_jelto_3535576_wikikube-worker2238.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2025-01-16T15:31:13Z] <jelto> homer 'lsw1-c3-codfw*' commit 'T377877'

pool host wikikube-worker[2236-2239].codfw.wmnet by jelto@cumin1002 with reason: None

Cookbook cookbooks.sre.k8s.pool-depool-node started by jelto@cumin1002 pool for host wikikube-worker[2236-2239].codfw.wmnet completed:

  • wikikube-worker[2236-2239].codfw.wmnet (PASS)
    • Host wikikube-worker[2236-2239].codfw.wmnet pooled in wikikube-codfw

Change #1112055 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] Rename the remaining mw nodes to wikikube-worker224[0-2] 🥳

https://gerrit.wikimedia.org/r/1112055

depool host mw[2282,2310-2311].codfw.wmnet by jelto@cumin1002 with reason: Renaming nodes

Cookbook cookbooks.sre.k8s.pool-depool-node started by jelto@cumin1002 depool for host mw[2282,2310-2311].codfw.wmnet completed:

  • mw[2282,2310-2311].codfw.wmnet (PASS)
    • Host mw[2282,2310-2311].codfw.wmnet depooled from wikikube-codfw

Change #1112055 merged by Jelto:

[operations/puppet@production] Rename the remaining mw nodes to wikikube-worker224[0-2] 🥳

https://gerrit.wikimedia.org/r/1112055

Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from mw2310 to wikikube-worker2240 completed:

  • mw2310 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from mw2311 to wikikube-worker2241 completed:

  • mw2311 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Change #1112173 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] remove reserved name wikikube-worker2242 because of mw2282 decom

https://gerrit.wikimedia.org/r/1112173

Change #1112173 merged by Jelto:

[operations/puppet@production] remove reserved name wikikube-worker2242 because of mw2282 decom

https://gerrit.wikimedia.org/r/1112173

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2240.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2241.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2241.codfw.wmnet with OS bookworm executed with errors:

  • wikikube-worker2241 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2241.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker2241.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2240.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2240 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501170959_jelto_3701258_wikikube-worker2240.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker2241.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2241 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202501171031_jelto_3705790_wikikube-worker2241.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
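
The result lines in the cookbook messages above share a regular shape (`• <host> (PASS)` or `• <host> (FAIL)`), so retries such as the one for wikikube-worker2241 can be tallied from pasted log text. The following is a minimal sketch under that assumption only: `RESULT_RE`, `tally_results` and the sample text are hypothetical names made up for illustration, not part of Spicerack or any cookbook, and the sketch counts result lines from every cookbook (rename, pool/depool, reimage) without telling them apart.

```python
import re
from collections import Counter

# Assumed shape of a result line, e.g. "  • wikikube-worker2238 (FAIL)".
# Step lines such as "    • Host rebooted via IPMI" do not match.
RESULT_RE = re.compile(r"^\s*•\s+(\S+)\s+\((PASS|FAIL)\)\s*$")


def tally_results(log_text):
    """Count PASS/FAIL result lines per host in pasted task-log text."""
    counts = {}
    for line in log_text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            host, status = match.groups()
            counts.setdefault(host, Counter())[status] += 1
    return counts


# Hypothetical sample, trimmed from the kind of entries seen in this task.
sample = """\
  • wikikube-worker2238 (FAIL)
    • Host rebooted via IPMI
  • wikikube-worker2239 (PASS)
  • wikikube-worker2238 (PASS)
"""

results = tally_results(sample)
for host, statuses in sorted(results.items()):
    print(host, dict(statuses))
```

For the sample, wikikube-worker2238 ends up with one FAIL and one PASS, which matches the failed first attempt and the successful retry recorded in the log.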

Mentioned in SAL (#wikimedia-operations) [2025-01-17T10:54:24Z] <jelto> homer 'lsw1-b3-codfw*' commit 'T377877'

pool host wikikube-worker[2240-2241].codfw.wmnet by jelto@cumin1002 with reason: None

Cookbook cookbooks.sre.k8s.pool-depool-node started by jelto@cumin1002 pool for host wikikube-worker[2240-2241].codfw.wmnet completed:

  • wikikube-worker[2240-2241].codfw.wmnet (PASS)
    • Host wikikube-worker[2240-2241].codfw.wmnet pooled in wikikube-codfw

Jelto claimed this task.

Jelto added subscribers: Raine, Jelto.

All wikikube-worker nodes in codfw are now on bookworm, running containerd, and in the new VLANs! Thanks to @JMeybohm, @kamila and @Clement_Goubert for the support!

A short summary (just from the SAL/Phab logs):

  • ~236 nodes have been reimaged (at least as tracked in this task)
  • 37 reimages failed; 11 other FAILs were due to incorrect cookbook usage or Netbox timeouts
  • 2 hosts had hardware issues during the reimage; 1 host was too old (T383965)