
Complete upgrading WMCS bare metal hosts to Trixie
Closed, Resolved, Public

Description

We still have many WMCS hosts running Debian Bullseye or Bookworm. We should gradually upgrade all of them to Trixie. Bullseye reaches end of life on 31 August 2026.

Please create sub-tasks for the various host groups.

root@cumin1002:~# cumin cloud* 'grep -i version= /etc/os-release'
151 hosts will be targeted:
cloudbackup[1001-1002]-dev.eqiad.wmnet,cloudbackup[2003-2004].codfw.wmnet,cloudbackup[1003-1004].eqiad.wmnet,cloudcephmon[2004-2006]-dev.codfw.wmnet,cloudcephmon[1004-1006].eqiad.wmnet,cloudcephosd[2004-2007]-dev.codfw.wmnet,cloudcephosd[1016-1024,1026-1052].eqiad.wmnet,cloudcontrol[2005-2006,2010]-dev.codfw.wmnet,cloudcontrol[1006-1007,1011].eqiad.wmnet,cloudcumin2001.codfw.wmnet,cloudcumin1001.eqiad.wmnet,clouddb[1013-1020,1022-1025].eqiad.wmnet,clouddumps[1001-1002].wikimedia.org,cloudelastic[1007-1012].eqiad.wmnet,cloudgw[2002-2003]-dev.codfw.wmnet,cloudgw[1003-1004].eqiad.wmnet,cloudlb[2002-2004]-dev.codfw.wmnet,cloudlb[1001-1002].eqiad.wmnet,cloudnet[2005-2008]-dev.codfw.wmnet,cloudnet[1005-1006].eqiad.wmnet,cloudrabbit[2001-2003]-dev.codfw.wmnet,cloudrabbit[1001-1003].eqiad.wmnet,cloudservices[2004-2005]-dev.codfw.wmnet,cloudservices[1005-1006].eqiad.wmnet,cloudvirt[2004-2006]-dev.codfw.wmnet,cloudvirt[1040-1076].eqiad.wmnet,cloudvirtlocal[1001-1003].eqiad.wmnet,cloudweb2002-dev.wikimedia.org,cloudweb[1003-1004].wikimedia.org
OK to proceed on 151 hosts? Enter the number of affected hosts to confirm or "q" to quit: 151
===== NODE GROUP =====
(61) cloudcephmon[2004-2006]-dev.codfw.wmnet,cloudcephmon[1005-1006].eqiad.wmnet,cloudcephosd[2005-2007]-dev.codfw.wmnet,cloudcephosd[1016-1024,1026-1052].eqiad.wmnet,cloudcumin2001.codfw.wmnet,cloudcumin1001.eqiad.wmnet,clouddumps[1001-1002].wikimedia.org,cloudelastic[1007-1012].eqiad.wmnet,cloudgw[2002-2003]-dev.codfw.wmnet,cloudgw[1003-1004].eqiad.wmnet,cloudweb2002-dev.wikimedia.org,cloudweb[1003-1004].wikimedia.org
----- OUTPUT of 'grep -i version= /etc/os-release' -----
VERSION="11 (bullseye)"
===== NODE GROUP =====
(90) cloudbackup[1001-1002]-dev.eqiad.wmnet,cloudbackup[2003-2004].codfw.wmnet,cloudbackup[1003-1004].eqiad.wmnet,cloudcephmon1004.eqiad.wmnet,cloudcephosd2004-dev.codfw.wmnet,cloudcontrol[2005-2006,2010]-dev.codfw.wmnet,cloudcontrol[1006-1007,1011].eqiad.wmnet,clouddb[1013-1020,1022-1025].eqiad.wmnet,cloudlb[2002-2004]-dev.codfw.wmnet,cloudlb[1001-1002].eqiad.wmnet,cloudnet[2005-2008]-dev.codfw.wmnet,cloudnet[1005-1006].eqiad.wmnet,cloudrabbit[2001-2003]-dev.codfw.wmnet,cloudrabbit[1001-1003].eqiad.wmnet,cloudservices[2004-2005]-dev.codfw.wmnet,cloudservices[1005-1006].eqiad.wmnet,cloudvirt[2004-2006]-dev.codfw.wmnet,cloudvirt[1040-1076].eqiad.wmnet,cloudvirtlocal[1001-1003].eqiad.wmnet
----- OUTPUT of 'grep -i version= /etc/os-release' -----
VERSION="12 (bookworm)"
================
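
To cross-check the per-release host counts in the output above, the bracketed host expressions can be expanded with a small script. This is only a minimal sketch: the helper names (`split_top_level`, `expand`, `count_hosts`) are hypothetical and not part of Cumin, and it assumes each pattern contains at most one bracket group, as in the listing above.

```python
import re


def split_top_level(expr: str) -> list[str]:
    """Split a host expression on commas that are not inside [...]."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand(pattern: str) -> list[str]:
    """Expand one pattern such as 'cloudlb[1001-1002].eqiad.wmnet'."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        return [pattern]  # plain hostname, nothing to expand
    prefix, ranges, suffix = match.groups()
    hosts = []
    for item in ranges.split(","):
        start, _, end = item.partition("-")
        end = end or start  # a single number such as '2010'
        width = len(start)  # preserve any zero padding
        for n in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


def count_hosts(expr: str) -> int:
    """Count the hosts described by a comma-separated host expression."""
    return sum(len(expand(p)) for p in split_top_level(expr))


# The Bullseye node group copied from the output above.
bullseye = (
    "cloudcephmon[2004-2006]-dev.codfw.wmnet,cloudcephmon[1005-1006].eqiad.wmnet,"
    "cloudcephosd[2005-2007]-dev.codfw.wmnet,"
    "cloudcephosd[1016-1024,1026-1052].eqiad.wmnet,"
    "cloudcumin2001.codfw.wmnet,cloudcumin1001.eqiad.wmnet,"
    "clouddumps[1001-1002].wikimedia.org,cloudelastic[1007-1012].eqiad.wmnet,"
    "cloudgw[2002-2003]-dev.codfw.wmnet,cloudgw[1003-1004].eqiad.wmnet,"
    "cloudweb2002-dev.wikimedia.org,cloudweb[1003-1004].wikimedia.org"
)
print(count_hosts(bullseye))  # 61, matching the "(61)" group header
```

The same function applied to the Bookworm group should give the "(90)" shown in its header, which is a quick way to confirm that nothing was lost when copying the list into a sub-task.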

Related Objects

Status      Subtype      Assigned            Task
Resolved                 Andrew
Resolved                 taavi
Resolved                 Andrew
Resolved                 jijiki
Resolved                 jijiki
Resolved                 jijiki
Resolved                 jijiki
Resolved                 jijiki
Resolved                 jijiki
Resolved                 jijiki
Resolved                 jijiki
Resolved                 Ladsgroup
Resolved                 jijiki
Resolved                 SLyngshede-WMF
Resolved                 SLyngshede-WMF
Resolved                 jijiki
Resolved                 jijiki
Resolved                 Andrew
Resolved                 taavi
Open                     None
Open        BUG REPORT   None
Open                     None
Stalled                  None
Duplicate                Andrew
Resolved                 MoritzMuehlenhoff

Event Timeline


Mentioned in SAL (#wikimedia-cloud-feed) [2025-12-04T20:23:53Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-12-04T21:20:15Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Change #1223268 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudbackup: cram all backups onto cloudbackup1004 so 1003 can be reimaged.

https://gerrit.wikimedia.org/r/1223268

Change #1223268 merged by Andrew Bogott:

[operations/puppet@production] cloudbackup: cram all backups onto cloudbackup1004 so 1003 can be reimaged.

https://gerrit.wikimedia.org/r/1223268

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-05T22:32:17Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-05T22:39:18Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-05T22:49:43Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-05T22:51:08Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-05T22:59:44Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-05T23:01:57Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-05T23:16:22Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-05T23:18:34Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-05T23:20:57Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T00:54:56Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T00:57:44Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T01:13:37Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T01:14:33Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T01:18:15Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T01:22:58Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T01:28:53Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T01:33:20Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T01:37:11Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T01:43:49Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T01:52:54Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T02:11:56Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T02:16:32Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T02:20:48Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T02:39:45Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T02:47:07Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T02:58:06Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T03:20:16Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T13:47:45Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T13:54:44Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T14:39:44Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-06T14:57:34Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudnet1006.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudnet1006.eqiad.wmnet with OS trixie completed:

  • cloudnet1006 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601061628_andrew_1112267_cloudnet1006.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie executed with errors:

  • cloudbackup2003 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup2003.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie executed with errors:

  • cloudbackup2003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup2003.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie executed with errors:

  • cloudbackup1003 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup1003.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Change #1224798 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudbackups: update partman recipes.

https://gerrit.wikimedia.org/r/1224798

Change #1224798 merged by Andrew Bogott:

[operations/puppet@production] cloudbackups: update partman recipes.

https://gerrit.wikimedia.org/r/1224798

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie executed with errors:

  • cloudbackup2003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup2003.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie executed with errors:

  • cloudbackup1003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup1003.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie executed with errors:

  • cloudbackup1003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run failed and the operator aborted
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup1003.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie executed with errors:

  • cloudbackup2003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup2003.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie executed with errors:

  • cloudbackup1003 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup1003.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie executed with errors:

  • cloudbackup1003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup1003.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie executed with errors:

  • cloudbackup2003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup2003.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie executed with errors:

  • cloudbackup2003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601082203_andrew_2727225_cloudbackup2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup2003.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup2003.codfw.wmnet with OS trixie completed:

  • cloudbackup2003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601090328_andrew_2888147_cloudbackup2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie executed with errors:

  • cloudbackup1003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup1003.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup1003.eqiad.wmnet with OS trixie completed:

  • cloudbackup1003 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601091454_andrew_3230765_cloudbackup1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
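
The repeated reimage attempts logged above can be tallied per host by scanning for the summary lines of the form `• hostname (PASS|FAIL|WARN)`. The following is a rough sketch, assuming the timeline has been saved as plain text in the same bullet format; `tally_results` is a hypothetical helper, not part of the reimage cookbook.

```python
import re
from collections import Counter, defaultdict

# Matches only the per-host summary bullet, e.g. "  • cloudbackup2003 (FAIL)".
# Step bullets such as "• Forced PXE for next reboot" do not match.
RESULT_LINE = re.compile(r"^\s*•\s+(\S+)\s+\((PASS|FAIL|WARN)\)\s*$")


def tally_results(log_text: str) -> dict[str, Counter]:
    """Count PASS/FAIL/WARN outcomes per host in a pasted timeline."""
    results: dict[str, Counter] = defaultdict(Counter)
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            host, status = match.groups()
            results[host][status] += 1
    return dict(results)


# Shortened sample in the same format as the timeline entries.
sample = """\
  • cloudbackup2003 (FAIL)
    • Forced PXE for next reboot
    • Add puppet_version metadata (7) to Debian installer
  • cloudbackup2003 (PASS)
    • Icinga status is optimal
  • cloudnet1006 (WARN)
    • Unable to downtime the new host on Icinga/Alertmanager
"""

for host, counts in tally_results(sample).items():
    print(host, dict(counts))
```

A tally like this makes it easy to see which hosts needed several attempts, which here lines up with the partman recipe change that was merged between the failing and passing cloudbackup runs.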

Change #1224974 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] wmcs cinder backups: move all backups to 2003 so 2004 can be reimaged

https://gerrit.wikimedia.org/r/1224974

Change #1224975 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudbackup: flip all backups from cloudbackup1004 to 1003

https://gerrit.wikimedia.org/r/1224975

Change #1224974 merged by Andrew Bogott:

[operations/puppet@production] wmcs cinder backups: move all backups to 2003 so 2004 can be reimaged

https://gerrit.wikimedia.org/r/1224974

Change #1224975 merged by Andrew Bogott:

[operations/puppet@production] cloudbackup: flip all backups from cloudbackup1004 to 1003

https://gerrit.wikimedia.org/r/1224975

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudlb1002.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudlb1002.eqiad.wmnet with OS trixie completed:

  • cloudlb1002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601121615_andrew_1213428_cloudlb1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudlb1001.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudlb1001.eqiad.wmnet with OS trixie completed:

  • cloudlb1001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601121839_andrew_1283998_cloudlb1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudgw2002-dev.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudgw2002-dev.codfw.wmnet with OS trixie completed:

  • cloudgw2002-dev (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601141359_andrew_2577069_cloudgw2002-dev.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudgw2003-dev.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudgw2003-dev.codfw.wmnet with OS trixie completed:

  • cloudgw2003-dev (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601141609_andrew_2640472_cloudgw2003-dev.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup2004.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup2004.codfw.wmnet with OS trixie completed:

  • cloudbackup2004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601141749_andrew_2691522_cloudbackup2004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup1004.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup1004.eqiad.wmnet with OS trixie executed with errors:

  • cloudbackup1004 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudbackup1004.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudbackup1004.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudbackup1004.eqiad.wmnet with OS trixie completed:

  • cloudbackup1004 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601142043_andrew_2783194_cloudbackup1004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

This is done, apart from the Ceph hosts, which are tracked elsewhere.