Page MenuHomePhabricator

[openstack] Upgrade eqiad hosts to bookworm
Closed, ResolvedPublic

Description

This is required before we can upgrade OpenStack from Zed to Antelope, as Antelope packages are only available for Bookworm:

This is the list of hosts with OpenStack packages:

fnegri@cloudcumin1001:~$ sudo cumin 'P{C:openstack::serverpackages::zed::bullseye}'
50 hosts will be targeted:
cloudcontrol[1005-1007].eqiad.wmnet,cloudnet[1005-1006].eqiad.wmnet,cloudrabbit[1001-1003].wikimedia.org,cloudservices[1005-1006].eqiad.wmnet,cloudvirt[1025-1061].eqiad.wmnet,cloudvirtlocal[1001-1003].eqiad.wmnet
  • cloudcontrol[1005-1007].eqiad.wmnet
  • cloudnet[1005-1006].eqiad.wmnet
  • cloudrabbit[1001-1003].wikimedia.org
  • cloudservices[1005-1006].eqiad.wmnet
  • cloudvirt[1025-1061].eqiad.wmnet
  • cloudvirtlocal[1001-1003].eqiad.wmnet

Note: cloudvirt-wdqs[1001-1003].eqiad.wmnet have already been reimaged to bookworm in T346948: Move cloudvirt-wdqs hosts.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T04:14:21Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T04:15:18Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) (T345811)

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1043.eqiad.wmnet with OS bookworm executed with errors:

  • cloudvirt1043 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1043.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1042.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1042 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311140404_andrew_299820_cloudvirt1042.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1044.eqiad.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T05:02:28Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T05:03:25Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) (T345811)

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1045.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1043.eqiad.wmnet with OS bookworm executed with errors:

  • cloudvirt1043 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1044.eqiad.wmnet with OS bookworm executed with errors:

  • cloudvirt1044 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1044.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1045.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1045 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311140520_andrew_335435_cloudvirt1045.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1044.eqiad.wmnet with OS bookworm executed with errors:

  • cloudvirt1044 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T11:44:44Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1046.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T12:03:11Z] <wm-bot2> fran@wmf3169 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1046.eqiad.wmnet' (T345811)

Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1001 for host cloudvirt1046.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1001 for host cloudvirt1046.eqiad.wmnet with OS bookworm executed with errors:

  • cloudvirt1046 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1044.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1043.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1044.eqiad.wmnet with OS bookworm executed with errors:

  • cloudvirt1044 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1043.eqiad.wmnet with OS bookworm executed with errors:

  • cloudvirt1043 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1044.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1043.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1046.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1046.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1044.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1044 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311141617_andrew_627081_cloudvirt1044.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1046.eqiad.wmnet with OS bookworm executed with errors:

  • cloudvirt1046 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1046.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1043.eqiad.wmnet with OS bookworm executed with errors:

  • cloudvirt1043 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T17:24:16Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1047.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T17:24:56Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1047.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T17:39:48Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1047.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T17:43:13Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1048.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T18:01:19Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1047.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T18:02:30Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1047.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T18:03:45Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1047.eqiad.wmnet' (T345811)

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1047.eqiad.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T18:06:23Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1049.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T18:10:32Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1048.eqiad.wmnet' (T345811)

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1048.eqiad.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T18:31:07Z] <andrew@cloudcumin1001> END (ERROR) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=97) on host 'cloudvirt1049.eqiad.wmnet' (T345811)

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1046.eqiad.wmnet with OS bookworm executed with errors:

  • cloudvirt1046 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1047.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1047 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311141822_andrew_682448_cloudvirt1047.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1048.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1048 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311141832_andrew_685030_cloudvirt1048.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1049.eqiad.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T18:57:07Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1050.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T19:00:03Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1052.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T19:17:25Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1050.eqiad.wmnet' (T345811)

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1050.eqiad.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T19:18:11Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1052.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T19:21:51Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1053.eqiad.wmnet' (T345811)

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1049.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1049 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311141915_andrew_709401_cloudvirt1049.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1052.eqiad.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T19:47:24Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1053.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T19:49:04Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1053.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T19:50:03Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1053.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T19:54:29Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1053.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T19:55:18Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1053.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T19:58:13Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1054.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T19:59:32Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1053.eqiad.wmnet' (T345811)

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1050.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1050 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311141936_andrew_718619_cloudvirt1050.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T20:00:10Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1053.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T20:05:50Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1053.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T20:06:34Z] <andrew@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1053.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T20:09:40Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1053.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T20:10:37Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1053.eqiad.wmnet' (T345811)

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1053.eqiad.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T20:12:44Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1055.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T20:15:35Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1054.eqiad.wmnet' (T345811)

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1052.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1052 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311142001_andrew_730619_cloudvirt1052.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1054.eqiad.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T20:26:33Z] <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1056.eqiad.wmnet' (T345811)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T20:28:48Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1055.eqiad.wmnet' (T345811)

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1055.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1053.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1053 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311142028_andrew_745083_cloudvirt1053.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1054.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1054 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311142042_andrew_754670_cloudvirt1054.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-11-14T21:08:58Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1056.eqiad.wmnet' (T345811)

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1056.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1055.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1055 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311142047_andrew_756482_cloudvirt1055.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1046.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1056.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1056 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311142125_andrew_780009_cloudvirt1056.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1046.eqiad.wmnet with OS bookworm executed with errors:

  • cloudvirt1046 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm completed:

  • cloudvirtlocal1001 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311271510_andrew_384354_cloudvirtlocal1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm completed:

  • cloudvirtlocal1002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311271725_andrew_447436_cloudvirtlocal1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm completed:

  • cloudvirtlocal1003 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311271828_andrew_475451_cloudvirtlocal1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
Andrew updated the task description. (Show Details)