
Upgrade cloud-vps openstack hosts to Debian 'Bullseye'
Closed, Resolved · Public

Description

We currently run OpenStack 'Victoria', the release that is packaged for both Buster and Bullseye. Future OpenStack upgrades will require our control plane to be on Bullseye.

Related Objects

Status | Assigned
Open | None
Open | None
Resolved | rook
Resolved | Andrew
Resolved | Andrew
Resolved | taavi
Resolved | aborrero
Resolved | aborrero
Resolved | dcaro
Resolved | dcaro
Open | dcaro
Open | dcaro
Duplicate | dcaro
Resolved | dcaro
Resolved | taavi
Resolved | dcaro
Resolved | aborrero
Resolved | dcaro
Resolved | Andrew
Resolved | Andrew
Resolved | Andrew
Resolved | Andrew
Resolved | Andrew
Resolved | ayounsi

Event Timeline


Mentioned in SAL (#wikimedia-cloud) [2022-03-23T14:18:40Z] <wm-bot> Set cloudvirt 'cloudvirt1030.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T14:19:23Z] <wm-bot> Draining 'cloudvirt1031.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T14:20:08Z] <wm-bot> Set cloudvirt 'cloudvirt1031.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T14:32:55Z] <wm-bot> Drained 'cloudvirt1030.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1030.eqiad.wmnet with OS bullseye

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T14:34:24Z] <wm-bot> Draining 'cloudvirt1032.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T14:35:09Z] <wm-bot> Set cloudvirt 'cloudvirt1032.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T14:44:48Z] <wm-bot> Drained 'cloudvirt1031.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster
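
The SAL entries above record when a drain started ("Draining") and when it finished ("Drained"), so the time a drain took can be read off the timestamps. The following is a minimal sketch of that calculation, not part of the cookbooks themselves: the regular expression is inferred from the entries quoted in this task, and `drain_durations` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Pattern inferred from the wm-bot SAL entries quoted in this task (an assumption,
# not an official log format specification).
SAL_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\] <wm-bot> "
    r"(?P<action>Draining|Drained) '(?P<host>[^']+)'"
)


def drain_durations(lines):
    """Return {host: seconds} between the latest 'Draining' entry and the next 'Drained' entry."""
    started = {}
    finished = {}
    for line in lines:
        match = SAL_RE.search(line)
        if match is None:
            continue  # e.g. "Set cloudvirt ... maintenance" entries
        when = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S")
        host = match["host"]
        if match["action"] == "Draining":
            started[host] = when  # a repeated "Draining" entry restarts the timer
        elif host in started:
            finished[host] = (when - started.pop(host)).total_seconds()
    return finished


# Entries for cloudvirt1031 copied from the log above.
sal_lines = [
    "[2022-03-23T14:19:23Z] <wm-bot> Draining 'cloudvirt1031.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster",
    "[2022-03-23T14:20:08Z] <wm-bot> Set cloudvirt 'cloudvirt1031.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster",
    "[2022-03-23T14:44:48Z] <wm-bot> Drained 'cloudvirt1031.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster",
]
result = drain_durations(sal_lines)
print(result)
```

For cloudvirt1031 this gives 1525 seconds, roughly 25 minutes. Restarting the timer on a repeated "Draining" entry is a design choice of this sketch, made because some hosts in this task have more than one such entry before the drain completes.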

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1031.eqiad.wmnet with OS bullseye

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T14:57:46Z] <wm-bot> Draining 'cloudvirt1032.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1030.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1030 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203231433_andrew_2889456_cloudvirt1030.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
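
Each reimage summary in this task has the same shape: one bullet with the host name and a result in parentheses, followed by indented bullets for the steps. As an illustration only, the sketch below turns such a block into a small dictionary; `parse_summary` is a hypothetical helper, the sample text is an abbreviated copy of the summary above, and nothing here relies on Spicerack itself.

```python
import re

# Header bullet: "• <host> (PASS|WARN|FAIL)"; step bullets: "• <text>".
HEADER_RE = re.compile(r"^\s*•\s+(?P<host>\S+)\s+\((?P<result>PASS|WARN|FAIL)\)\s*$")
STEP_RE = re.compile(r"^\s*•\s+(?P<step>.+?)\s*$")


def parse_summary(text):
    """Parse one summary block into {'host', 'result', 'steps'}."""
    host = None
    result = None
    steps = []
    for line in text.splitlines():
        if host is None:
            # Skip everything until the header bullet is found.
            header = HEADER_RE.match(line)
            if header:
                host, result = header["host"], header["result"]
            continue
        step = STEP_RE.match(line)
        if step:
            steps.append(step["step"])
    return {"host": host, "result": result, "steps": steps}


# Abbreviated copy of the cloudvirt1030 summary (most steps omitted for brevity).
summary = """\
  • cloudvirt1030 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Icinga status is optimal
    • Icinga downtime removed
"""
parsed = parse_summary(summary)
print(parsed["host"], parsed["result"], len(parsed["steps"]))
```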

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T15:00:40Z] <wm-bot> Set cloudvirt 'cloudvirt1032.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T15:01:48Z] <wm-bot> Drained 'cloudvirt1032.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1031.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1031 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203231445_andrew_2892140_cloudvirt1031.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
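
Apart from the log file name, the WARN summary for cloudvirt1031 differs from the PASS summary for cloudvirt1030 only in the Icinga steps near the end. A small sketch of that comparison follows; the two lists are the final steps copied from those summaries, and `step_diff` is a hypothetical helper written for this illustration.

```python
# Final steps of a PASS run (cloudvirt1030) and a WARN run (cloudvirt1031),
# copied from the summaries in this task.
pass_tail = [
    "Automatic Puppet run was successful",
    "Forced a re-check of all Icinga services for the host",
    "Icinga status is optimal",
    "Icinga downtime removed",
    "Updated Netbox data from PuppetDB",
]
warn_tail = [
    "Automatic Puppet run was successful",
    "Forced a re-check of all Icinga services for the host",
    "Icinga status is not optimal, downtime not removed",
    "Updated Netbox data from PuppetDB",
]


def step_diff(reference, observed):
    """Return (steps only in reference, steps only in observed), keeping their order."""
    only_reference = [step for step in reference if step not in observed]
    only_observed = [step for step in observed if step not in reference]
    return only_reference, only_observed


missing, extra = step_diff(pass_tail, warn_tail)
print("missing from WARN run:", missing)
print("only in WARN run:", extra)
```

Read this way, a WARN result here means the reimage itself finished but the Icinga downtime was left in place because the host's checks were not yet all green.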

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1032.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1032.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1032 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203231550_andrew_2903853_cloudvirt1032.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
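
The log paths in these summaries appear to follow a fixed naming pattern: a 12-digit timestamp, the user name, a number, and the short host name. The sketch below splits a path into those fields. The pattern is inferred from the paths shown in this task, the meaning of the numeric field is not stated here, and `parse_log_name` is a hypothetical helper.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# <YYYYMMDDHHMM>_<user>_<number>_<host>.out, inferred from the paths in this task.
NAME_RE = re.compile(
    r"^(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<number>\d+)_(?P<host>.+)\.out$"
)


def parse_log_name(path):
    """Split a reimage log path into its timestamp, user, number and host fields."""
    match = NAME_RE.match(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"unrecognised log name: {path}")
    return {
        "stamp": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        "number": int(match["number"]),  # meaning not stated in the task
        "host": match["host"],
    }


info = parse_log_name(
    "/var/log/spicerack/sre/hosts/reimage/202203231550_andrew_2903853_cloudvirt1032.out"
)
print(info["stamp"].isoformat(), info["user"], info["host"])
```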

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T16:36:29Z] <wm-bot> Draining 'cloudvirt1033.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T16:36:38Z] <wm-bot> Draining 'cloudvirt1034.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T16:37:14Z] <wm-bot> Set cloudvirt 'cloudvirt1033.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T16:37:22Z] <wm-bot> Set cloudvirt 'cloudvirt1034.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1047.eqiad.wmnet with OS bullseye

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T16:51:05Z] <wm-bot> Drained 'cloudvirt1033.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1028.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1047.eqiad.wmnet with OS bullseye executed with errors:

  • cloudvirt1047 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details
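
Comparing a failed summary with the step order of a successful run gives a rough idea of where the run stopped. The sketch below does that for cloudvirt1047, using the first nine steps of the cloudvirt1030 PASS summary as the expected order. It is only an illustration: `compare_to_expected` is a hypothetical helper, and the actual cause of the failure is in the cookbook logs, not in this comparison.

```python
# First steps of a successful run, copied from the cloudvirt1030 summary.
EXPECTED_STEPS = [
    "Downtimed on Icinga/Alertmanager",
    "Disabled Puppet",
    "Removed from Puppet and PuppetDB if present",
    "Deleted any existing Puppet certificate",
    "Removed from Debmonitor if present",
    "Forced PXE for next reboot",
    "Host rebooted via IPMI",
    "Host up (Debian installer)",
    "Host up (new fresh bullseye OS)",
]

# Steps reported in the cloudvirt1047 FAIL summary.
failed_steps = [
    "Removed from Puppet and PuppetDB if present",
    "Deleted any existing Puppet certificate",
    "Removed from Debmonitor if present",
    "Forced PXE for next reboot",
    "Host rebooted via IPMI",
    "The reimage failed, see the cookbook logs for the details",
]


def compare_to_expected(expected, reported):
    """Return (earlier expected steps not reported, first expected step not reached)."""
    reached = [step for step in reported if step in expected]
    if not reached:
        return [], (expected[0] if expected else None)
    last = expected.index(reached[-1])
    skipped = [step for step in expected[:last] if step not in reported]
    next_step = expected[last + 1] if last + 1 < len(expected) else None
    return skipped, next_step


skipped, next_step = compare_to_expected(EXPECTED_STEPS, failed_steps)
print("not reported:", skipped)
print("first step not reached:", next_step)
```

For this summary the comparison shows that the downtime and Puppet-disable steps were not reported, and that the run never recorded the host coming up in the Debian installer after the IPMI reboot.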

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1033.eqiad.wmnet with OS bullseye

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T17:03:12Z] <wm-bot> Drained 'cloudvirt1034.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T17:03:43Z] <wm-bot> Draining 'cloudvirt1035.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T17:04:29Z] <wm-bot> Set cloudvirt 'cloudvirt1035.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1034.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1028.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1028 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203231658_andrew_2915140_cloudvirt1028.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1033.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1033 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203231659_andrew_2915254_cloudvirt1033.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1034.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1034 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203231707_andrew_2917902_cloudvirt1034.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T17:54:33Z] <wm-bot> Draining 'cloudvirt1036.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T17:55:18Z] <wm-bot> Set cloudvirt 'cloudvirt1036.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T18:13:17Z] <wm-bot> Drained 'cloudvirt1036.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T18:18:20Z] <wm-bot> Draining 'cloudvirt1037.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T18:19:04Z] <wm-bot> Set cloudvirt 'cloudvirt1037.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1035.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1036.eqiad.wmnet with OS bullseye

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T18:43:54Z] <wm-bot> Draining 'cloudvirt1038.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T18:44:39Z] <wm-bot> Set cloudvirt 'cloudvirt1038.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1035.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1035 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203231836_andrew_2933834_cloudvirt1035.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1036.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1036 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203231836_andrew_2933843_cloudvirt1036.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1037.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1038.eqiad.wmnet with OS bullseye

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T20:14:50Z] <wm-bot> Draining 'cloudvirt1039.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T20:14:57Z] <wm-bot> Draining 'cloudvirt1040.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T20:15:34Z] <wm-bot> Set cloudvirt 'cloudvirt1039.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T20:15:43Z] <wm-bot> Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T20:30:30Z] <wm-bot> Drained 'cloudvirt1039.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1039.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1038.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1038 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203232014_andrew_2948775_cloudvirt1038.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1037.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1037 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203232014_andrew_2948781_cloudvirt1037.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T20:54:42Z] <wm-bot> Draining 'cloudvirt1041.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T20:55:29Z] <wm-bot> Set cloudvirt 'cloudvirt1041.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T21:04:41Z] <wm-bot> Draining 'cloudvirt1040.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T21:07:37Z] <wm-bot> Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1039.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1039 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203232031_andrew_2950344_cloudvirt1039.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T21:09:11Z] <wm-bot> Draining 'cloudvirt1040.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T21:12:05Z] <wm-bot> Set cloudvirt 'cloudvirt1040.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T21:12:09Z] <wm-bot> Drained 'cloudvirt1040.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1040.eqiad.wmnet with OS bullseye

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T21:19:08Z] <wm-bot> Draining 'cloudvirt1042.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T21:19:53Z] <wm-bot> Set cloudvirt 'cloudvirt1042.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T21:54:23Z] <wm-bot> Drained 'cloudvirt1042.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1040.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1040 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203232118_andrew_2962276_cloudvirt1040.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1041.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1042.eqiad.wmnet with OS bullseye

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T22:06:20Z] <wm-bot> Draining 'cloudvirt1043.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T22:06:25Z] <wm-bot> Draining 'cloudvirt1044.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T22:07:11Z] <wm-bot> Set cloudvirt 'cloudvirt1044.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T22:08:10Z] <wm-bot> Set cloudvirt 'cloudvirt1043.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T22:12:04Z] <wm-bot> Draining 'cloudvirt1045.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T22:12:50Z] <wm-bot> Set cloudvirt 'cloudvirt1045.eqiad.wmnet' maintenance. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1043.eqiad.wmnet with OS bullseye

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T22:38:55Z] <wm-bot> Drained 'cloudvirt1044.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1041.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1041 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203232205_andrew_2970731_cloudvirt1041.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1042.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1042 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203232205_andrew_2970762_cloudvirt1042.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-cloud) [2022-03-23T22:53:55Z] <wm-bot> Drained 'cloudvirt1045.eqiad.wmnet'. (T281276) - cookbook ran by andrew@buster

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1043.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1043 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203232235_andrew_2977736_cloudvirt1043.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1045.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1044.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1046.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1045.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1045 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203232348_andrew_2986471_cloudvirt1045.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1044.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1044 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203232348_andrew_2986467_cloudvirt1044.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1046.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1046 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203232351_andrew_2986809_cloudvirt1046.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye executed with errors:

  • cloudvirt-wdqs1002 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Quick update: the only hosts still to be upgraded to Bullseye are the toolsdb-hosting hypervisors (cloudvirt1019 and cloudvirt1020) and the cloudvirt-wdqs1xxx hosts.

I'm waiting on a consultation with data-persistence about the toolsdb hosts.

The cloudvirt-wdqs hosts are only waiting for a job that is running on one of the hosted VMs to complete.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1020.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1020.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1020 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204181525_andrew_2870264_cloudvirt1020.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudvirt1019.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudvirt1019.eqiad.wmnet with OS bullseye completed:

  • cloudvirt1019 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204191534_andrew_3615519_cloudvirt1019.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
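
To see at a glance where follow-up is needed, the results of the reimage summaries shown in this task can be tallied. The sketch below hard-codes the host and result pairs from those summaries; it is a convenience illustration, not output from any cookbook.

```python
from collections import Counter

# Host -> result, copied from the reimage summaries shown in this task.
results = {
    "cloudvirt1030": "PASS",
    "cloudvirt1031": "WARN",
    "cloudvirt1032": "WARN",
    "cloudvirt1047": "FAIL",
    "cloudvirt1028": "PASS",
    "cloudvirt1033": "WARN",
    "cloudvirt1034": "WARN",
    "cloudvirt1035": "WARN",
    "cloudvirt1036": "WARN",
    "cloudvirt1038": "PASS",
    "cloudvirt1037": "PASS",
    "cloudvirt1039": "PASS",
    "cloudvirt1040": "WARN",
    "cloudvirt1041": "WARN",
    "cloudvirt1042": "WARN",
    "cloudvirt1043": "WARN",
    "cloudvirt1045": "WARN",
    "cloudvirt1044": "WARN",
    "cloudvirt1046": "WARN",
    "cloudvirt-wdqs1002": "FAIL",
    "cloudvirt1020": "WARN",
    "cloudvirt1019": "WARN",
}

tally = Counter(results.values())
# Hosts whose reimage did not complete and would need another attempt.
needs_retry = sorted(host for host, result in results.items() if result == "FAIL")

print(dict(tally))
print(needs_retry)
```

The WARN hosts completed their reimage; per their summaries, only the Icinga downtime was left in place.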