
upgrade cloudnet servers to Debian 11 Bullseye
Closed, Resolved (Public)

Description

This ticket tracks all work related to upgrading the cloudnet servers to Debian 11 Bullseye.

  • verify codfw1dev cloudnet servers were upgraded already
  • verify codfw1dev network works before operations
  • verify eqiad1 network works before operations

Event Timeline

Change 772818 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] openstack: networktests: discard even more hostkey checking stuff

https://gerrit.wikimedia.org/r/772818

For the record:

[2022-03-22 12:02:51] INFO: --- cloudcontrol1003 Debian GNU/Linux 11 (bullseye) 5.10.0-11-amd64
[2022-03-22 12:02:51] INFO: ---
[2022-03-22 12:02:51] INFO: running: basic ping to cloudgw addresses (raw addresses) from outside the cloud network
[2022-03-22 12:02:51] INFO: running: basic ping to cloudgw addresses (DNS names) from outside the cloud network
[2022-03-22 12:02:51] INFO: running: basic ping to neutron WAN from outside the cloud network
[2022-03-22 12:02:51] INFO: running: basic ping to neutron VIRT gateway from within the cloud virtual network, no floating IP
[2022-03-22 12:02:55] INFO: running: basic ping to neutron VIRT gateway from within the cloud virtual network, with floating IP
[2022-03-22 12:02:58] INFO: running: VM (no floating IP) contacting the internet gets NAT'd using routing_source_ip
[2022-03-22 12:02:59] INFO: running: VM (no floating IP) contacting an address covered by dmz_cidr doesn't get NAT'd
[2022-03-22 12:03:01] INFO: running: VM (using floating IP) isn't affected by either routing_source_ip or dmz_cidr
[2022-03-22 12:03:06] INFO: running: VM (no floating IP) can contact auth DNS server
[2022-03-22 12:03:07] INFO: running: VM (no floating IP) can contact recursor DNS server
[2022-03-22 12:03:09] INFO: running: VM (using floating IP) can contact auth DNS server
[2022-03-22 12:03:10] INFO: running: VM (using floating IP) can contact recursor DNS server
[2022-03-22 12:03:11] INFO: running: VM (using floating IP) can contact LDAP server
[2022-03-22 12:03:13] INFO: running: VM (not using floating IP) can contact LDAP server
[2022-03-22 12:03:14] INFO: running: VM (using floating IP) can contact openstack API
[2022-03-22 12:03:16] INFO: running: VM (no floating IP) can contact openstack API
[2022-03-22 12:03:17] INFO: running: puppetmasters can sync git tree
[2022-03-22 12:03:24] INFO: running: VM (using floating IP) can read dumps NFS
[2022-03-22 12:03:26] INFO: running: VM (no floating IP) can read dumps NFS
[2022-03-22 12:03:28] INFO: running: VM (using floating IP) can connect to wikireplicas from Toolforge
[2022-03-22 12:03:36] INFO: running: Toolforge webservice can be accessed from the internet
[2022-03-22 12:03:36] INFO: running: Toolforge bastions see herald file on project NFS
[2022-03-22 12:03:40] INFO: ---
[2022-03-22 12:03:40] INFO: --- passed tests: 22
[2022-03-22 12:03:40] INFO: --- failed tests: 0
[2022-03-22 12:03:40] INFO: --- total tests: 22

and:

[2022-03-22 12:40:58] INFO: --- cloudcontrol2001-dev Debian GNU/Linux 11 (bullseye) 5.10.0-11-amd64
[2022-03-22 12:40:58] INFO: ---
[2022-03-22 12:40:58] INFO: running: basic ping to cloudgw addresses (raw addresses) from outside the cloud network
[2022-03-22 12:40:58] INFO: running: basic ping to cloudgw addresses (DNS names) from outside the cloud network
[2022-03-22 12:40:58] INFO: running: basic ping to neutron WAN from outside the cloud network
[2022-03-22 12:40:58] INFO: running: basic ping to neutron VIRT gateway from within the cloud virtual network, no floating IP
[2022-03-22 12:41:02] INFO: running: basic ping to neutron VIRT gateway from within the cloud virtual network, with floating IP
[2022-03-22 12:41:06] INFO: running: VM (no floating IP) contacting the internet gets NAT'd using routing_source_ip
[2022-03-22 12:41:08] INFO: running: VM (no floating IP) contacting an address covered by dmz_cidr doesn't get NAT'd
[2022-03-22 12:41:10] INFO: running: VM (using floating IP) isn't affected by either routing_source_ip or dmz_cidr
[2022-03-22 12:41:13] INFO: running: VM (no floating IP) can contact auth DNS server
[2022-03-22 12:41:16] INFO: running: VM (no floating IP) can contact recursor DNS server
[2022-03-22 12:41:18] INFO: running: VM (using floating IP) can contact auth DNS server
[2022-03-22 12:41:19] INFO: running: VM (using floating IP) can contact recursor DNS server
[2022-03-22 12:41:21] INFO: running: VM (using floating IP) can contact LDAP server
[2022-03-22 12:41:23] INFO: running: VM (not using floating IP) can contact LDAP server
[2022-03-22 12:41:25] INFO: running: VM (using floating IP) can contact openstack API
[2022-03-22 12:41:26] INFO: running: VM (no floating IP) can contact openstack API
[2022-03-22 12:41:28] INFO: running: puppetmasters can sync git tree
[2022-03-22 12:41:36] INFO: running: VM (using floating IP) can read dumps NFS
[2022-03-22 12:41:39] INFO: running: VM (no floating IP) can read dumps NFS
[2022-03-22 12:41:41] INFO: ---
[2022-03-22 12:41:41] INFO: --- passed tests: 19
[2022-03-22 12:41:41] INFO: --- failed tests: 0
[2022-03-22 12:41:41] INFO: --- total tests: 19
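
The two runs above can be checked mechanically instead of by eye. The following is a minimal sketch, assuming the log has been saved as plain text in the format shown above; the names `summarize_run` and `run_is_clean` are hypothetical helpers written for illustration and are not part of the networktests tooling.

```python
import re

# Matches the summary lines emitted at the end of a run, e.g.
# "[2022-03-22 12:03:40] INFO: --- passed tests: 22"
SUMMARY_RE = re.compile(r"INFO: --- (passed|failed|total) tests: (\d+)")


def summarize_run(log_text):
    """Return {'passed': n, 'failed': n, 'total': n} for one test run.

    Hypothetical helper: only the three summary lines are read,
    the individual "running: ..." lines are ignored.
    """
    counts = {}
    for line in log_text.splitlines():
        match = SUMMARY_RE.search(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def run_is_clean(counts):
    """A run is clean when nothing failed and passed equals total."""
    return counts.get("failed") == 0 and counts.get("passed") == counts.get("total")


# Summary lines copied from the cloudcontrol2001-dev run above.
sample = """\
[2022-03-22 12:41:41] INFO: ---
[2022-03-22 12:41:41] INFO: --- passed tests: 19
[2022-03-22 12:41:41] INFO: --- failed tests: 0
[2022-03-22 12:41:41] INFO: --- total tests: 19
"""
counts = summarize_run(sample)
print(counts, run_is_clean(counts))
```

Comparing `passed` against `total`, rather than only checking `failed`, also catches a truncated log in which the summary is incomplete.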

Change 772818 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] openstack: networktests: discard even more hostkey checking stuff

https://gerrit.wikimedia.org/r/772818

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudnet1003.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudnet1003.eqiad.wmnet with OS bullseye executed with errors:

  • cloudnet1003 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203221506_andrew_2692073_cloudnet1003.out
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudnet1003.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudnet1003.eqiad.wmnet with OS bullseye executed with errors:

  • cloudnet1003 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudnet1003.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudnet1003.eqiad.wmnet with OS bullseye executed with errors:

  • cloudnet1003 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203221607_andrew_2700664_cloudnet1003.out
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudnet1003.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudnet1003.eqiad.wmnet with OS bullseye completed:

  • cloudnet1003 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203221617_andrew_2701269_cloudnet1003.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudnet1004.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudnet1004.eqiad.wmnet with OS bullseye completed:

  • cloudnet1004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203221646_andrew_2706223_cloudnet1004.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
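
The reimage history above (three failed attempts on cloudnet1003 before a run that completed with a warning, then a clean pass on cloudnet1004) can be extracted from the timeline text. This is a rough sketch, assuming the timeline is available as plain text with the same bullet layout as above; `reimage_outcomes` and `attempts_until_success` are hypothetical names used for illustration and are not part of the cookbook or Spicerack.

```python
import re

# Matches the per-host result bullet, e.g. "  • cloudnet1003 (FAIL)".
# Step bullets such as "• Host up (Debian installer)" do not match,
# because the parenthesis must directly follow a single word.
RESULT_RE = re.compile(r"^\s*•\s+(\S+)\s+\((PASS|WARN|FAIL)\)\s*$")


def reimage_outcomes(timeline_text):
    """Return a list of (host, status) tuples in timeline order."""
    outcomes = []
    for line in timeline_text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            outcomes.append((match.group(1), match.group(2)))
    return outcomes


def attempts_until_success(outcomes, host):
    """Count attempts for `host` up to and including the first non-FAIL one.

    Returns None if every recorded attempt for the host failed.
    """
    attempts = 0
    for name, status in outcomes:
        if name != host:
            continue
        attempts += 1
        if status != "FAIL":
            return attempts
    return None


# Abbreviated excerpt of the timeline above (most step bullets omitted).
sample = """\
  • cloudnet1003 (FAIL)
    • Downtimed on Icinga/Alertmanager
  • cloudnet1003 (FAIL)
  • cloudnet1003 (FAIL)
  • cloudnet1003 (WARN)
    • Icinga status is optimal
  • cloudnet1004 (PASS)
"""
outcomes = reimage_outcomes(sample)
print(attempts_until_success(outcomes, "cloudnet1003"))  # prints 4
print(attempts_until_success(outcomes, "cloudnet1004"))  # prints 1
```

Treating WARN as a completed run mirrors the timeline, where the fourth cloudnet1003 attempt is reported as "completed" rather than "executed with errors".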