
Migrate x3 section to Debian Trixie
Closed, Resolved
Public

Description

  • db2244
  • db2243
  • db2242
  • db2241 T426936
  • db2200 backup source T424541
  • db2187
  • db2162
  • db1258
  • db1257
  • db1256
  • db1255 master
  • db1216 backup source T424541
  • db1211 sanitarium master
  • db1154 sanitarium
  • clouddb1023 T415165
  • clouddb1022 T415165
  • clouddb1020 T415165
  • clouddb1016 T415165

sudo cookbook sre.mysql.major-upgrade -t T426725 --reimage trixie --repool $HOST wmf-mariadb1011
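
The command above is run once per host, with $HOST replaced each time. A minimal sketch of how the per-host command lines could be generated for review before running them is shown here. The function name `print_upgrade_cmd` and the example host list are illustrative assumptions; the cookbook name, flags, task ID, and package name are copied verbatim from the command above. The sketch only prints the commands and does not execute them.

```shell
#!/bin/sh
# Sketch only: prints the upgrade command for each host, runs nothing.
# TASK and the command template are copied from the task description.
TASK="T426725"

# print_upgrade_cmd is a hypothetical helper, not part of any WMF tooling.
print_upgrade_cmd() {
    for host in "$@"; do
        echo "sudo cookbook sre.mysql.major-upgrade -t ${TASK} --reimage trixie --repool ${host} wmf-mariadb1011"
    done
}

# Example host list (two hosts from the description above).
print_upgrade_cmd db2243 db2187
```

Printing rather than executing keeps the operator in control of ordering, which matters here because replicas, the sanitarium master, and the section master are handled at different times in the timeline.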

Event Timeline

Completed depooling of db2243 by cwilliams@cumin1003: Upgrading db2243.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by cwilliams@cumin1003 for host db2243.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cwilliams@cumin1003 for host db2243.codfw.wmnet with OS trixie completed:

  • db2243 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605191444_cwilliams_1534026_db2243.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Starting pool of db2243 by cwilliams@cumin1003: Migration of db2243.codfw.wmnet completed

Completed pooling of db2243 by cwilliams@cumin1003: Migration of db2243.codfw.wmnet completed

Migration of db2243.codfw.wmnet completed

Completed depooling of db2187 by cwilliams@cumin1003: Upgrading db2187.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by cwilliams@cumin1003 for host db2187.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cwilliams@cumin1003 for host db2187.codfw.wmnet with OS trixie completed:

  • db2187 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605200807_cwilliams_1820489_db2187.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Starting pool of db2187 by cwilliams@cumin1003: Migration of db2187.codfw.wmnet completed

Completed depooling of db2242 by cwilliams@cumin1003: Upgrading db2242.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by cwilliams@cumin1003 for host db2242.codfw.wmnet with OS trixie

Completed pooling of db2187 by cwilliams@cumin1003: Migration of db2187.codfw.wmnet completed

Migration of db2187.codfw.wmnet completed

Cookbook cookbooks.sre.hosts.reimage started by cwilliams@cumin1003 for host db2242.codfw.wmnet with OS trixie completed:

  • db2242 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605200924_cwilliams_1890620_db2242.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Starting pool of db2242 by cwilliams@cumin1003: Migration of db2242.codfw.wmnet completed

Completed pooling of db2242 by cwilliams@cumin1003: Migration of db2242.codfw.wmnet completed

Migration of db2242.codfw.wmnet completed

CWilliams-WMF changed the task status from Open to In Progress. (Wed, May 20, 10:37 AM)
CWilliams-WMF updated the task description.

Completed depooling of db2162 by cwilliams@cumin1003: Upgrading db2162.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by cwilliams@cumin1003 for host db2162.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cwilliams@cumin1003 for host db2162.codfw.wmnet with OS trixie completed:

  • db2162 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605201104_cwilliams_1951032_db2162.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Starting pool of db2162 by cwilliams@cumin1003: Migration of db2162.codfw.wmnet completed

Completed depooling of db1258 by cwilliams@cumin1003: Upgrading db1258.eqiad.wmnet

Completed pooling of db2162 by cwilliams@cumin1003: Migration of db2162.codfw.wmnet completed

Migration of db2162.codfw.wmnet completed

Upgrading db1258.eqiad.wmnet

This ended up in a broken state because the management password that was entered was incorrect.
See https://phabricator.wikimedia.org/P92681 for related output.

Remedied manually: restarted the replica so that the cookbook could run, and am now proceeding with the reimage.

Completed depooling of db1258 by cwilliams@cumin1003: Upgrading db1258.eqiad.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by cwilliams@cumin1003 for host db1258.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cwilliams@cumin1003 for host db1258.eqiad.wmnet with OS trixie completed:

  • db1258 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605201259_cwilliams_2035922_db1258.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Starting pool of db1258 by cwilliams@cumin1003: Migration of db1258.eqiad.wmnet completed

Completed pooling of db1258 by cwilliams@cumin1003: Migration of db1258.eqiad.wmnet completed

Migration of db1258.eqiad.wmnet completed

Completed depooling of db1257 by cwilliams@cumin1003: Upgrading db1257.eqiad.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by cwilliams@cumin1003 for host db1257.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cwilliams@cumin1003 for host db1257.eqiad.wmnet with OS trixie completed:

  • db1257 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605201508_cwilliams_2118406_db1257.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Starting pool of db1257 by cwilliams@cumin1003: Migration of db1257.eqiad.wmnet completed

Completed pooling of db1257 by cwilliams@cumin1003: Migration of db1257.eqiad.wmnet completed

Migration of db1257.eqiad.wmnet completed

Completed depooling of db1256 by cwilliams@cumin1003: Upgrading db1256.eqiad.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by cwilliams@cumin1003 for host db1256.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cwilliams@cumin1003 for host db1256.eqiad.wmnet with OS trixie completed:

  • db1256 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605210743_cwilliams_2327330_db1256.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Starting pool of db1256 by cwilliams@cumin1003: Migration of db1256.eqiad.wmnet completed

Completed pooling of db1256 by cwilliams@cumin1003: Migration of db1256.eqiad.wmnet completed

Migration of db1256.eqiad.wmnet completed

Completed depooling of db2241 by cwilliams@cumin1003: Upgrading db2241.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by cwilliams@cumin1003 for host db2241.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cwilliams@cumin1003 for host db2241.codfw.wmnet with OS trixie completed:

  • db2241 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605211344_cwilliams_2728357_db2241.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Starting pool of db2241 by cwilliams@cumin1003: Migration of db2241.codfw.wmnet completed

Completed pooling of db2241 by cwilliams@cumin1003: Migration of db2241.codfw.wmnet completed

Migration of db2241.codfw.wmnet completed

Icinga downtime and Alertmanager silence (ID=6e84fd59-a589-467f-8a5e-0390a870ff03) set by cwilliams@cumin1003 for 3:00:00 on 3 host(s) and their services with reason: Reimaging upstream server

clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=af05fe7f-cadf-4b93-ae0d-e10a72afaae6) set by cwilliams@cumin1003 for 3:00:00 on 2 host(s) and their services with reason: Reimaging upstream server

clouddb[1022-1023].eqiad.wmnet

Completed depooling of db1211 by cwilliams@cumin1003: Upgrading db1211.eqiad.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by cwilliams@cumin1003 for host db1211.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cwilliams@cumin1003 for host db1211.eqiad.wmnet with OS trixie completed:

  • db1211 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606030824_cwilliams_537045_db1211.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Starting pool of db1211 by cwilliams@cumin1003: Migration of db1211.eqiad.wmnet completed

Completed pooling of db1211 by cwilliams@cumin1003: Migration of db1211.eqiad.wmnet completed

Migration of db1211.eqiad.wmnet completed

Completed depooling of db1255 by cwilliams@cumin1003: Upgrading db1255.eqiad.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by cwilliams@cumin1003 for host db1255.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cwilliams@cumin1003 for host db1255.eqiad.wmnet with OS trixie completed:

  • db1255 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606040635_cwilliams_1028532_db1255.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Starting pool of db1255 by cwilliams@cumin1003: Migration of db1255.eqiad.wmnet completed

Completed pooling of db1255 by cwilliams@cumin1003: Migration of db1255.eqiad.wmnet completed

Migration of db1255.eqiad.wmnet completed

The remaining hosts are out of scope for this ticket:

% sudo cumin 'A:db-section-x3 and A:bookworm'
6 hosts will be targeted:
clouddb[1016,1020,1022-1023].eqiad.wmnet,db2200.codfw.wmnet,db1216.eqiad.wmnet
DRY-RUN mode enabled, aborting
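
The host list in the cumin output above uses a compact bracket notation (for example, `clouddb[1016,1020,1022-1023].eqiad.wmnet`). For readers who want the individual hostnames, the following is a minimal sketch of expanding that notation by hand. `expand_hosts` is a hypothetical helper and not how cumin itself parses host lists; it assumes a single bracket group per entry, containing only comma-separated numbers and `N-M` ranges without leading zeros.

```shell
#!/bin/sh
# expand_hosts is a hypothetical helper, not part of cumin.
# The body runs in a subshell, so the IFS change and variables stay local.
expand_hosts() (
    entry="$1"
    case "$entry" in
        *"["*"]"*) ;;                      # has a bracket group: expand below
        *) echo "$entry"; exit 0 ;;        # plain hostname: print as-is
    esac
    prefix="${entry%%\[*}"                 # text before '['
    body="${entry#*\[}"                    # text after '['
    suffix="${body#*\]}"                   # text after ']'
    body="${body%%\]*}"                    # text between the brackets
    IFS=','
    for part in $body; do
        case "$part" in
            *-*)                           # numeric range N-M
                n="${part%-*}"
                end="${part#*-}"
                while [ "$n" -le "$end" ]; do
                    echo "${prefix}${n}${suffix}"
                    n=$((n + 1))
                done
                ;;
            *) echo "${prefix}${part}${suffix}" ;;
        esac
    done
)

expand_hosts 'clouddb[1016,1020,1022-1023].eqiad.wmnet'
```

Applied to the three entries in the output above, this yields the six remaining bookworm hosts: the four clouddb replicas plus db2200 and db1216.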