| Status | Assigned | Task |
|---|---|---|
| Open | Marostegui | T422365 Migration to Debian Trixie of production database-related hosts |
| Resolved | Marostegui | T425388 Migrate s7 section to Debian Trixie |
| Resolved | Marostegui | T425506 db2208 PXE boot change not accessible |
| Resolved | Jhancock.wm | T425516 db2208 Backplane 0 error |
| Resolved | Marostegui | T426142 Switchover s7 master (db2220 -> db2218) |
| Resolved | Marostegui | T426088 Switchover s7 master (db1181 -> db1236) |
Event Timeline
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db2208.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db2208.codfw.wmnet with OS trixie completed:
- db2208 (WARN)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605070509_marostegui_918883_db2208.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change #1284330 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db2208: Enable notifications
Change #1284330 merged by Marostegui:
[operations/puppet@production] db2208: Enable notifications
Change #1284563 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1202,db2182: Disable notifications
Change #1284563 merged by Marostegui:
[operations/puppet@production] db1202,db2182: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db1202.eqiad.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db2182.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db1202.eqiad.wmnet with OS trixie completed:
- db1202 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605070914_marostegui_974609_db1202.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db2182.codfw.wmnet with OS trixie completed:
- db2182 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605070918_marostegui_974708_db2182.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change #1284589 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1227,db2168: Disable notifications
Change #1284589 merged by Marostegui:
[operations/puppet@production] db1227,db2168: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db1227.eqiad.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db2168.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db1227.eqiad.wmnet with OS trixie completed:
- db1227 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605071048_marostegui_1048837_db1227.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db2168.codfw.wmnet with OS trixie completed:
- db2168 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605071055_marostegui_1051628_db2168.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change #1285006 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db2159: Disable notifications
Change #1285006 merged by Marostegui:
[operations/puppet@production] db2159: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db2159.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db2159.codfw.wmnet with OS trixie completed:
- db2159 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605080551_marostegui_1282798_db2159.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change #1286168 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1231,db2150: Disable notifications
Change #1286168 merged by Marostegui:
[operations/puppet@production] db1231,db2150: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db2150.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db1231.eqiad.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db1231.eqiad.wmnet with OS trixie completed:
- db1231 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605120704_marostegui_2951101_db1231.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db2150.codfw.wmnet with OS trixie completed:
- db2150 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605120708_marostegui_2951065_db2150.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change #1286735 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1253,db2218: Disable notifications
Change #1286735 merged by Marostegui:
[operations/puppet@production] db1253,db2218: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db2218.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db1253.eqiad.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db1253.eqiad.wmnet with OS trixie completed:
- db1253 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605130559_marostegui_3391454_db1253.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db2218.codfw.wmnet with OS trixie completed:
- db2218 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605130603_marostegui_3391410_db2218.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change #1286845 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db2220: Disable notifications
Change #1286845 merged by Marostegui:
[operations/puppet@production] db2220: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db2220.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db2220.codfw.wmnet with OS trixie completed:
- db2220 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605131035_marostegui_3432323_db2220.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by fceratto@cumin1003 for host db1236.eqiad.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by fceratto@cumin1003 for host db1236.eqiad.wmnet with OS trixie completed:
- db1236 (WARN)
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605131125_fceratto_3444723_db1236.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
- Updated Netbox data from PuppetDB
Starting pool of db1236 by fceratto@cumin1003: Migration of db1236.eqiad.wmnet completed
Completed pooling of db1236 by fceratto@cumin1003: Migration of db1236.eqiad.wmnet completed
Change #1287080 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1158: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db1158.eqiad.wmnet with OS trixie
Change #1287080 merged by Marostegui:
[operations/puppet@production] db1158: Disable notifications
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db1158.eqiad.wmnet with OS trixie completed:
- db1158 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605140529_marostegui_3741973_db1158.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change #1296249 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1181: Disable notifications
Change #1296249 merged by Marostegui:
[operations/puppet@production] db1181: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db1181.eqiad.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db1181.eqiad.wmnet with OS trixie completed:
- db1181 (WARN)
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606020629_marostegui_3870080_db1181.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
- Updated Netbox data from PuppetDB
Starting pool of db1181 by marostegui@cumin1003: Migration of db1181.eqiad.wmnet completed
Completed pooling of db1181 by marostegui@cumin1003: Migration of db1181.eqiad.wmnet completed
Cookbook cookbooks.sre.hosts.reimage was started by fceratto@cumin1003 for host db1215.eqiad.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by fceratto@cumin1003 for host db1215.eqiad.wmnet with OS trixie completed:
- db1215 (WARN)
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606100744_fceratto_2619557_db1215.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage started by fceratto@cumin1003 for host db1215.eqiad.wmnet with OS trixie executed with errors:
- db1215 (FAIL)
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606100744_fceratto_2619557_db1215.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
- Updated Netbox data from PuppetDB
- The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console db1215.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
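For anyone skimming this timeline, the per-host outcome can be pulled out of the cookbook summaries by looking only at lines of the form `- <host> (<STATUS>)`. The snippet below is a minimal sketch of that idea, not part of Spicerack or the reimage cookbook: the helper name `last_status_per_host` and the regular expression are assumptions made for illustration, and the sample lines are copied from the summaries above.

```python
import re

# Matches host summary lines such as "- db2208 (WARN)".
# The pattern is an assumption based on the lines in this timeline only.
STATUS_LINE = re.compile(r"^- (?P<host>[a-z]+\d+) \((?P<status>[A-Z]+)\)$")


def last_status_per_host(lines):
    """Return {host: status}, keeping the last status reported per host."""
    statuses = {}
    for line in lines:
        match = STATUS_LINE.match(line.strip())
        if match:
            # A later report for the same host overrides an earlier one.
            statuses[match.group("host")] = match.group("status")
    return statuses


sample = [
    "- db2208 (WARN)",
    "- Forced PXE for next reboot",  # step line, ignored by the pattern
    "- db1215 (WARN)",
    "- db1215 (FAIL)",
]
print(last_status_per_host(sample))
```

Keeping the last status per host means a host reported more than once, as db1215 is here, shows up with its most recent result.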