- dbstore1007 (T299481)
- db2149
- db2139 (backup T299876)
- db2127
- db2109
- db2105 (master)
- db2094 (sanitarium host)
- db2074 (sanitarium master)
- db1179
- db1175
- db1166
- db1157 (master)
- db1154 (sanitarium host)
- db1145 (backup T299876)
- db1123
- db1112 (sanitarium master)
- db1102 (backup T299876)
- clouddb1021 (T299480)
- clouddb1017 (T299480)
- clouddb1013 (T299480)
Description
Details
Status | Assigned | Task
---|---|---
Open | None | T291916 Tracking task for Bullseye migrations in production
Resolved | Marostegui | T298585 Upgrade WMF database-and-backup-related hosts to bullseye
Resolved | Marostegui | T300600 Upgrade s3 to Bullseye
Resolved | Marostegui | T301850 Switchover s3 master (db1157 -> db1123)
Resolved | Cmjohnson | T304280 db1175 not booting up
Event Timeline
Change 758781 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] s3 codfw db*: Disable notifications
Change 758781 merged by Marostegui:
[operations/puppet@production] s3 codfw db*: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2074.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2109.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2109.codfw.wmnet with OS bullseye completed:
- db2109 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202010807_marostegui_15345_db2109.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
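The completion summaries in this timeline all share one shape: a `- <host> (<STATUS>)` line followed by one bullet per step. A minimal Python sketch for turning such a block into structured data is shown here. The names `ReimageResult` and `parse_reimage_summary` are hypothetical helpers for illustration and are not part of the reimage cookbook or Spicerack.

```python
import re
from dataclasses import dataclass, field

# Matches the first bullet of a summary, e.g. "- db2109 (WARN)".
HEADER_RE = re.compile(r"^- (?P<host>\S+) \((?P<status>PASS|WARN|FAIL)\)$")


@dataclass
class ReimageResult:
    host: str
    status: str
    steps: list = field(default_factory=list)


def parse_reimage_summary(text: str) -> ReimageResult:
    """Parse one pasted summary block (hypothetical helper, not a Spicerack API)."""
    result = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER_RE.match(line)
        if match:
            result = ReimageResult(match.group("host"), match.group("status"))
        elif result is not None and line.startswith("- "):
            # Every later bullet is one step reported by the cookbook.
            result.steps.append(line[2:])
    if result is None:
        raise ValueError("no '- <host> (<STATUS>)' header line found")
    return result


sample = """\
- db2109 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Icinga status is not optimal, downtime not removed
"""
summary = parse_reimage_summary(sample)
print(summary.host, summary.status, len(summary.steps))
```

In the summaries on this task, a WARN status appears together with the step "Icinga status is not optimal, downtime not removed", so checking `summary.steps[-2:]` is one way to spot hosts whose downtime still needs manual removal.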
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2074.codfw.wmnet with OS bullseye completed:
- db2074 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202010806_marostegui_15232_db2074.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
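Each summary points at a log file under `/var/log/spicerack/sre/hosts/reimage/`. The file names appear to follow a `<timestamp>_<user>_<number>_<host>.out` pattern. The sketch below splits such a name into its parts, assuming the 12 leading digits are `YYYYMMDDHHMM` and treating the third field as an opaque run number (its exact meaning is not stated on this task). `parse_reimage_log_path` is a hypothetical helper.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Assumed layout: <YYYYMMDDHHMM>_<user>_<number>_<host>.out
NAME_RE = re.compile(
    r"^(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<num>\d+)_(?P<host>[^_.]+)\.out$"
)


def parse_reimage_log_path(path: str) -> dict:
    """Split a reimage log path into its fields (hypothetical helper)."""
    name = PurePosixPath(path).name
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name}")
    return {
        # Naive datetime: the time zone of the stamp is not stated in the source.
        "started": datetime.strptime(match.group("stamp"), "%Y%m%d%H%M"),
        "user": match.group("user"),
        "run_number": int(match.group("num")),
        "host": match.group("host"),
    }


info = parse_reimage_log_path(
    "/var/log/spicerack/sre/hosts/reimage/202202010806_marostegui_15232_db2074.out"
)
print(info["host"], info["started"].isoformat(), info["user"])
```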
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2149.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2127.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2149.codfw.wmnet with OS bullseye completed:
- db2149 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202010842_marostegui_24228_db2149.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2127.codfw.wmnet with OS bullseye completed:
- db2127 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202010846_marostegui_24588_db2127.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change 758797 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db2105: Disable notifications
Change 758797 merged by Marostegui:
[operations/puppet@production] db2105: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2105.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2105.codfw.wmnet with OS bullseye completed:
- db2105 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202011024_marostegui_14888_db2105.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change 769278 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1123: Disable notifications
Change 769278 merged by Marostegui:
[operations/puppet@production] db1123: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1123.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1123.eqiad.wmnet with OS bullseye completed:
- db1123 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203090643_marostegui_2894_db1123.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-03-09T07:33:39Z] <marostegui> dbmaint on db1123 s3@eqiad T300600
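The SAL mentions on this task carry a UTC timestamp, the acting user, a free-text message, and usually a task ID. As a rough sketch, a line like the one above can be split with two regular expressions. `parse_sal_mention` is a hypothetical helper for illustration and not part of any SAL tooling.

```python
import re
from datetime import datetime, timezone

# "[<ISO timestamp>Z] <user> message" as it appears in the mentions on this task.
SAL_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\] <(?P<who>[^>]+)> (?P<msg>.*)$"
)
# Phabricator task IDs such as T300600.
TASK_RE = re.compile(r"\bT\d+\b")


def parse_sal_mention(line: str) -> dict:
    """Extract timestamp, user, message and task IDs (hypothetical helper)."""
    match = SAL_RE.search(line)
    if match is None:
        raise ValueError("line does not look like a SAL mention")
    when = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S")
    message = match.group("msg")
    return {
        "when": when.replace(tzinfo=timezone.utc),  # trailing "Z" means UTC
        "who": match.group("who"),
        "message": message,
        "tasks": TASK_RE.findall(message),
    }


entry = parse_sal_mention(
    "Mentioned in SAL (#wikimedia-operations) [2022-03-09T07:33:39Z] "
    "<marostegui> dbmaint on db1123 s3@eqiad T300600"
)
print(entry["when"].isoformat(), entry["who"], entry["tasks"])
```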
Change 770831 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1166: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-03-15T06:26:12Z] <marostegui> dbmaint on s3@eqiad T300600
Change 770831 merged by Marostegui:
[operations/puppet@production] db1166: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1166.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1166.eqiad.wmnet with OS bullseye completed:
- db1166 (PASS)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203150628_marostegui_1166092_db1166.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Change 771765 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1179: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-03-18T05:38:32Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1179 reimage T300600', diff saved to https://phabricator.wikimedia.org/P22803 and previous config saved to /var/cache/conftool/dbconfig/20220318-053832-marostegui.json
Mentioned in SAL (#wikimedia-operations) [2022-03-18T05:39:14Z] <marostegui> dbmaint on s3@eqiad T300600
Change 771765 merged by Marostegui:
[operations/puppet@production] db1179: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1179.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1179.eqiad.wmnet with OS bullseye completed:
- db1179 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203180542_marostegui_1863100_db1179.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-03-21T05:52:03Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1175 reimage T300600', diff saved to https://phabricator.wikimedia.org/P22860 and previous config saved to /var/cache/conftool/dbconfig/20220321-055202-marostegui.json
Mentioned in SAL (#wikimedia-operations) [2022-03-21T05:52:19Z] <marostegui> dbmaint s5@eqiad T300600
Change 772058 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1175: Disable notifications
Change 772058 merged by Marostegui:
[operations/puppet@production] db1175: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1175.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1175.eqiad.wmnet with OS bullseye executed with errors:
- db1175 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1175.eqiad.wmnet with OS bullseye
db1175 isn't booting up and the iDRAC doesn't show anything; created T304280 for eqiad's DC Ops.
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1175.eqiad.wmnet with OS bullseye executed with errors:
- db1175 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1175.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1175.eqiad.wmnet with OS bullseye completed:
- db1175 (WARN)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203220541_marostegui_2587373_db1175.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
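db1175 needed three cookbook runs (FAIL, FAIL, then WARN) before the reimage went through. A small sketch for grouping per-run results by host makes such retries easy to spot. The `results` list is hand-copied from the summaries above, and `attempts_by_host` and `needed_retry` are hypothetical helper names.

```python
from collections import defaultdict


def attempts_by_host(results):
    """Group (host, status) pairs into an ordered status history per host."""
    history = defaultdict(list)
    for host, status in results:
        history[host].append(status)
    return dict(history)


def needed_retry(history):
    """Return the hosts that required more than one cookbook run."""
    return sorted(host for host, statuses in history.items() if len(statuses) > 1)


# Statuses copied by hand from the cookbook summaries on this task.
results = [
    ("db1179", "WARN"),
    ("db1175", "FAIL"),
    ("db1175", "FAIL"),
    ("db1175", "WARN"),
]
history = attempts_by_host(results)
print(history["db1175"])
print(needed_retry(history))
```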
Mentioned in SAL (#wikimedia-operations) [2022-03-22T12:24:49Z] <marostegui> dbmaint s3@eqiad T300600
Change 773120 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1112: Disable notifications
Change 773120 merged by Marostegui:
[operations/puppet@production] db1112: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1112.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1112.eqiad.wmnet with OS bullseye completed:
- db1112 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203230609_marostegui_2812305_db1112.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1157.eqiad.wmnet with OS bullseye
The old master was reimaged, so this is all done (apart from the hosts that have their own tasks).
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1157.eqiad.wmnet with OS bullseye completed:
- db1157 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203290924_marostegui_4105547_db1157.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB