Description
Details
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Open | None | T291916 Tracking task for Bullseye migrations in production | |||
Resolved | Marostegui | T298585 Upgrade WMF database-and-backup-related hosts to bullseye | |||
Resolved | Marostegui | T300099 Upgrade x1 to Bullseye | |||
Resolved | Marostegui | T300472 Switchover x1 master (db1103 -> db1120) |
Event Timeline
Change 757288 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] mariadb: x1 codfw disable notifications
Change 757288 merged by Marostegui:
[operations/puppet@production] mariadb: x1 codfw disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2096.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2115.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2096.codfw.wmnet with OS bullseye completed:
- db2096 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201260641_marostegui_11389_db2096.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2131.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2115.codfw.wmnet with OS bullseye completed:
- db2115 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201260643_marostegui_11573_db2115.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2131.codfw.wmnet with OS bullseye completed:
- db2131 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201260714_marostegui_19960_db2131.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-01-26T08:57:33Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1120 T300099', diff saved to https://phabricator.wikimedia.org/P19249 and previous config saved to /var/cache/conftool/dbconfig/20220126-085733-marostegui.json
Change 757386 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1120: Disable notifications
Change 757386 merged by Marostegui:
[operations/puppet@production] db1120: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1120.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1120.eqiad.wmnet with OS bullseye completed:
- db1120 (PASS)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201260859_marostegui_3666_db1120.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-01-26T11:42:37Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1137 T300099', diff saved to https://phabricator.wikimedia.org/P19286 and previous config saved to /var/cache/conftool/dbconfig/20220126-114236-marostegui.json
Change 757419 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1137: Disable notifications
Change 757419 merged by Marostegui:
[operations/puppet@production] db1137: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1137.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1137.eqiad.wmnet with OS bullseye completed:
- db1137 (PASS)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201261144_marostegui_31461_db1137.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Change 809878 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1103: Disable notifications
Change 809878 merged by Marostegui:
[operations/puppet@production] db1103: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1103.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1103.eqiad.wmnet with OS bullseye completed:
- db1103 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206300625_marostegui_940723_db1103.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB