- dbstore1007
- db2204
- db2189
- db2187
- db2175
- db2148
- db2138
- db2126
- db2125
- db2107 old master (will be decommissioned)
- db2104 (will be decommissioned)
- db2097
- db1246
- db1239 backup source T360751
- db1233
- db1229
- db1225
- db1222 candidate master
- db1197
- db1188
- db1182
- db1162 master
- db1156
- db1155
- clouddb1021
- clouddb1018
- clouddb1014
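The following is a minimal Python sketch for slicing the host list above by datacenter and by annotation. It assumes the usual Wikimedia naming convention that the FQDNs in the timeline also show (db1229.eqiad.wmnet, db2148.codfw.wmnet): a host number starting with 1 is in eqiad and one starting with 2 is in codfw. The helper names `group_by_datacenter` and `marked` are hypothetical.

```python
import re
from collections import defaultdict

# The host list from the task description, pasted verbatim.
HOST_LIST = """\
dbstore1007
db2204
db2189
db2187
db2175
db2148
db2138
db2126
db2125
db2107 old master (will be decommissioned)
db2104 (will be decommissioned)
db2097
db1246
db1239 backup source T360751
db1233
db1229
db1225
db1222 candidate master
db1197
db1188
db1182
db1162 master
db1156
db1155
clouddb1021
clouddb1018
clouddb1014
"""

# Assumption: the first digit of the host number identifies the datacenter
# (1 = eqiad, 2 = codfw), as in db1229.eqiad.wmnet and db2148.codfw.wmnet.
DC_BY_DIGIT = {"1": "eqiad", "2": "codfw"}

# Hostname = lowercase prefix + 4-digit number, optionally followed by a note.
HOST_RE = re.compile(r"^(?P<name>[a-z]+(?P<num>\d{4}))\s*(?P<note>.*)$")


def group_by_datacenter(text):
    """Return {datacenter: [(host, note), ...]}, preserving list order."""
    groups = defaultdict(list)
    for line in text.splitlines():
        m = HOST_RE.match(line.strip())
        if m is None:
            continue  # skip blank or unrecognised lines
        dc = DC_BY_DIGIT.get(m["num"][0], "unknown")
        groups[dc].append((m["name"], m["note"]))
    return dict(groups)


def marked(groups, keyword):
    """Hosts whose note mentions `keyword`, across all datacenters."""
    return [host for hosts in groups.values() for host, note in hosts if keyword in note]


groups = group_by_datacenter(HOST_LIST)
for dc in sorted(groups):
    print(dc, len(groups[dc]))  # codfw 11, then eqiad 16
print(marked(groups, "decommissioned"))  # ['db2107', 'db2104']
print(marked(groups, "master"))  # ['db1222', 'db1162', 'db2107']
```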
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T356960 Upgrade hosts to MariaDB 10.6
Resolved | | Marostegui | T361543 Upgrade s2 to MariaDB 10.6
Resolved | Request | Jhancock.wm | T361779 decommission db2104.codfw.wmnet
Duplicate | | None | T361780 Switchover s2 master (db2107 -> db2204)
Resolved | | Marostegui | T362036 Switchover s2 master (db1162 -> db1222)
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2024-04-02T05:44:09Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db1229 T361543', diff saved to https://phabricator.wikimedia.org/P59116 and previous config saved to /var/cache/conftool/dbconfig/20240402-054408-root.json
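Every `dbctl commit` entry in SAL has the same shape. It records a timestamp, the operator and cumin host, and a commit message of the form `Depool <host> <task>`. It also gives the Phabricator paste holding the diff and the path of the saved previous config. The sketch below parses that shape. The pattern is inferred only from the lines in this task and is not a documented dbctl output format. `parse_sal` and `DbctlCommit` are illustrative names.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Pattern inferred from the SAL lines in this task; not a documented format.
SAL_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+<(?P<user>[^@>]+)@(?P<cumin>[^>]+)>\s+"
    r"dbctl commit \(dc=(?P<dc>[^)]+)\): '(?P<msg>[^']*)', "
    r"diff saved to (?P<diff>\S+) and previous config saved to (?P<backup>\S+)"
)
# Commit messages here look like "<Action> <host> <task>".
MSG_RE = re.compile(r"^(?P<action>\w+) (?P<host>\S+) (?P<task>T\d+)$")


@dataclass
class DbctlCommit:
    when: datetime
    user: str
    cumin: str
    dc: str
    action: str
    host: str
    task: str
    diff_url: str
    backup_path: str


def parse_sal(line):
    """Parse one 'Mentioned in SAL' dbctl line; return None if it does not match."""
    m = SAL_RE.search(line)
    if m is None:
        return None
    msg = MSG_RE.match(m["msg"])
    if msg is None:
        return None
    when = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return DbctlCommit(when, m["user"], m["cumin"], m["dc"], msg["action"],
                       msg["host"], msg["task"], m["diff"], m["backup"])


line = ("Mentioned in SAL (#wikimedia-operations) [2024-04-02T05:44:09Z] "
        "<marostegui@cumin1002> dbctl commit (dc=all): 'Depool db1229 T361543', "
        "diff saved to https://phabricator.wikimedia.org/P59116 and previous config "
        "saved to /var/cache/conftool/dbconfig/20240402-054408-root.json")
c = parse_sal(line)
print(c.action, c.host, c.task, c.diff_url)
# Depool db1229 T361543 https://phabricator.wikimedia.org/P59116
```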
Change #1016080 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1229: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1229.eqiad.wmnet with OS bookworm
Change #1016080 merged by Marostegui:
[operations/puppet@production] db1229: Disable notifications
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1229.eqiad.wmnet with OS bookworm completed:
- db1229 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404020603_marostegui_198803_db1229.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
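Every completion report in this timeline has the same structure. It opens with a bullet giving the host and an overall status, which is `(WARN)` here. The steps the cookbook ran follow, and the Icinga line near the end reads `Icinga status is not optimal, downtime not removed`. The sketch below pulls those fields out of a pasted report. It treats the text purely as data and assumes nothing about how Spicerack decides the status. `summarize_reimage` is a hypothetical helper, and the sample is trimmed from the db1229 report above.

```python
import re

# First bullet of a completion report, e.g. "- db1229 (WARN)".
HEADER_RE = re.compile(r"^-\s+(?P<host>\S+)\s+\((?P<status>\w+)\)$")
DOWNTIME_KEPT = "Icinga status is not optimal, downtime not removed"


def summarize_reimage(report):
    """Return host, overall status, step list and whether downtime was left in place."""
    bullets = [l.strip() for l in report.splitlines() if l.strip().startswith("-")]
    if not bullets:
        raise ValueError("no bullet lines found")
    header = HEADER_RE.match(bullets[0])
    if header is None:
        raise ValueError("report does not start with '- <host> (<STATUS>)'")
    steps = [b[1:].strip() for b in bullets[1:]]  # drop the leading "-"
    return {
        "host": header["host"],
        "status": header["status"],
        "steps": steps,
        "downtime_kept": DOWNTIME_KEPT in steps,
    }


report = """\
- db1229 (WARN)
- Downtimed on Icinga/Alertmanager
- Host up (new fresh bookworm OS)
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
"""
s = summarize_reimage(report)
if s["downtime_kept"]:
    print(f"{s['host']}: {s['status']}, still downtimed; review Icinga")
```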
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1197.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1197.eqiad.wmnet with OS bookworm completed:
- db1197 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404020832_marostegui_221043_db1197.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
- Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
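The db1197 report contains one step that the db1229 report lacks: `Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)`. The db2148 report has it too. The step text suggests it applies to hosts behind EVPN switches, but the log does not say so explicitly. The sketch below diffs two pasted reports. It first strips the per-run log path from the `First Puppet run completed` step so that otherwise identical steps compare equal. The function names are hypothetical and the sample reports are trimmed.

```python
import re

# The "First Puppet run completed ..." step embeds a per-run log path; drop it
# so that otherwise identical steps compare equal across hosts.
LOG_PATH_RE = re.compile(r" and logged in /\S+$")


def steps(report):
    """Normalised step lines of a completion report (header bullet skipped)."""
    lines = [l.strip()[1:].strip() for l in report.splitlines() if l.strip().startswith("-")]
    return [LOG_PATH_RE.sub("", s) for s in lines[1:]]


def extra_steps(base, other):
    """Steps in `other` that do not appear in `base`, in `other`'s order."""
    seen = set(steps(base))
    return [s for s in steps(other) if s not in seen]


db1229 = """\
- db1229 (WARN)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404020603_marostegui_198803_db1229.out
- Updated Netbox data from PuppetDB
"""
db1197 = """\
- db1197 (WARN)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404020832_marostegui_221043_db1197.out
- Updated Netbox data from PuppetDB
- Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
"""
print(extra_steps(db1229, db1197))
# ['Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)']
```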
Mentioned in SAL (#wikimedia-operations) [2024-04-02T12:04:57Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db1188 T361543', diff saved to https://phabricator.wikimedia.org/P59158 and previous config saved to /var/cache/conftool/dbconfig/20240402-120455-root.json
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1188.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1188.eqiad.wmnet with OS bookworm completed:
- db1188 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404021222_marostegui_258065_db1188.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
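The `First Puppet run completed` step points at a per-run log under `/var/log/spicerack/sre/hosts/reimage/`. Judging only from the paths in this task, the file names follow the pattern `<YYYYMMDDHHMM>_<user>_<number>_<host>.out`. The log does not say what the number means or which time zone the stamp uses. The sketch below therefore keeps the number as an integer and leaves the datetime naive. `parse_reimage_log` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed layout, inferred only from the paths in this task:
#   <YYYYMMDDHHMM>_<user>_<number>_<host>.out
# The meaning of <number> is not stated in the log.
LOG_NAME_RE = re.compile(
    r"/(?P<stamp>\d{12})_(?P<user>[^_/]+)_(?P<num>\d+)_(?P<host>[^_/]+)\.out$"
)


def parse_reimage_log(path):
    """Split a reimage log path into its stamp, user, number and host."""
    m = LOG_NAME_RE.search(path)
    if m is None:
        raise ValueError(f"unexpected log path: {path}")
    return {
        # Time zone is not recorded in the file name, so the datetime is naive.
        "stamp": datetime.strptime(m["stamp"], "%Y%m%d%H%M"),
        "user": m["user"],
        "number": int(m["num"]),
        "host": m["host"],
    }


info = parse_reimage_log(
    "/var/log/spicerack/sre/hosts/reimage/202404021222_marostegui_258065_db1188.out"
)
print(info["host"], info["stamp"])  # db1188 2024-04-02 12:22:00
```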
Change #1016489 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1222: Upgrade to Bookworm and MariaDB 10.6
Mentioned in SAL (#wikimedia-operations) [2024-04-03T05:11:50Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db1222 T361543', diff saved to https://phabricator.wikimedia.org/P59239 and previous config saved to /var/cache/conftool/dbconfig/20240403-051149-root.json
Change #1016489 merged by Marostegui:
[operations/puppet@production] db1222: Upgrade to Bookworm and MariaDB 10.6
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1222.eqiad.wmnet with OS bookworm
Mentioned in SAL (#wikimedia-operations) [2024-04-03T05:43:11Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2148 T361543', diff saved to https://phabricator.wikimedia.org/P59240 and previous config saved to /var/cache/conftool/dbconfig/20240403-054310-root.json
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db2148.codfw.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1222.eqiad.wmnet with OS bookworm completed:
- db1222 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404030528_marostegui_400270_db1222.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2148.codfw.wmnet with OS bookworm completed:
- db2148 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404030604_marostegui_405823_db2148.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
- Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Mentioned in SAL (#wikimedia-operations) [2024-04-03T07:09:47Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2125 T361543', diff saved to https://phabricator.wikimedia.org/P59256 and previous config saved to /var/cache/conftool/dbconfig/20240403-070946-root.json
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db2125.codfw.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2125.codfw.wmnet with OS bookworm completed:
- db2125 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404030735_marostegui_418774_db2125.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2024-04-04T05:17:59Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2126 T361543', diff saved to https://phabricator.wikimedia.org/P59407 and previous config saved to /var/cache/conftool/dbconfig/20240404-051758-root.json
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db2126.codfw.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2126.codfw.wmnet with OS bookworm completed:
- db2126 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404040539_marostegui_597454_db2126.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change #1017535 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1156: Migrate to MariaDB 10.6
Change #1017535 merged by Marostegui:
[operations/puppet@production] db1156: Migrate to MariaDB 10.6
Change #1017965 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1162: Disable notifications
Change #1017965 merged by Marostegui:
[operations/puppet@production] db1162: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1162.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1162.eqiad.wmnet with OS bookworm completed:
- db1162 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404090548_marostegui_1502589_db1162.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
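To find which hosts from the list at the top have a completed reimage in this timeline, it is enough to match the `... started by ... completed:` lines. The `was started by` lines only mark the start of a run. The sketch below assumes nothing beyond that line format. `reimaged` and `not_reimaged` are hypothetical names. It counts only reimage cookbook runs, so a host handled another way is reported as not reimaged. One example is db1156, whose only entry in this timeline is the Puppet change `db1156: Migrate to MariaDB 10.6`.

```python
import re

# Matches the "... started by ... completed:" lines, not the "was started by" ones.
COMPLETED_RE = re.compile(
    r"cookbooks\.sre\.hosts\.reimage started by \S+ for host "
    r"(?P<host>[a-z]+\d+)\.(?P<dc>[a-z]+)\.wmnet with OS (?P<os>\w+) completed"
)


def reimaged(timeline):
    """Map short hostname -> target OS for every completed reimage."""
    return {m["host"]: m["os"] for m in COMPLETED_RE.finditer(timeline)}


def not_reimaged(hosts, timeline):
    """Hosts from `hosts` with no completed reimage in `timeline`, in order."""
    done = reimaged(timeline)
    return [h for h in hosts if h not in done]


timeline = """\
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1162.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1162.eqiad.wmnet with OS bookworm completed:
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2126.codfw.wmnet with OS bookworm completed:
"""
print(reimaged(timeline))  # {'db1162': 'bookworm', 'db2126': 'bookworm'}
print(not_reimaged(["db1162", "db2126", "db1156", "db1155"], timeline))  # ['db1156', 'db1155']
```

Run over the full timeline above, this would report db1229, db1197, db1188, db1222, db2148, db2125, db2126 and db1162 as reimaged to bookworm.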