Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
mariadb: candidate master add for x1 | operations/puppet | production | +1 -0
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T356960 Upgrade hosts to MariaDB 10.6
Resolved | | ABran-WMF | T358642 Upgrade x1 to MariaDB 10.6
Event Timeline
Coordinate with @jcrespo for the backup sources, but we can leave those for a while, until more sections are migrated.
As I prepared beforehand for a previous upgrade, s6, x1 and s2 are already producing 10.6-compatible backups, and the backup sources should be at least partially upgraded. We can just drop the 10.4 ones once the sections are fully upgraded and move the 10.4 sections instead (I will handle that). So I should not be a blocker for this ticket. If you have a roadmap of future upgrades beyond those sections, I can start preparing for them now, so that I am ready in advance, as in this case.
Thank you for taking me into account!
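The rule described above (a section's 10.4 backup source can be dropped only once every host in that section runs 10.6) can be sketched as a small check. This is a minimal illustration, assuming a hypothetical inventory that maps each section to the MariaDB versions of its hosts; the version strings are made up and do not reflect the real fleet state.

```python
# Hypothetical inventory: section -> MariaDB versions of its hosts.
# These values are illustrative only, not the real state of s2, s6 or x1.
INVENTORY: dict[str, list[str]] = {
    "s2": ["10.6.16", "10.4.32"],
    "s6": ["10.6.16", "10.6.16"],
    "x1": ["10.6.16", "10.6.16"],
}


def major_minor(version: str) -> str:
    """Reduce a full version string such as '10.6.16' to '10.6'."""
    return ".".join(version.split(".")[:2])


def fully_upgraded(sections: dict[str, list[str]], target: str = "10.6") -> list[str]:
    """Return the sections whose hosts all run the target major.minor version.

    A section with an empty host list is not treated as upgraded.
    """
    return [
        name
        for name, versions in sorted(sections.items())
        if versions and all(major_minor(v) == target for v in versions)
    ]


# Sections whose 10.4 backup source could, under this rule, be dropped.
print(fully_upgraded(INVENTORY))
```

With the made-up inventory, s2 still has a 10.4 host, so only s6 and x1 are reported.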
Mentioned in SAL (#wikimedia-operations) [2024-03-06T14:32:04Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'Depool to reimage T358642', diff saved to https://phabricator.wikimedia.org/P58588 and previous config saved to /var/cache/conftool/dbconfig/20240306-143204-arnaudb.json
Cookbook cookbooks.sre.hosts.reimage was started by arnaudb@cumin1002 for host db2131.codfw.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by arnaudb@cumin1002 for host db2131.codfw.wmnet with OS bookworm completed:
- db2131 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202403061454_arnaudb_625899_db2131.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
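Each reimage summary starts with the host name and an overall status, followed by the steps performed. In the run above the status is WARN, and the only step that reads as a problem is "Icinga status is not optimal, downtime not removed", which suggests the host stays downtimed until someone checks it. A minimal Python sketch for pulling that out of pasted summary text follows; the line format is inferred from this ticket's output, not from any documented Spicerack interface, and the excerpt is abbreviated.

```python
import re

# Abbreviated excerpt of the db2131 summary above.
SUMMARY = """\
- db2131 (WARN)
- Downtimed on Icinga/Alertmanager
- Rebooted
- Automatic Puppet run was successful
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
"""


def parse_reimage_summary(text: str) -> tuple[str, str, list[str]]:
    """Split a pasted summary into (host, status, steps).

    Assumes every relevant line starts with '- ' and the first one is
    '<host> (<STATUS>)'.
    """
    lines = [ln[2:].strip() for ln in text.splitlines() if ln.startswith("- ")]
    if not lines:
        raise ValueError("no '- ' lines found")
    match = re.fullmatch(r"(\S+) \((\w+)\)", lines[0])
    if match is None:
        raise ValueError(f"unexpected header line: {lines[0]!r}")
    return match.group(1), match.group(2), lines[1:]


def downtime_left_in_place(steps: list[str]) -> bool:
    """True if the summary says the cookbook did not remove the downtime."""
    return any("downtime not removed" in step for step in steps)


host, status, steps = parse_reimage_summary(SUMMARY)
print(host, status, downtime_left_in_place(steps))
```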
Mentioned in SAL (#wikimedia-operations) [2024-03-06T15:21:35Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'Depool to clone on db2131 T358642', diff saved to https://phabricator.wikimedia.org/P58589 and previous config saved to /var/cache/conftool/dbconfig/20240306-152130-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-03-07T10:10:05Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'Depool to upgrade T358642', diff saved to https://phabricator.wikimedia.org/P58624 and previous config saved to /var/cache/conftool/dbconfig/20240307-101004-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-03-07T10:11:48Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 3:00:00 on db1220.eqiad.wmnet with reason: T358642
Mentioned in SAL (#wikimedia-operations) [2024-03-07T10:12:01Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1220.eqiad.wmnet with reason: T358642
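The downtime set on db1220 is 3:00:00 long; taking the START line's timestamp (10:11:48, with a `Z` suffix, so UTC) as the beginning of the window, it lasts until about 13:11:48 UTC. A minimal sketch of that arithmetic, assuming the duration is always written as H:MM:SS as in the log line:

```python
from datetime import datetime, timedelta


def downtime_window(start_iso: str, duration: str) -> tuple[datetime, datetime]:
    """Return (start, end) for a downtime given as an ISO timestamp and 'H:MM:SS'."""
    # datetime.fromisoformat() before Python 3.11 does not accept a trailing 'Z'.
    start = datetime.fromisoformat(start_iso.replace("Z", "+00:00"))
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return start, start + timedelta(hours=hours, minutes=minutes, seconds=seconds)


start, end = downtime_window("2024-03-07T10:11:48Z", "3:00:00")
print(end.isoformat())  # 2024-03-07T13:11:48+00:00
```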
Cookbook cookbooks.sre.hosts.reimage was started by arnaudb@cumin1002 for host db1220.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by arnaudb@cumin1002 for host db1220.eqiad.wmnet with OS bookworm completed:
- db1220 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202403071028_arnaudb_771962_db1220.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by arnaudb@cumin1002 for host db1179.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by arnaudb@cumin1002 for host db1179.eqiad.wmnet with OS bookworm completed:
- db1179 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202403111356_arnaudb_483161_db1179.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change 1010252 had a related patch set uploaded (by Arnaudb; author: Arnaudb):
[operations/puppet@production] mariadb: candidate master add for x1
Change 1010252 merged by Arnaudb:
[operations/puppet@production] mariadb: candidate master add for x1
Cookbook cookbooks.sre.hosts.reimage was started by arnaudb@cumin1002 for host db2115.codfw.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by arnaudb@cumin1002 for host db2115.codfw.wmnet with OS bookworm completed:
- db2115 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202403140738_arnaudb_958365_db2115.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
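The four reimages in this ticket can be put in order from the names of their Puppet log files, which appear to start with a YYYYMMDDHHMM timestamp followed by the operator, a number, and the host. That naming scheme is inferred from the four paths above; the meaning of the middle number and the time zone of the timestamp are assumptions, so the sketch only reads the timestamp and the host.

```python
import re
from datetime import datetime

# First Puppet run logs copied from the reimage summaries above.
LOG_PATHS = [
    "/var/log/spicerack/sre/hosts/reimage/202403061454_arnaudb_625899_db2131.out",
    "/var/log/spicerack/sre/hosts/reimage/202403071028_arnaudb_771962_db1220.out",
    "/var/log/spicerack/sre/hosts/reimage/202403111356_arnaudb_483161_db1179.out",
    "/var/log/spicerack/sre/hosts/reimage/202403140738_arnaudb_958365_db2115.out",
]

# Assumed layout: <YYYYMMDDHHMM>_<user>_<number>_<host>.out
LOG_NAME = re.compile(r"(\d{12})_([^_/]+)_(\d+)_([^_/]+)\.out$")


def reimage_timeline(paths: list[str]) -> list[tuple[datetime, str]]:
    """Return (timestamp, host) pairs sorted by time; unmatched paths are skipped."""
    entries = []
    for path in paths:
        match = LOG_NAME.search(path)
        if match is None:
            continue
        # The time zone is not encoded in the name, so this is a naive datetime.
        when = datetime.strptime(match.group(1), "%Y%m%d%H%M")
        entries.append((when, match.group(4)))
    return sorted(entries)


for when, host in reimage_timeline(LOG_PATHS):
    print(when.strftime("%Y-%m-%d %H:%M"), host)
```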
The remaining servers are backup sources; feel free to let me know if and when I can help!
I have created a separate task to track the backup sources, T360751: Upgrade backup sources to MariaDB 10.6, so I am going to close this one as fixed.
Thanks for working on it!