
Upgrade s2 to MariaDB 10.6
Closed, Resolved (Public)

Description

  • dbstore1007
  • db2204
  • db2189
  • db2187
  • db2175
  • db2148
  • db2138
  • db2126
  • db2125
  • db2107 old master (will be decommissioned)
  • db2104 (will be decommissioned)
  • db2097
  • db1246
  • db1239 backup source T360751
  • db1233
  • db1229
  • db1225
  • db1222 candidate master
  • db1197
  • db1188
  • db1182
  • db1162 master
  • db1156
  • db1155
  • clouddb1021
  • clouddb1018
  • clouddb1014

Event Timeline

Marostegui triaged this task as Medium priority. Tue, Apr 2, 5:43 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2024-04-02T05:44:09Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db1229 T361543', diff saved to https://phabricator.wikimedia.org/P59116 and previous config saved to /var/cache/conftool/dbconfig/20240402-054408-root.json
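
The SAL entries in this timeline all follow the same shape, so they can be read mechanically when auditing which hosts were depooled and where each diff was saved. The following is a minimal sketch, assuming the line format shown above stays stable; `SAL_RE`, `SalEntry`, and `parse_sal_entry` are hypothetical names for illustration and are not part of dbctl or any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern written against the SAL line format seen in this task.
# It only covers "dbctl commit" entries and captures the diff URL, not the
# path of the previous config.
SAL_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+"
    r"<(?P<user>[^@>]+)@(?P<origin>[^>]+)>\s+"
    r"dbctl commit \(dc=(?P<dc>[^)]+)\): "
    r"'(?P<message>[^']*)', "
    r"diff saved to (?P<diff_url>\S+)"
)


class SalEntry(NamedTuple):
    timestamp: str
    user: str
    origin: str
    dc: str
    message: str
    diff_url: str


def parse_sal_entry(line: str) -> Optional[SalEntry]:
    """Return the fields of a dbctl commit SAL line, or None if it does not match."""
    match = SAL_RE.search(line)
    if match is None:
        return None
    return SalEntry(**match.groupdict())


if __name__ == "__main__":
    entry = parse_sal_entry(
        "Mentioned in SAL (#wikimedia-operations) [2024-04-02T05:44:09Z] "
        "<marostegui@cumin1002> dbctl commit (dc=all): 'Depool db1229 T361543', "
        "diff saved to https://phabricator.wikimedia.org/P59116 and previous "
        "config saved to /var/cache/conftool/dbconfig/20240402-054408-root.json"
    )
    print(entry)
```

The commit message is kept as a single string rather than split into action, host, and task ID, since the log does not guarantee that every message follows the 'Depool <host> <task>' wording.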

Change #1016080 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1229: Disable notifications

https://gerrit.wikimedia.org/r/1016080

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1229.eqiad.wmnet with OS bookworm

Change #1016080 merged by Marostegui:

[operations/puppet@production] db1229: Disable notifications

https://gerrit.wikimedia.org/r/1016080

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1229.eqiad.wmnet with OS bookworm completed:

  • db1229 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404020603_marostegui_198803_db1229.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
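
Each reimage summary in this task is marked WARN and includes the step "Icinga status is not optimal, downtime not removed", which suggests a manual check after the cookbook finishes. The log does not state that this step is what triggers the WARN status, so the sketch below only surfaces such steps. It assumes the two-level bullet layout used above; `parse_reimage_summary`, `steps_needing_follow_up`, and `FOLLOW_UP_MARKERS` are hypothetical helpers, not part of the reimage cookbook or spicerack.

```python
from typing import Dict, List, Optional

# Hypothetical marker list, taken from the wording of a step that appears in
# the summaries in this task. Extend it if other steps need manual attention.
FOLLOW_UP_MARKERS = ("downtime not removed",)


def parse_reimage_summary(text: str) -> Dict[str, List[str]]:
    """Map each top-level bullet (e.g. 'db1229 (WARN)') to its nested steps."""
    hosts: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        if "•" not in raw:
            continue  # skip blank lines and non-bullet text
        indent = len(raw) - len(raw.lstrip())
        item = raw.split("•", 1)[1].strip()
        if indent < 4:
            # Top-level bullet: starts a new host section.
            current = item
            hosts[current] = []
        elif current is not None:
            # Nested bullet: a step belonging to the current host.
            hosts[current].append(item)
    return hosts


def steps_needing_follow_up(steps: List[str]) -> List[str]:
    """Return the steps whose text contains one of the follow-up markers."""
    return [s for s in steps if any(m in s for m in FOLLOW_UP_MARKERS)]


if __name__ == "__main__":
    sample = (
        "  • db1229 (WARN)\n"
        "    • Downtimed on Icinga/Alertmanager\n"
        "    • Icinga status is not optimal, downtime not removed\n"
        "    • Updated Netbox data from PuppetDB\n"
    )
    for host, steps in parse_reimage_summary(sample).items():
        print(host, steps_needing_follow_up(steps))
```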

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1197.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1197.eqiad.wmnet with OS bookworm completed:

  • db1197 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404020832_marostegui_221043_db1197.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)

Mentioned in SAL (#wikimedia-operations) [2024-04-02T12:04:57Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db1188 T361543', diff saved to https://phabricator.wikimedia.org/P59158 and previous config saved to /var/cache/conftool/dbconfig/20240402-120455-root.json

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1188.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1188.eqiad.wmnet with OS bookworm completed:

  • db1188 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404021222_marostegui_258065_db1188.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change #1016489 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1222: Upgrade to Bookworm and MariaDB 10.6

https://gerrit.wikimedia.org/r/1016489

Mentioned in SAL (#wikimedia-operations) [2024-04-03T05:11:50Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db1222 T361543', diff saved to https://phabricator.wikimedia.org/P59239 and previous config saved to /var/cache/conftool/dbconfig/20240403-051149-root.json

Change #1016489 merged by Marostegui:

[operations/puppet@production] db1222: Upgrade to Bookworm and MariaDB 10.6

https://gerrit.wikimedia.org/r/1016489

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1222.eqiad.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-operations) [2024-04-03T05:43:11Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2148 T361543', diff saved to https://phabricator.wikimedia.org/P59240 and previous config saved to /var/cache/conftool/dbconfig/20240403-054310-root.json

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db2148.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1222.eqiad.wmnet with OS bookworm completed:

  • db1222 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404030528_marostegui_400270_db1222.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2148.codfw.wmnet with OS bookworm completed:

  • db2148 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404030604_marostegui_405823_db2148.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)

Mentioned in SAL (#wikimedia-operations) [2024-04-03T07:09:47Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2125 T361543', diff saved to https://phabricator.wikimedia.org/P59256 and previous config saved to /var/cache/conftool/dbconfig/20240403-070946-root.json

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db2125.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2125.codfw.wmnet with OS bookworm completed:

  • db2125 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404030735_marostegui_418774_db2125.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-04-04T05:17:59Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2126 T361543', diff saved to https://phabricator.wikimedia.org/P59407 and previous config saved to /var/cache/conftool/dbconfig/20240404-051758-root.json

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db2126.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2126.codfw.wmnet with OS bookworm completed:

  • db2126 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404040539_marostegui_597454_db2126.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change #1017535 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1156: Migrate to MariaDB 10.6

https://gerrit.wikimedia.org/r/1017535

Change #1017535 merged by Marostegui:

[operations/puppet@production] db1156: Migrate to MariaDB 10.6

https://gerrit.wikimedia.org/r/1017535

Change #1017965 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1162: Disable notifications

https://gerrit.wikimedia.org/r/1017965

Change #1017965 merged by Marostegui:

[operations/puppet@production] db1162: Disable notifications

https://gerrit.wikimedia.org/r/1017965

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1162.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1162.eqiad.wmnet with OS bookworm completed:

  • db1162 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404090548_marostegui_1502589_db1162.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Marostegui updated the task description.

All core hosts done.