
Migrate m2 to Debian Bookworm + MariaDB 10.6
Closed, Resolved, Public

Description

  • db1195 master
  • db1217
  • db2133
  • db2160

db1119 needs to be moved from m1 to m2, so the master can be reimaged.
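The ordering this task implies can be sketched in a few lines of Python. This is only an illustration, not the tooling used on the task: `Host` and `plan_reimage_order` are hypothetical names, and the replicas-before-master order and the switch to db1119 as temporary master are inferred from the timeline and from T351863, not stated in the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Host:
    """Hypothetical record for one database host in a section."""
    name: str
    is_master: bool = False


def plan_reimage_order(section: str, hosts: List[Host], stand_in: str) -> List[str]:
    """Return human-readable steps: stand-in first, replicas next, master last.

    Assumption: the master is reimaged only after a stand-in host has
    joined the section and taken over the master role.
    """
    masters = [h for h in hosts if h.is_master]
    if len(masters) != 1:
        raise ValueError("expected exactly one master in the section")
    master = masters[0]

    steps = [f"move {stand_in} into {section}"]
    # Replicas can be reimaged while the current master keeps serving.
    steps += [f"reimage {h.name}" for h in hosts if not h.is_master]
    steps.append(f"switch master {master.name} -> {stand_in}")
    steps.append(f"reimage {master.name}")
    # The switch back is tracked separately (T351863 in this task).
    steps.append(f"switch master {stand_in} -> {master.name}")
    return steps


m2_hosts = [
    Host("db1195", is_master=True),
    Host("db1217"),
    Host("db2133"),
    Host("db2160"),
]
m2_plan = plan_reimage_order("m2", m2_hosts, stand_in="db1119")
```

Printing `m2_plan` line by line gives a checklist that can be compared against the event timeline.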

Event Timeline

Marostegui moved this task from Triage to In progress on the DBA board.

Change 975117 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Move db1119 to m2

https://gerrit.wikimedia.org/r/975117

Change 975117 merged by Marostegui:

[operations/puppet@production] mariadb: Move db1119 to m2

https://gerrit.wikimedia.org/r/975117

Change 975118 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2133: Migrate to MariaDB 10.6

https://gerrit.wikimedia.org/r/975118

Change 975118 merged by Marostegui:

[operations/puppet@production] db2133: Migrate to MariaDB 10.6

https://gerrit.wikimedia.org/r/975118

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2133.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2133.codfw.wmnet with OS bookworm completed:

  • db2133 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311170716_marostegui_2299186_db2133.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 975742 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2160: Remove package declaration

https://gerrit.wikimedia.org/r/975742

Change 975742 merged by Marostegui:

[operations/puppet@production] db2160: Remove package declaration

https://gerrit.wikimedia.org/r/975742

Change 975798 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2133,db1217: Remove package declaration

https://gerrit.wikimedia.org/r/975798

Change 975798 merged by Marostegui:

[operations/puppet@production] db2133,db1217: Remove package declaration

https://gerrit.wikimedia.org/r/975798

Change 976885 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1195: Disable notifications

https://gerrit.wikimedia.org/r/976885

Change 976885 merged by Marostegui:

[operations/puppet@production] db1195: Disable notifications

https://gerrit.wikimedia.org/r/976885

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1195.eqiad.wmnet with OS bookworm

This is done. What remains is the switch back to db1195 as master, which is tracked at T351863: Switchover m2 master db1119 -> db1195. I will do it once db1195 has been running for a few days on Bookworm and MariaDB 10.6, to make sure it is stable.

Marostegui updated the task description.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1195.eqiad.wmnet with OS bookworm completed:

  • db1195 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311230708_marostegui_1911882_db1195.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
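
The two reimage log paths quoted above share a naming pattern, so the start time and host of a run can be recovered from the file name. The sketch below is a minimal example under stated assumptions: the field layout (a 12-digit timestamp, the user, a numeric id, then the host) is inferred only from the two paths in this task, the meaning of the numeric field is not stated in the source, and `parse_reimage_log_name` is a hypothetical helper rather than part of Spicerack.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime  # parsed from the leading YYYYMMDDHHMM field
    user: str
    run_id: int        # numeric field; its meaning is an assumption
    host: str


# Assumed layout: <YYYYMMDDHHMM>_<user>_<digits>_<host>.out
_LOG_NAME = re.compile(r"^(\d{12})_([^_]+)_(\d+)_([^_.]+)\.out$")


def parse_reimage_log_name(path: str) -> ReimageLog:
    """Split a reimage log file name into its apparent fields."""
    name = path.rsplit("/", 1)[-1]
    match = _LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised reimage log name: {name!r}")
    stamp, user, run_id, host = match.groups()
    return ReimageLog(datetime.strptime(stamp, "%Y%m%d%H%M"), user, int(run_id), host)


replica_log = parse_reimage_log_name(
    "/var/log/spicerack/sre/hosts/reimage/202311170716_marostegui_2299186_db2133.out"
)
master_log = parse_reimage_log_name(
    "/var/log/spicerack/sre/hosts/reimage/202311230708_marostegui_1911882_db1195.out"
)
```

Comparing `replica_log.started` with `master_log.started` shows the master being reimaged six days after db2133, which matches the order of events in the timeline.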