
Upgrade m1 to Bullseye
Closed, Resolved (Public)

Description

Let's upgrade m1 to Bullseye.
This will give us our first master (apart from parsercache) running Bullseye.

Hosts:

  • db1159 (master)
  • db1117
  • db2132 (codfw master)
  • db2078

Once the s1 tests are done, let's use db1128 as the master to fail over to.

  • Move db1128 to m1 as a future master
  • Switchover m1 master, from db1159 to db1128 T299624
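The switchover planned above amounts to re-pointing replication so that db1128 becomes the primary. The following is a minimal, hypothetical Python model of that re-pointing, not the db-switchover tool itself. The function name `switchover` and the replica relationships (db1117, db1128 and db2132 replicating from db1159, and db2078 replicating from db2132) are assumptions for illustration only.

```python
from typing import Dict, Optional

# Hypothetical m1 topology: host -> replication source (None = primary).
# These relationships are assumed for illustration, not taken from real config.
M1_TOPOLOGY_BEFORE: Dict[str, Optional[str]] = {
    "db1159": None,        # current eqiad master
    "db1117": "db1159",
    "db1128": "db1159",    # future master, assumed to replicate from db1159
    "db2132": "db1159",    # codfw master, assumed to replicate from eqiad
    "db2078": "db2132",
}


def switchover(topology: Dict[str, Optional[str]], old: str, new: str) -> Dict[str, Optional[str]]:
    """Return a new topology in which `new` is the primary.

    The old primary and every host that replicated directly from it are
    re-pointed to `new`; hosts replicating from other sources are untouched.
    """
    if old not in topology or topology[old] is not None:
        raise ValueError(f"{old} is not the current primary")
    if topology.get(new) != old:
        raise ValueError(f"{new} must replicate directly from {old}")
    result: Dict[str, Optional[str]] = {}
    for host, source in topology.items():
        if host == new:
            result[host] = None          # promoted to primary
        elif host == old or source == old:
            result[host] = new           # re-pointed to the new primary
        else:
            result[host] = source        # e.g. db2078 keeps replicating from db2132
    return result


if __name__ == "__main__":
    for host, source in switchover(M1_TOPOLOGY_BEFORE, "db1159", "db1128").items():
        print(host, "->", source or "(primary)")
```

Note that the old master (db1159) ends up as a replica of db1128 in this model; in practice it could instead be taken out of the section once the switchover is done.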

Event Timeline

Marostegui triaged this task as Medium priority. Jan 17 2022, 1:01 PM
Marostegui moved this task from Triage to Ready on the DBA board.

@Kormat heads up: if all goes well, I am planning to switch this master over next week. This could be a good opportunity for you to work on db-switchover and test that it no longer updates tendril (which we need to do before we can stop MySQL on tendril's host: T297605).

Change 754511 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2132: Disable notifications

https://gerrit.wikimedia.org/r/754511

Mentioned in SAL (#wikimedia-operations) [2022-01-17T14:15:37Z] <marostegui> Reimage db2132 to Bullseye T299344

Change 754511 merged by Marostegui:

[operations/puppet@production] db2132: Disable notifications

https://gerrit.wikimedia.org/r/754511

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2132.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2132.codfw.wmnet with OS bullseye completed:

  • db2132 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201171416_marostegui_2982_db2132.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

I will use db1128 as a floating host. It needs to be removed from s1 once the live MW tests have completed successfully, which I expect to happen this week.

Change 754878 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1117: Disable notifications

https://gerrit.wikimedia.org/r/754878

Change 754878 merged by Marostegui:

[operations/puppet@production] db1117: Disable notifications

https://gerrit.wikimedia.org/r/754878

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1117.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1117.eqiad.wmnet with OS bullseye completed:

  • db1117 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201181031_marostegui_3001_db1117.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 755353 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] instances.yaml: Remove db1128 from dbctl

https://gerrit.wikimedia.org/r/755353

Change 755353 merged by Marostegui:

[operations/puppet@production] instances.yaml: Remove db1128 from dbctl

https://gerrit.wikimedia.org/r/755353

Mentioned in SAL (#wikimedia-operations) [2022-01-19T12:56:59Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Remove db1128 from dbctl T299344', diff saved to https://phabricator.wikimedia.org/P18858 and previous config saved to /var/cache/conftool/dbconfig/20220119-125658-marostegui.json

db1128 has been removed from dbctl (it was serving in s1) and is ready to be moved to m1.

Change 755526 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Move db1128 to m1

https://gerrit.wikimedia.org/r/755526

Change 755526 merged by Marostegui:

[operations/puppet@production] mariadb: Move db1128 to m1

https://gerrit.wikimedia.org/r/755526

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1128.eqiad.wmnet with OS bullseye

Mentioned in SAL (#wikimedia-operations) [2022-01-20T07:57:14Z] <marostegui> Stop mysql on db1117 to clone db1128 T299344

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1128.eqiad.wmnet with OS bullseye completed:

  • db1128 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201200732_marostegui_10495_db1128.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 755530 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Do not format db1128

https://gerrit.wikimedia.org/r/755530

Change 755530 merged by Marostegui:

[operations/puppet@production] install_server: Do not format db1128

https://gerrit.wikimedia.org/r/755530

db1128 is now replicating in m1 with Bullseye.

All done. db1159 will be moved to m2: T300243.