Page MenuHomePhabricator

Upgrade x2 to Bullseye
Closed, ResolvedPublic

Description

Let's try to upgrade x2 to Bullseye before it starts to get used (if we've finished the tests at T295965 successfully)

Hosts:

  • db2144
  • db2143
  • db2142
  • db1153
  • db1152
  • db1151

Event Timeline

Marostegui created this task.
Marostegui moved this task from Triage to Blocked on the DBA board.
Marostegui added a subscriber: Krinkle.

@Krinkle do you have any ETA more or less on when x2 will be live? I want to know how much room we have to upgrade this (at least the masters).

Change 753707 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] x2 hosts: Disable notifications

https://gerrit.wikimedia.org/r/753707

Change 753707 merged by Marostegui:

[operations/puppet@production] x2 hosts: Disable notifications

https://gerrit.wikimedia.org/r/753707

(For the record, Manuel and I spoke on IRC. Given this would only take a few days, I said its fine to go ahead with this anytime in January. Unlikely to clash with our plans as we're delayed, and we can wait on the off-chance we're ready sooner.)

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2143.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2144.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2143.codfw.wmnet with OS bullseye completed:

  • db2143 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201170930_marostegui_21017_db2143.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2144.codfw.wmnet with OS bullseye completed:

  • db2144 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201170930_marostegui_21039_db2144.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1153.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1152.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1153.eqiad.wmnet with OS bullseye completed:

  • db1153 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201171015_marostegui_32451_db1153.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1152.eqiad.wmnet with OS bullseye completed:

  • db1152 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201171017_marostegui_32573_db1152.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2142.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2142.codfw.wmnet with OS bullseye completed:

  • db2142 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201171140_marostegui_16087_db2142.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1151.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1151.eqiad.wmnet with OS bullseye completed:

  • db1151 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201171219_marostegui_22694_db1151.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
Marostegui updated the task description. (Show Details)

All x2 hosts are now running Bullseye:

$ sudo cumin "P{R:profile::mariadb::section = x2}" "lsb_release -a"
6 hosts will be targeted:
db[2142-2144].codfw.wmnet,db[1151-1153].eqiad.wmnet
Ok to proceed on 6 hosts? Enter the number of affected hosts to confirm or "q" to quit 6
===== NODE GROUP =====
(6) db[2142-2144].codfw.wmnet,db[1151-1153].eqiad.wmnet
----- OUTPUT of 'lsb_release -a' -----
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:	11
  - Host rebooted via IPMI
  - Host up (Debian installer)
Codename:	bullseye
No LSB modules are available.