
Upgrade es4 to Bullseye
Closed, Resolved (Public)

Description

Let's upgrade es4 to Bullseye.
es4 is read-write (RW), not read-only (RO), so it requires a proper DB master switchover.

  • es2022
  • es2021
  • es2020
  • es1022
  • es1021 (master)
  • es1020
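
One way to read the host list above is as an upgrade plan: replicas can be reimaged in any order, while the master must wait until a switchover has moved writes elsewhere. The sketch below only illustrates that ordering. `Host` and `plan_upgrade_order` are hypothetical names invented for this example and are not part of any tooling referenced in this task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical record for one database host in the section."""
    name: str
    is_master: bool = False


def plan_upgrade_order(hosts):
    """Return replicas first and masters last, keeping the input order within each group."""
    replicas = [h for h in hosts if not h.is_master]
    masters = [h for h in hosts if h.is_master]
    return replicas + masters


# Host names and the master flag are taken from the list in the description.
ES4_HOSTS = [
    Host("es2022"),
    Host("es2021"),
    Host("es2020"),
    Host("es1022"),
    Host("es1021", is_master=True),
    Host("es1020"),
]

order = [h.name for h in plan_upgrade_order(ES4_HOSTS)]
print(order)
```

The master ends up last in the plan, which matches the point in the description that it cannot be reimaged until a switchover has taken place.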

Event Timeline

Marostegui triaged this task as Medium priority. Jan 25 2022, 9:53 AM
Marostegui moved this task from Triage to Ready on the DBA board.

Change 756952 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es2020: Disable notifications

https://gerrit.wikimedia.org/r/756952

Change 756952 merged by Marostegui:

[operations/puppet@production] es2020: Disable notifications

https://gerrit.wikimedia.org/r/756952

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host es2020.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host es2020.codfw.wmnet with OS bullseye completed:

  • es2020 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201251002_marostegui_9575_es2020.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
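
The log file name in the report above appears to encode a timestamp, the user, a number, and the host. A minimal sketch for pulling those parts out follows, assuming the layout `<YYYYMMDDHHMM>_<user>_<number>_<host>.out` as seen in this task. The meaning of the number (possibly a process ID) is a guess, and `parse_reimage_log_name` is a hypothetical helper, not part of the cookbook.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Layout inferred from the paths pasted in this task; not a documented format.
LOG_NAME = re.compile(
    r"^(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<number>\d+)_(?P<host>[^.]+)\.out$"
)


def parse_reimage_log_name(path):
    """Split a reimage log path into start time, user, number, and host."""
    match = LOG_NAME.match(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"unexpected log file name: {path}")
    return {
        "started": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        "number": int(match["number"]),  # assumed to be a process ID
        "host": match["host"],
    }


info = parse_reimage_log_name(
    "/var/log/spicerack/sre/hosts/reimage/202201251002_marostegui_9575_es2020.out"
)
print(info)
```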

Change 756961 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es2021: Disable notifications

https://gerrit.wikimedia.org/r/756961

Change 756961 merged by Marostegui:

[operations/puppet@production] es2021: Disable notifications

https://gerrit.wikimedia.org/r/756961

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host es2021.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host es2021.codfw.wmnet with OS bullseye completed:

  • es2021 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201251055_marostegui_28151_es2021.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-01-26T06:31:50Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool es1020 T300005', diff saved to https://phabricator.wikimedia.org/P19231 and previous config saved to /var/cache/conftool/dbconfig/20220126-063149-marostegui.json

Change 757287 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1020: Disable notifications

https://gerrit.wikimedia.org/r/757287

Change 757287 merged by Marostegui:

[operations/puppet@production] es1020: Disable notifications

https://gerrit.wikimedia.org/r/757287

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host es1020.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host es1020.eqiad.wmnet with OS bullseye executed with errors:

  • es1020 (FAIL)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details
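
The reports in this task share a simple shape: a header bullet with the host and a status in parentheses (PASS, WARN, or FAIL), followed by one bullet per completed step. The sketch below parses that shape as it appears here. It is based only on the pasted text, not on the cookbook's actual output format or API, and `parse_report` is a hypothetical helper.

```python
import re

# Header bullet, for example "  • es1020 (FAIL)".
HEADER = re.compile(r"^\s*•\s+(?P<host>\S+)\s+\((?P<status>PASS|WARN|FAIL)\)\s*$")
# Any other bullet is treated as a step.
STEP = re.compile(r"^\s*•\s+(?P<step>.+?)\s*$")


def parse_report(text):
    """Return (host, status, steps) for a single pasted reimage report."""
    host = status = None
    steps = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            host, status = header["host"], header["status"]
            continue
        step = STEP.match(line)
        if step and host is not None:
            steps.append(step["step"])
    return host, status, steps


REPORT = """\
  • es1020 (FAIL)
    • Downtimed on Icinga
    • Disabled Puppet
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details
"""

host, status, steps = parse_report(REPORT)
print(host, status, steps[-1])
```

In a failed run the last parsed step shows where the cookbook stopped, which is the first thing to check before retrying.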

es1020 might be having the same issues es1022 had: T299123

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host es1020.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host es1020.eqiad.wmnet with OS bullseye completed:

  • es1020 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201261015_marostegui_25345_es1020.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

On Wednesday around 9 AM UTC, I will disable writes to es4 so that I can do the switchover and then reimage the current master (es1021).

Change 759207 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1021: Disable notifications

https://gerrit.wikimedia.org/r/759207

Change 759207 merged by Marostegui:

[operations/puppet@production] es1021: Disable notifications

https://gerrit.wikimedia.org/r/759207

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host es1021.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host es1021.eqiad.wmnet with OS bullseye completed:

  • es1021 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202020939_marostegui_14176_es1021.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

This is all done; es1021 is being repooled slowly.

(6) es[2020-2022].codfw.wmnet,es[1020-1022].eqiad.wmnet
----- OUTPUT of 'lsb_release -a' -----
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
No LSB modules are available.
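
A check like the one above can also be verified programmatically. The following is a minimal sketch that parses `lsb_release -a` style text into fields and confirms the codename. The sample text is copied from the output in this task, and `parse_lsb_release` is a hypothetical helper rather than part of any tool used here.

```python
def parse_lsb_release(output):
    """Parse 'Key: value' lines into a dict, skipping lines without a colon."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


SAMPLE = """\
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
No LSB modules are available.
"""

fields = parse_lsb_release(SAMPLE)
is_bullseye = fields.get("Codename") == "bullseye"
print(fields, is_bullseye)
```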