Upgrade s3 to Bullseye
Closed, Resolved (Public)

Description

  • dbstore1007 (T299481)
  • db2149
  • db2139 (backup T299876)
  • db2127
  • db2109
  • db2105 (master)
  • db2094 (sanitarium host)
  • db2074 (sanitarium master)
  • db1179
  • db1175
  • db1166
  • db1157 (master)
  • db1154 (sanitarium host)
  • db1145 (backup T299876)
  • db1123
  • db1112 (sanitarium master)
  • db1102 (backup T299876)
  • clouddb1021 (T299480)
  • clouddb1017 (T299480)
  • clouddb1013 (T299480)
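The hosts above span two sites, and the timeline works through them one site at a time. As a minimal sketch, they can be grouped by site from the hostname alone. This assumes that the first digit of the numeric suffix encodes the site (1 for eqiad, 2 for codfw), which matches the FQDNs shown in the timeline for the db* hosts; applying the same rule to the clouddb* and dbstore* hosts is an assumption, and `site_of` and `group_by_site` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Assumption: first digit of the numeric suffix encodes the site.
# The timeline shows db2xxx.codfw.wmnet and db1xxx.eqiad.wmnet.
SITE_BY_DIGIT = {"1": "eqiad", "2": "codfw"}


def site_of(host: str) -> str:
    """Return the assumed site for a short hostname such as 'db2105'."""
    match = re.search(r"(\d)\d*$", host)
    if match is None:
        return "unknown"
    return SITE_BY_DIGIT.get(match.group(1), "unknown")


def group_by_site(hosts):
    """Group hostnames by assumed site, preserving input order."""
    groups = defaultdict(list)
    for host in hosts:
        groups[site_of(host)].append(host)
    return dict(groups)


hosts = ["db2149", "db2105", "db1179", "db1157", "clouddb1021", "dbstore1007"]
grouped = group_by_site(hosts)
for site, members in grouped.items():
    print(site, members)
```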

Event Timeline

Marostegui triaged this task as Medium priority. Feb 1 2022, 8:03 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Change 758781 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] s3 codfw db*: Disable notifications

https://gerrit.wikimedia.org/r/758781

Change 758781 merged by Marostegui:

[operations/puppet@production] s3 codfw db*: Disable notifications

https://gerrit.wikimedia.org/r/758781

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2074.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2109.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2109.codfw.wmnet with OS bullseye completed:

  • db2109 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202010807_marostegui_15345_db2109.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
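Each report names a per-run log file under /var/log/spicerack/sre/hosts/reimage/. The file names appear to follow a timestamp_user_number_host pattern, so a rough sketch for splitting one apart could look like the following. The field meanings are inferred from the names in this task only: the timestamp is assumed to be YYYYMMDDHHMM with no stated timezone, the numeric third field is assumed to be some run or process identifier, and `ReimageLog` and `parse_reimage_log` are hypothetical names.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime  # assumed YYYYMMDDHHMM; timezone not stated in the task
    user: str
    run_id: int        # assumption: a run or process identifier
    host: str


def parse_reimage_log(path: str) -> ReimageLog:
    """Split a reimage log path into its underscore-separated fields."""
    stem = PurePosixPath(path).stem  # drops the directory and the ".out" suffix
    stamp, user, run_id, host = stem.split("_", 3)
    return ReimageLog(datetime.strptime(stamp, "%Y%m%d%H%M"), user, int(run_id), host)


log = parse_reimage_log(
    "/var/log/spicerack/sre/hosts/reimage/202202010807_marostegui_15345_db2109.out"
)
print(log.host, log.started.isoformat(), log.user, log.run_id)
```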

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2074.codfw.wmnet with OS bullseye completed:

  • db2074 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202010806_marostegui_15232_db2074.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2149.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2127.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2149.codfw.wmnet with OS bullseye completed:

  • db2149 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202010842_marostegui_24228_db2149.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2127.codfw.wmnet with OS bullseye completed:

  • db2127 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202010846_marostegui_24588_db2127.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 758797 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2105: Disable notifications

https://gerrit.wikimedia.org/r/758797

Change 758797 merged by Marostegui:

[operations/puppet@production] db2105: Disable notifications

https://gerrit.wikimedia.org/r/758797

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2105.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2105.codfw.wmnet with OS bullseye completed:

  • db2105 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202011024_marostegui_14888_db2105.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

codfw core hosts are now upgraded.

Change 769278 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1123: Disable notifications

https://gerrit.wikimedia.org/r/769278

Change 769278 merged by Marostegui:

[operations/puppet@production] db1123: Disable notifications

https://gerrit.wikimedia.org/r/769278

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1123.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1123.eqiad.wmnet with OS bullseye completed:

  • db1123 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203090643_marostegui_2894_db1123.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-03-09T07:33:39Z] <marostegui> dbmaint on db1123 s3@eqiad T300600
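SAL mentions like the one above carry a UTC timestamp, the author, and a free-text message. A small sketch for pulling those three fields out of such a line, assuming the bracketed timestamp and angle-bracketed author always appear in this order (`SAL_ENTRY` and `parse_sal` are hypothetical names):

```python
import re
from datetime import datetime, timezone

# Matches "[<timestamp>] <author> message" anywhere in the line.
SAL_ENTRY = re.compile(r"\[(?P<ts>[^\]]+)\]\s+<(?P<who>[^>]+)>\s+(?P<msg>.*)")


def parse_sal(line: str):
    """Return (timestamp, author, message), or None if the line does not match."""
    match = SAL_ENTRY.search(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%SZ")
    return stamp.replace(tzinfo=timezone.utc), match.group("who"), match.group("msg")


entry = parse_sal(
    "Mentioned in SAL (#wikimedia-operations) [2022-03-09T07:33:39Z] "
    "<marostegui> dbmaint on db1123 s3@eqiad T300600"
)
print(entry)
```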

Change 770831 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1166: Disable notifications

https://gerrit.wikimedia.org/r/770831

Change 770831 merged by Marostegui:

[operations/puppet@production] db1166: Disable notifications

https://gerrit.wikimedia.org/r/770831

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1166.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1166.eqiad.wmnet with OS bullseye completed:

  • db1166 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203150628_marostegui_1166092_db1166.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 771765 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1179: Disable notifications

https://gerrit.wikimedia.org/r/771765

Mentioned in SAL (#wikimedia-operations) [2022-03-18T05:38:32Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1179 reimage T300600', diff saved to https://phabricator.wikimedia.org/P22803 and previous config saved to /var/cache/conftool/dbconfig/20220318-053832-marostegui.json

Change 771765 merged by Marostegui:

[operations/puppet@production] db1179: Disable notifications

https://gerrit.wikimedia.org/r/771765

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1179.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1179.eqiad.wmnet with OS bullseye completed:

  • db1179 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203180542_marostegui_1863100_db1179.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-03-21T05:52:03Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1175 reimage T300600', diff saved to https://phabricator.wikimedia.org/P22860 and previous config saved to /var/cache/conftool/dbconfig/20220321-055202-marostegui.json

Change 772058 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1175: Disable notifications

https://gerrit.wikimedia.org/r/772058

Change 772058 merged by Marostegui:

[operations/puppet@production] db1175: Disable notifications

https://gerrit.wikimedia.org/r/772058

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1175.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1175.eqiad.wmnet with OS bullseye executed with errors:

  • db1175 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1175.eqiad.wmnet with OS bullseye

db1175 isn't booting up and the iDRAC doesn't show anything; created T304280 for eqiad's DC Ops.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1175.eqiad.wmnet with OS bullseye executed with errors:

  • db1175 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1175.eqiad.wmnet with OS bullseye

db1175 was reimaged fine after Chris fixed it on-site.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1175.eqiad.wmnet with OS bullseye completed:

  • db1175 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203220541_marostegui_2587373_db1175.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 773120 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1112: Disable notifications

https://gerrit.wikimedia.org/r/773120

Change 773120 merged by Marostegui:

[operations/puppet@production] db1112: Disable notifications

https://gerrit.wikimedia.org/r/773120

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1112.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1112.eqiad.wmnet with OS bullseye completed:

  • db1112 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203230609_marostegui_2812305_db1112.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

All these hosts are done; only the master swap (T301850) is pending.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1157.eqiad.wmnet with OS bullseye

Marostegui updated the task description.

The old master was reimaged, so this is all done (apart from the hosts that have their own tasks).

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1157.eqiad.wmnet with OS bullseye completed:

  • db1157 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203290924_marostegui_4105547_db1157.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
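The cookbook reports in this timeline each open with a "host (STATUS)" line, and a host such as db1175 can appear more than once. As a rough sketch, the last reported status per host can be collected from text copied out of the task; the line format is taken from the reports above, while `STATUS_LINE` and `final_statuses` are hypothetical names and the sample below is only an excerpt.

```python
import re
from collections import Counter

# Top-level report lines look like "  • db1166 (PASS)"; step lines do not
# end in a PASS/WARN/FAIL marker, so they are skipped.
STATUS_LINE = re.compile(r"^\s*•\s+(?P<host>\S+)\s+\((?P<status>PASS|WARN|FAIL)\)\s*$")


def final_statuses(lines):
    """Map each host to the status of its most recent report."""
    latest = {}
    for line in lines:
        match = STATUS_LINE.match(line)
        if match:
            latest[match.group("host")] = match.group("status")
    return latest


sample = [
    "  • db1166 (PASS)",
    "    • Downtimed on Icinga",
    "  • db1175 (FAIL)",
    "  • db1175 (FAIL)",
    "  • db1175 (WARN)",
    "    • Removed previous downtime on Alertmanager (old OS)",
    "  • db1157 (WARN)",
]
statuses = final_statuses(sample)
tally = Counter(statuses.values())
print(statuses)
print(tally)
```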