Upgrade sessionstore to bullseye
Closed, Resolved, Public

Description

Upgrade sessionstore cluster to Debian Bullseye.

  • sessionstore2001
  • sessionstore2002
  • sessionstore2003
  • sessionstore1001
  • sessionstore1002
  • sessionstore1003

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2023-06-21T15:03:36Z] <urandom> depooling sessionstore/codfw — T340043

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin1001 for host sessionstore2001.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin1001 for host sessionstore2001.codfw.wmnet with OS bullseye executed with errors:

  • sessionstore2001 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye executed with errors:

  • sessionstore2001 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye executed with errors:

  • sessionstore2001 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye executed with errors:

  • sessionstore2001 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye executed with errors:

  • sessionstore2001 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye executed with errors:

  • sessionstore2001 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye executed with errors:

  • sessionstore2001 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Mentioned in SAL (#wikimedia-operations) [2023-06-26T18:33:07Z] <urandom> depooling sessionstore/codfw for bullseye upgrades — T340043

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin2002 for host sessionstore2002.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin2002 for host sessionstore2002.codfw.wmnet with OS bullseye completed:

  • sessionstore2002 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306261902_eevans_3171571_sessionstore2002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin2002 for host sessionstore2001.codfw.wmnet with OS bullseye completed:

  • sessionstore2001 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306262018_eevans_3254942_sessionstore2001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin2002 for host sessionstore2003.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin2002 for host sessionstore2003.codfw.wmnet with OS bullseye executed with errors:

  • sessionstore2003 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin2002 for host sessionstore2003.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin2002 for host sessionstore2003.codfw.wmnet with OS bullseye completed:

  • sessionstore2003 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306262122_eevans_3320583_sessionstore2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2023-06-26T21:53:15Z] <urandom> pooling sessionstore/codfw for bullseye upgrades — T340043

Eevans triaged this task as Medium priority. (Jun 26 2023, 10:00 PM)
Eevans updated the task description.

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin1001 for host sessionstore1001.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin1001 for host sessionstore1001.eqiad.wmnet with OS bullseye completed:

  • sessionstore1001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306281532_eevans_903476_sessionstore1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin1001 for host sessionstore1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin1001 for host sessionstore1002.eqiad.wmnet with OS bullseye completed:

  • sessionstore1002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306281610_eevans_913838_sessionstore1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin1001 for host sessionstore1003.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin1001 for host sessionstore1003.eqiad.wmnet with OS bullseye completed:

  • sessionstore1003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306281648_eevans_922118_sessionstore1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Eevans claimed this task.
Eevans updated the task description.
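
The timeline above records several reimage attempts per host, each ending in a summary bullet of the form `hostname (PASS|WARN|FAIL)`. As a rough aid for reading such a timeline, here is a minimal sketch that tallies those outcomes per host. It assumes the timeline has been saved as plain text in the bullet format shown here; `RESULT_RE` and `tally_reimage_results` are hypothetical names chosen for illustration and are not part of the reimage cookbook or Spicerack.

```python
import re
from collections import defaultdict

# Matches only the per-host summary bullets, e.g. "  • sessionstore2001 (FAIL)".
# Step bullets such as "• Host up (Debian installer)" do not match, because
# the parenthesised status must follow a single hostname token directly.
RESULT_RE = re.compile(
    r"^\s*•\s+(?P<host>\S+)\s+\((?P<status>PASS|WARN|FAIL)\)\s*$"
)


def tally_reimage_results(timeline_text):
    """Return {host: [status, ...]} in the order the results appear."""
    results = defaultdict(list)
    for line in timeline_text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            results[match.group("host")].append(match.group("status"))
    return dict(results)


# A short excerpt in the same format as the timeline (not the full log).
sample = """\
  • sessionstore2003 (FAIL)
    • Downtimed on Icinga/Alertmanager
  • sessionstore2003 (WARN)
    • Host up (Debian installer)
  • sessionstore1001 (PASS)
"""

for host, statuses in tally_reimage_results(sample).items():
    print(host, statuses)
```

Keeping the statuses as an ordered list per host, rather than only the last one, preserves the history of retries, which is what makes a host that needed repeated attempts stand out.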