Revert 5.10.70 from bullseye hosts
Closed, Resolved (Public)

Description

The update to 5.10.70 (shipped as part of the Bullseye 11.1 kernel update and via buster-backports) caused an outage on the cloudgw servers once they were upgraded to it. We also saw a conntrack issue on mx2001 which caused firewall problems. Revert it for now by:

  • uninstalling the 5.10.70 kernel packages on bullseye systems
  • rebooting affected systems back into 5.10.46

Hosts:

  • cloudmetrics1003.eqiad.wmnet (ignored, to be upgraded to 5.10.84 when it's out)
  • cloudmetrics1004.eqiad.wmnet (ignored, to be upgraded to 5.10.84 when it's out)
  • copernicium.wikimedia.org
  • db1124.eqiad.wmnet (ignored, these are only test hosts, to be upgraded to 5.10.84 when it's out)
  • db1125.eqiad.wmnet (ignored, these are only test hosts, to be upgraded to 5.10.84 when it's out)
  • db1128.eqiad.wmnet (ignored, these are only test hosts, to be upgraded to 5.10.84 when it's out)
  • failoid2002.codfw.wmnet
  • ganeti6001.drmrs.wmnet (not yet fully set up, to be upgraded to 5.10.84 when it's out)
  • ganeti6002.drmrs.wmnet (not yet fully set up, to be upgraded to 5.10.84 when it's out)
  • ganeti6003.drmrs.wmnet (not yet fully set up, to be upgraded to 5.10.84 when it's out)
  • ganeti6004.drmrs.wmnet (not yet fully set up, to be upgraded to 5.10.84 when it's out)
  • graphite1004.eqiad.wmnet
  • graphite2003.codfw.wmnet
  • ldap-replica1003.wikimedia.org
  • ldap-replica2005.wikimedia.org
  • ldap-replica2006.wikimedia.org
  • pc2014.codfw.wmnet (ignored, these are only test hosts, to be upgraded to 5.10.84 when it's out)
  • people2002.codfw.wmnet (inactive, to be upgraded to 5.10.84 when it's out)
  • prometheus2005.codfw.wmnet
  • prometheus2006.codfw.wmnet
  • puppetboard2002.codfw.wmnet (inactive, to be upgraded to 5.10.84 when it's out)
  • rpki1001.eqiad.wmnet
  • rpki2002.codfw.wmnet
  • sretest1002.eqiad.wmnet (ignored, just a test host, to be upgraded to 5.10.84 when it's out)
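
To check whether a host from the list above still needs the rollback, the running kernel release can be compared against the package ABI names. This is a minimal sketch, assuming that 5.10.70 is packaged under the ABI name 5.10.0-9 and 5.10.46 under 5.10.0-8 (as the -9-/-8- package names mentioned in this task suggest). The function name `kernel_state` is hypothetical and not part of any existing tooling.

```shell
#!/bin/sh
# Classify a kernel release string as printed by `uname -r`.
# Assumption: 5.10.0-9 is the ABI name of the 5.10.70 packages,
# and 5.10.0-8 is the ABI name of the 5.10.46 packages.
kernel_state() {
    case "$1" in
        5.10.0-9-*) echo "needs-rollback" ;;  # affected kernel is running
        5.10.0-8-*) echo "ok" ;;              # already on the older kernel
        *)          echo "other" ;;           # some other kernel entirely
    esac
}

# Usage on a host:
#   kernel_state "$(uname -r)"
```

The function only inspects the string it is given, so it can be tried on any machine without touching the installed kernels.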

Event Timeline

I chatted with @MoritzMuehlenhoff re: the rollback. apt won't let you remove a running kernel, but there's a way to ask grub to reboot into another menu entry (the second entry of the first submenu in this case). The procedure can therefore look like this:

# grub-reboot '1>2'
# reboot
...
# apt remove linux-image-5.10.0-9-amd64
# grub-editenv /boot/grub/grubenv unset next_entry

Note that linux-image-amd64 will also get removed, which is correct since it depends on the -9- package, not -8-.
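
Since the removal step only makes sense once the host has actually come back up on the older kernel, a small guard around it may help avoid running it too early. This is a minimal sketch using the package name from the commands above. The function names `safe_to_remove` and `remove_old_kernel` are hypothetical, and the second function only prints the commands rather than running them.

```shell
#!/bin/sh
# Succeeds only if the running kernel release ($1) does NOT belong
# to the kernel image package that is about to be removed ($2).
safe_to_remove() {
    [ "linux-image-$1" != "$2" ]
}

# Dry run: print the post-reboot cleanup commands if the guard passes.
# Printing instead of executing is a choice made for this sketch.
remove_old_kernel() {
    pkg="linux-image-5.10.0-9-amd64"
    if safe_to_remove "$(uname -r)" "$pkg"; then
        echo "apt remove $pkg"
        echo "grub-editenv /boot/grub/grubenv unset next_entry"
    else
        echo "still running $(uname -r); reboot into the older kernel first" >&2
        return 1
    fi
}
```

Calling `remove_old_kernel` on a host that is still on 5.10.0-9-amd64 prints a warning and returns non-zero. On any other kernel it prints the two cleanup commands for review.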

MoritzMuehlenhoff updated the task description.
MoritzMuehlenhoff updated the task description.
MoritzMuehlenhoff claimed this task.
MoritzMuehlenhoff triaged this task as High priority.
MoritzMuehlenhoff updated the task description.

This is complete.