
tools cluster: pending linux kernel upgrades
Closed, Duplicate (Public)


During operations for T177920 we detected, via unattended-upgrades, several machines which are waiting for a Linux kernel upgrade.

According to @chasemp :

All debian things are now sitting on a sleeper kernel update which has killed us in the past
Should kernel updates be blacklisted for unattended?
(update-initramfs: Generating /boot/initrd.img-4.4.0-1-amd64 vs linux-image-4.9.0-0.bpo.3-amd64 too ?)

Nodes which probably require the upgrade:

  • tools-flannel-etcd-xx
  • tools-worker-xxxx
  • tools-static-xx

So, some decision should be made about this. Probably the options are:

  • leave all kernels without upgrade, i.e. leave things untouched, do nothing.
  • prevent unattended-upgrades from upgrading the kernel, i.e. add a blacklist
  • let unattended-upgrades upgrade the kernel but don't reboot nodes
  • be brave, let unattended-upgrades upgrade the kernel and do reboot the nodes
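
For the blacklist option, a minimal sketch of what the configuration could look like. It uses the `Unattended-Upgrade::Package-Blacklist` setting of unattended-upgrades, whose entries are matched as regular expressions against package names. The helper name `write_kernel_blacklist` and the scratch destination are assumptions for illustration only; on these hosts the file would more likely be managed through puppet and live under /etc/apt/apt.conf.d/.

```shell
#!/bin/sh
# Hypothetical helper (not from this task): writes an apt.conf.d style
# drop-in that keeps unattended-upgrades away from kernel image packages.
write_kernel_blacklist() {
    # $1: destination file; on a real host this would be a file under
    # /etc/apt/apt.conf.d/ (the exact file name is left to the operator).
    cat > "$1" <<'EOF'
// Do not let unattended-upgrades touch kernel image packages.
Unattended-Upgrade::Package-Blacklist {
    "linux-image-";
};
EOF
}

# Write to a scratch file first so the result can be reviewed before deploying.
out="$(mktemp)"
write_kernel_blacklist "$out"
cat "$out"
```

With this in place kernels would only change when someone upgrades them by hand, which also means kernel security fixes have to be tracked explicitly.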

Event Timeline

aborrero renamed this task from "Pending linux kernel upgrades in tools cluster" to "tools cluster: pending linux kernel upgrades". Nov 17 2017, 6:07 PM

We recently fought with which involved rebooting workers. We had been sitting on pending kernel updates for Debian instances in T180809 because WMF unattended pulled in new kernels. At the moment the workers are sitting on 4.9.0-0.bpo.4-amd64 and all other Debian instances in Tools are sitting on 4.4.0-3-amd64. Considering the historical virtio issues and the nightmare of debugging them, I feel like this reinforces our strategy outlined in to make managing updates explicit and ongoing for Toolforge (and novaproxy or other WMCS managed resources).

root@tools-worker-1016:~# uname -a
Linux tools-worker-1016 4.9.0-0.bpo.4-amd64 #1 SMP Debian 4.9.51-1~bpo8+1 (2017-10-17) x86_64 GNU/Linux

root@tools-puppetmaster-01:/var/lib/git/operations/puppet# uname -a
Linux tools-puppetmaster-01 4.4.0-3-amd64 #1 SMP Debian 4.4.2-3+wmf8 (2016-12-22) x86_64 GNU/Linux
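
The manual `uname -a` checks above could be turned into a small per-host check. This is only a sketch: it assumes kernel images are installed as /boot/vmlinuz-<release> (the usual Debian layout) and that GNU `sort -V` is available. The function names `newest_kernel` and `kernel_status` are made up for illustration. It compares release strings only, so a kernel update that keeps the same release name may not be detected.

```shell
#!/bin/sh
# Sketch: is the running kernel the newest one installed on this host?

newest_kernel() {
    # $1: directory holding vmlinuz-* files (defaults to /boot).
    # Prints the highest release according to version sort, or nothing.
    dir="${1:-/boot}"
    ls "$dir" 2>/dev/null | sed -n 's/^vmlinuz-//p' | sort -V | tail -n 1
}

kernel_status() {
    # $1: running release (e.g. from uname -r), $2: newest installed release.
    if [ -z "$2" ]; then
        echo "unknown"          # no kernel images found to compare against
    elif [ "$1" = "$2" ]; then
        echo "current"
    else
        echo "reboot-pending"   # a different kernel is installed than running
    fi
}

kernel_status "$(uname -r)" "$(newest_kernel /boot)"
```

Run across the instances, a check like this would list which nodes are still on an older kernel without logging into each one by hand.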

This is probably resolved, since everything was standardized as part of the Meltdown fix.