
tools cluster: pending linux kernel upgrades
Closed, Duplicate · Public

Description

During operations for T177920 we detected, via unattended-upgrades, several machines which are waiting for a Linux kernel upgrade.

According to @chasemp :

All debian things are now sitting on a sleeper kernel update which has killed us in the past
Should kernel updates be blacklisted for unattended?
(update-initramfs: Generating /boot/initrd.img-4.4.0-1-amd64 vs linux-image-4.9.0-0.bpo.3-amd64 too ?)

Nodes which probably require the upgrade:

  • tools-flannel-etcd-xx
  • tools-worker-xxxx
  • tools-static-xx

So, some decision should be made about this. Probably the options are:

  • leave all kernels without an upgrade, i.e. leave things untouched and do nothing
  • prevent unattended-upgrades from upgrading the kernel, i.e. add a blacklist
  • let unattended-upgrades upgrade the kernel but don't reboot the nodes
  • be brave: let unattended-upgrades upgrade the kernel and reboot the nodes
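
For reference, the blacklist and no-reboot options could look roughly like the fragment below. This is only a sketch, not the configuration actually deployed on these hosts: the file name is hypothetical, and the package-name prefixes are an assumption about which kernel packages should be held back. `Unattended-Upgrade::Package-Blacklist` and `Unattended-Upgrade::Automatic-Reboot` are the stock unattended-upgrades settings; as far as I know, blacklist entries are treated as patterns matched against package names.

```
// /etc/apt/apt.conf.d/51unattended-upgrades-local  (hypothetical file name)

// Option 2: keep unattended-upgrades away from kernel packages.
// The prefixes below are assumptions; adjust them to the packages in use.
Unattended-Upgrade::Package-Blacklist {
    "linux-image-";
    "linux-headers-";
};

// Option 3: allow upgrades but never reboot automatically.
Unattended-Upgrade::Automatic-Reboot "false";
```

With the blacklist in place, kernel upgrades would have to be installed explicitly, which keeps the reboot decision in human hands.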

Event Timeline

aborrero created this task. Nov 17 2017, 5:02 PM
Restricted Application added a subscriber: Aklapper. Nov 17 2017, 5:02 PM
aborrero renamed this task from Pending linux kernel upgrades in tools cluster to tools cluster: pending linux kernel upgrades. Nov 17 2017, 6:07 PM

We recently fought with https://phabricator.wikimedia.org/T182722, which involved rebooting workers. We had been sitting on pending kernel updates for Debian instances in T180809 because WMF unattended-upgrades pulled in new kernels. At the moment the workers are sitting on 4.9.0-0.bpo.4-amd64 and all other Debian instances in Tools are sitting on 4.4.0-3-amd64. Considering the historical virtio issues and the nightmare of debugging them, I feel this reinforces our strategy outlined in https://phabricator.wikimedia.org/T181647 to make managing updates explicit and ongoing for Toolforge (and novaproxy or other WMCS-managed resources).

root@tools-worker-1016:~# uname -a
Linux tools-worker-1016 4.9.0-0.bpo.4-amd64 #1 SMP Debian 4.9.51-1~bpo8+1 (2017-10-17) x86_64 GNU/Linux

root@tools-puppetmaster-01:/var/lib/git/operations/puppet# uname -a
Linux tools-puppetmaster-01 4.4.0-3-amd64 #1 SMP Debian 4.4.2-3+wmf8 (2016-12-22) x86_64 GNU/Linux

Andrew added a subscriber: Andrew. Jan 17 2018, 3:12 PM

This is probably resolved, since everything was standardized as part of the Meltdown fix.