
Remove overlay from kernel blacklist on toolforge
Closed, Resolved · Public

Description

The paws cluster uses the upstream-recommended storage driver for Docker (overlay2), which requires the overlay filesystem to be enabled on the node. However, the overlay kernel module is blacklisted in Puppet, and the blacklist seems to be applied inconsistently: the module is loadable on some occasions and not on others. This has already caused several outages.

It would be great if the module could be removed from the blacklist on toolforge.

https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/
https://www.coshx.com/blog/2016/06/16/wheres-my-overlay/

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jan 2 2018, 10:39 PM

tools-paws-worker-1019 is currently in a broken state due to this (see journalctl -u docker).

The module was initially blacklisted because there were multiple security issues that exploited privilege escalation bugs in overlayfs. Since then, trusty has gained support for disabling unprivileged user namespaces (and this has been enabled), which were the biggest risk. I'm fine with adding a Hiera setting to disable the blacklist for Docker hosts. For the rest of the fleet we don't have a use for the module and should keep it blacklisted.

Change 402068 had a related patch set uploaded (by Rush; owner: cpettet):
[operations/puppet@production] tools: need overlay module for overlay2 for k8s

https://gerrit.wikimedia.org/r/402068

Details:

  • modules/base/manifests/kernel.pp blacklists 'overlay' and 'overlayfs' (name changed)
  • results in /etc/modprobe.d/blacklist-wmf.conf
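
To check whether a given host still carries the blacklist entries, something like the following could be used. This is a minimal sketch, not part of the Puppet change: `is_blacklisted` is a hypothetical helper name, and it assumes the entries take the two forms quoted later in this task (`blacklist <module>` and `install <module> /bin/true`).

```shell
# Hypothetical helper: succeed if a module is disabled in a modprobe config file.
# Matches either a "blacklist <module>" line or an "install <module> /bin/true" override.
is_blacklisted() {
    module="$1"
    conf="$2"
    grep -Eq "^[[:space:]]*(blacklist[[:space:]]+${module}|install[[:space:]]+${module}[[:space:]]+/bin/true)[[:space:]]*$" "$conf"
}
```

For example, `is_blacklisted overlay /etc/modprobe.d/blacklist-wmf.conf && echo "overlay is still blacklisted"`. The pattern is anchored at the end of the line so that a check for `overlay` does not match an `overlayfs` entry.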

To temporarily fix tools-paws-master-01.tools.eqiad.wmflabs, which was broken by this (for some reason; possibly it was never rebooted after install until the break?):

  • edited /etc/modprobe.d/blacklist-wmf.conf
  • insmod overlay
  • echo "overlay" > /etc/modules-load.d/overlay.conf
  • /sbin/reboot
  • lsmod | grep overlay
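
The persistence step in the list above can be sketched as a small function. This is an illustration only: `persist_module` is a hypothetical name, and the optional second argument exists purely so the function can be tried against a scratch directory instead of the real `/etc/modules-load.d`.

```shell
# Hypothetical helper: make a module load at boot by writing a modules-load.d entry.
# The second argument defaults to the real directory; pass another path for a dry run.
persist_module() {
    module="$1"
    dir="${2:-/etc/modules-load.d}"
    printf '%s\n' "$module" > "${dir}/${module}.conf"
}
```

On a live host this would be run as root, for example `modprobe overlay && persist_module overlay`. Note that `modprobe` resolves a module by name, whereas `insmod` expects the path to a module file, so `modprobe` is usually the more convenient way to load the module by hand.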

chasemp triaged this task as High priority. · Jan 4 2018, 3:37 PM
chasemp updated the task description.

Change 402075 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] kmod blacklist: allow ensure => absent for a given blacklist

https://gerrit.wikimedia.org/r/402075

Change 402075 merged by Andrew Bogott:
[operations/puppet@production] kmod blacklist: allow ensure => absent for a given blacklist

https://gerrit.wikimedia.org/r/402075

Change 402068 merged by Andrew Bogott:
[operations/puppet@production] tools: need overlay module for overlay2 for k8s

https://gerrit.wikimedia.org/r/402068

Andrew added a comment. · Jan 4 2018, 5:16 PM
-blacklist overlay
-install overlay /bin/true
-blacklist overlayfs
-install overlayfs /bin/true

applied on all tools VMs.

Would be good for someone to verify that this has the desired effect.

Mentioned in SAL (#wikimedia-cloud) [2018-01-04T17:24:23Z] <andrewbogott> rebooting tools-paws-worker-1019 to verify repair of T184018

Andrew closed this task as Resolved. · Jan 4 2018, 5:29 PM
Andrew claimed this task.

After a reboot:

root@tools-paws-worker-1019:~# lsmod | grep overlay
overlay                49152  7
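
For checking several hosts, the `lsmod | grep overlay` test above could be made stricter with an exact match on the module name. A minimal sketch, assuming the usual `lsmod` layout with the module name in the first column; `module_loaded` is a hypothetical helper that reads `lsmod` output on standard input.

```shell
# Hypothetical helper: read lsmod output on stdin and succeed only if the
# first column of some line is exactly the requested module name.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}
```

Usage would be along the lines of `lsmod | module_loaded overlay && echo "overlay is loaded"`. Unlike a plain `grep overlay`, this does not match other modules whose names merely contain the string.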

@yuvipanda can you do a sanity check here whenever you have a minute?