
Swift / puppet interaction can fill up root filesystem
Closed, Resolved, Public

Description

Currently, the root filesystem on a Swift backend host can fill up as follows:

  1. swift-drive-audit scans kernel messages and determines that a disk has failed or is failing
  2. the filesystem is unmounted, but Swift fails to notice in time and/or an rsync is already running for that filesystem
  3. Puppet grants the swift user write access to the mountpoint whether or not the filesystem is mounted, so those writes succeed and land in the bare directory on the root filesystem
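The hazardous state from steps 2 and 3 can be checked for directly: a disk's mountpoint directory that is not currently a mountpoint but is still owned by the swift user. The following is a minimal Python sketch. The `/srv/swift-storage` layout, the `swift` user name and all function names are assumptions for illustration and do not come from the task.

```python
import os
import pwd

# Assumption (hypothetical layout): one directory per Swift data disk
# under this root, each normally a mountpoint.
SWIFT_MOUNT_ROOT = "/srv/swift-storage"
SWIFT_USER = "swift"


def swift_uid(user: str = SWIFT_USER) -> int:
    """Look up the numeric uid of the (assumed) swift user."""
    return pwd.getpwnam(user).pw_uid


def is_hazardous(path: str, uid: int) -> bool:
    """True if `path` is not a mountpoint but is still owned by `uid`.

    In that state, anything written into `path` lands on the filesystem
    underneath it, i.e. the root filesystem.
    """
    return not os.path.ismount(path) and os.stat(path).st_uid == uid


def find_hazards(mount_root: str, uid: int) -> list:
    """Return the directories under `mount_root` in the hazardous state."""
    hazards = []
    for name in sorted(os.listdir(mount_root)):
        path = os.path.join(mount_root, name)
        if os.path.isdir(path) and is_hazardous(path, uid):
            hazards.append(path)
    return hazards
```

For example, a cron job or monitoring check could call `find_hazards(SWIFT_MOUNT_ROOT, swift_uid())` and alert on any non-empty result.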

Event Timeline

Change 516615 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] swift: change ownership depending on mountpoint status

https://gerrit.wikimedia.org/r/516615
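The idea behind the patch, sketched in Python rather than Puppet, is to choose the owner of each mountpoint directory based on whether a filesystem is actually mounted there. This illustrates the technique under stated assumptions: the function names and the `swift` user are hypothetical, and the sketch does not reproduce the Puppet code in change 516615.

```python
import os
import shutil


def desired_owner(path: str, swift_user: str = "swift") -> str:
    """Owner a Swift disk mountpoint directory should have.

    Mounted: the swift user, so Swift can write to the disk.
    Not mounted: root, so Swift cannot write into the bare directory,
    which lives on the root filesystem. This only helps if the directory
    mode does not also grant write access to others (e.g. 0755, not 0777).
    """
    return swift_user if os.path.ismount(path) else "root"


def enforce_ownership(path: str, swift_user: str = "swift",
                      dry_run: bool = True) -> str:
    """Compute the desired owner and, unless dry_run, apply it."""
    owner = desired_owner(path, swift_user)
    if not dry_run:
        # Requires root privileges; the group is left unchanged here.
        shutil.chown(path, user=owner)
    return owner
```

Like a Puppet run, this check only takes effect when it runs, so a window remains between an unmount and the next enforcement.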

One risk with this changeset: if a disk bounces like that repeatedly, some additional data can be copied to the local filesystem before each Puppet run chowns the directory back to root. After a few cycles that could fill / and we'd be back where we started.
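Here is a rough estimate of "after a few cycles". Data written into the bare directory while the disk is unmounted stays on /. Once the disk is remounted, that data is hidden under the mountpoint, so it accumulates across bounces. The sketch assumes a constant write rate and that nothing cleans up the hidden data. The function names and the numbers in the example are hypothetical.

```python
def bytes_leaked_per_cycle(write_rate: int, exposure_s: int) -> int:
    """Bytes written to / during one bounce.

    write_rate: bytes/s still directed at the unmounted disk (assumed constant).
    exposure_s: seconds between the unmount and the Puppet run that
    chowns the directory back to root.
    """
    return write_rate * exposure_s


def cycles_until_full(free_bytes: int, leaked_per_cycle: int) -> int:
    """Number of bounces before / is full (integer ceiling division)."""
    if leaked_per_cycle <= 0:
        raise ValueError("leaked_per_cycle must be positive")
    return -(-free_bytes // leaked_per_cycle)
```

Take a hypothetical host with 50 GB free on /, 5 MB/s of writes and up to 1800 s between the unmount and the next Puppet run. Each bounce leaves 9 GB behind, and / fills after 6 bounces.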

Could we instead alert on flapping disks, and keep them from being remounted if they have errors? Or do that in addition to the proposed patch?
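One way to implement the flapping-disk alert suggested here is to count mount-state transitions per disk in a sliding time window and refuse to remount a disk that is flapping or has logged I/O errors. This is a minimal sketch. The thresholds, the class and the function names are hypothetical. How the mount state is obtained (e.g. by polling `os.path.ismount`) is left out.

```python
from collections import deque
from typing import Deque, Optional


class FlapDetector:
    """Flags a disk as flapping if its mount state changes at least
    `max_transitions` times within `window_s` seconds (hypothetical thresholds)."""

    def __init__(self, max_transitions: int = 3, window_s: float = 3600.0):
        self.max_transitions = max_transitions
        self.window_s = window_s
        self._transitions: Deque[float] = deque()  # timestamps of state changes
        self._last_state: Optional[bool] = None

    def observe(self, mounted: bool, now: float) -> bool:
        """Record the current mount state at time `now`; return True if flapping."""
        if self._last_state is not None and mounted != self._last_state:
            self._transitions.append(now)
        self._last_state = mounted
        # Drop transitions that have fallen out of the window.
        while self._transitions and now - self._transitions[0] > self.window_s:
            self._transitions.popleft()
        return len(self._transitions) >= self.max_transitions


def allow_remount(io_errors_seen: int, flapping: bool) -> bool:
    """Only remount a disk that has no recorded errors and is not flapping."""
    return io_errors_seen == 0 and not flapping
```

A sliding window makes the alert clear on its own once a disk stays in one state for longer than `window_s`. A counter that never resets would keep alerting until someone reset it by hand.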

We definitely need to investigate why the remounting happens, but although the two are related, I think they are independent issues. To be clear, the patch above is meant to handle a disk that is, e.g., waiting to be replaced, and to make sure the correct permissions are in place in that case. In other words, we'd have this issue even if the remounting (done by systemd, I believe) weren't happening.

Yes, I agree. The patch is necessary, just not sufficient :-)

Change 516615 merged by Filippo Giunchedi:
[operations/puppet@production] swift: change ownership depending on mountpoint status

https://gerrit.wikimedia.org/r/516615

fgiunchedi claimed this task.

This is fixed: permissions are now set depending on whether the directory is a mountpoint. A separate issue is the interaction between swift-drive-audit and systemd, tracked in T265450: flip/flop mounting filesystems between systemd and swift-drive-audit