
alert hosts short of root disk space
Closed, Resolved, Public

Description

Today I noticed a warning-level alert on the alert hosts for disk usage on /. The investigation proceeded as follows:

  • find out that /var/lib/docker is using a large amount of space
  • issue a docker system prune to clear out old images
  • in the meantime, observe that the massive data and metadata files exist because they back the devicemapper storage backend
  • our docker::engine profile checks whether overlayfs is enabled (it is off by default) and, if it is not, only issues a warning()
  • that warning is logged server-side only, so we never noticed that docker::engine is not supposed to be used without also setting profile::base::overlayfs: true
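The first checks in the list above (how full / is, and which storage driver Docker uses) can be reproduced with a short script. This is a minimal sketch, not the tooling used in this task: the 80% warning threshold is a hypothetical value (the real alert threshold is not stated here), and it assumes the docker CLI is installed, where `docker info --format '{{.Driver}}'` prints the active storage driver.

```python
import shutil
import subprocess

# Hypothetical warning threshold; the real alert threshold is not given in this task.
WARN_PERCENT = 80.0


def percent_used(path: str = "/") -> float:
    """Usage of the filesystem holding `path`, similar to df's Use%: used / (used + available)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / (usage.used + usage.free)


def docker_storage_driver() -> str:
    """Ask the local Docker daemon which storage driver it uses, e.g. "overlay2" or "devicemapper"."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{.Driver}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def diagnose(used_percent: float, driver: str, warn_percent: float = WARN_PERCENT) -> list[str]:
    """Turn raw measurements into findings; kept pure so it is easy to test."""
    findings = []
    if used_percent >= warn_percent:
        findings.append(f"/ is {used_percent:.0f}% full (warning at {warn_percent:.0f}%)")
    if driver == "devicemapper":
        findings.append("docker uses devicemapper: large data/metadata files live under /var/lib/docker")
    return findings
```

On a host, diagnose(percent_used("/"), docker_storage_driver()) returns the list of findings; keeping the measurement functions separate from diagnose() lets the decision logic be checked without a running Docker daemon.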

Event Timeline

Change 903238 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] hieradata: move alerting_host to overlay2

https://gerrit.wikimedia.org/r/903238

Change 903238 merged by Filippo Giunchedi:

[operations/puppet@production] hieradata: move alerting_host to overlay2

https://gerrit.wikimedia.org/r/903238
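To illustrate what moving to overlay2 means on the Docker side, here is a minimal sketch that derives a daemon configuration from the overlayfs flag. It assumes, hypothetically, that profile::base::overlayfs ends up as the storage-driver key of Docker's daemon.json; the actual Puppet plumbing is not shown in this task, and daemon_config() and render() are illustrative names. Unlike the server-side warning() described above, the sketch refuses to produce a config when overlayfs is off, so the misconfiguration cannot go unnoticed.

```python
import json


def daemon_config(overlayfs_enabled: bool) -> dict:
    """Hypothetical daemon.json body derived from profile::base::overlayfs.

    Fails loudly instead of warning, so docker::engine without overlayfs
    cannot silently fall back to another storage driver.
    """
    if not overlayfs_enabled:
        raise ValueError("docker::engine requires profile::base::overlayfs: true")
    # "storage-driver" is the daemon.json key Docker uses to select its storage driver.
    return {"storage-driver": "overlay2"}


def render(overlayfs_enabled: bool) -> str:
    """Serialize the config as stable, human-readable JSON."""
    return json.dumps(daemon_config(overlayfs_enabled), indent=2, sort_keys=True)
```

Note that Docker does not carry existing images over when the storage driver changes; they become inaccessible and have to be pulled again.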

fgiunchedi claimed this task.

Hosts are on overlay2 now

The same problem occurs with overlay2: the actual root cause is Docker images accumulating without ever being pruned.
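A minimal model of why the images pile up: each new image version leaves the previous one on disk, and nothing removes it. The Image record and the 7-day retention below are hypothetical; they only encode the rule "not used by any container and created before a cutoff", which is what an age-filtered docker image prune --all removes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Image:
    # Hypothetical, simplified view of the data docker image ls / docker ps expose.
    ref: str
    created: datetime
    in_use: bool  # True if any container, running or stopped, references the image


def prune_candidates(images: list[Image], now: datetime, retention: timedelta) -> list[Image]:
    """Return images that are unused and were created before now - retention."""
    cutoff = now - retention
    return [img for img in images if not img.in_use and img.created < cutoff]


if __name__ == "__main__":
    now = datetime(2023, 8, 7, tzinfo=timezone.utc)
    images = [
        Image("example/app:v1", now - timedelta(days=30), in_use=False),
        Image("example/app:v2", now - timedelta(days=10), in_use=False),
        Image("example/app:v3", now - timedelta(days=1), in_use=True),
    ]
    # prints example/app:v1, then example/app:v2
    for img in prune_candidates(images, now, timedelta(days=7)):
        print(img.ref)
```

Without a periodic job applying this rule, the candidate list only ever grows.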

Change 946511 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] dispatch: prune old docker images

https://gerrit.wikimedia.org/r/946511

fgiunchedi renamed this task from "alert hosts short of root disk space / docker devicemapper vs overlayfs" to "alert hosts short of root disk space". (Aug 7 2023, 7:58 AM)

Change 946511 merged by Filippo Giunchedi:

[operations/puppet@production] dispatch: prune old docker images

https://gerrit.wikimedia.org/r/946511
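The content of the dispatch change is not reproduced here. As a hedged sketch, a periodic job such as docker-image-prune-old could wrap Docker's age-filtered prune; the helper names and the one-week retention are assumptions. The flags used are standard docker image prune options: --all removes unused tagged images (not only dangling ones), --force skips the confirmation prompt, and the until filter accepts Go duration strings such as 168h.

```python
import subprocess
from datetime import timedelta


def prune_command(retention: timedelta) -> list[str]:
    """Build a docker image prune invocation for unused images older than `retention`."""
    hours = int(retention.total_seconds() // 3600)
    if hours < 1:
        raise ValueError("retention must be at least one hour")
    return ["docker", "image", "prune", "--all", "--force", "--filter", f"until={hours}h"]


def run_prune(retention: timedelta, dry_run: bool = True) -> str:
    """Return the command line (dry run) or docker's output after actually pruning."""
    cmd = prune_command(retention)
    if dry_run:
        return " ".join(cmd)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Dry run only: prints the command for a one-week retention.
    print(run_prune(timedelta(days=7)))
```

Defaulting to a dry run keeps the sketch safe to try; an unattended weekly job would call run_prune(..., dry_run=False).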

Mentioned in SAL (#wikimedia-operations) [2023-08-07T08:08:39Z] <godog> start docker-image-prune-old on alert hosts - T329939

Cleanup will now run weekly, which should keep disk usage under control (we're back to ~50% used).