Don't move filesystem_avail_bigger_than_size icinga check to alert manager?
Closed, Resolved (Public)

Description

XFS filesystems running on 4.9 kernels showed negative free space. The problem appeared to be in the filesystem itself, as both the nagios check and df showed the negative free space.
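
For illustration only, the condition the check's name describes (available space larger than total size) can be sketched as below. This is a minimal sketch, not the actual icinga check: the function names `avail_bigger_than_size` and `check_mount` are hypothetical, and it assumes a negative free-space count surfaces as an implausibly large available-block value.

```python
import os


def avail_bigger_than_size(total_blocks: int, avail_blocks: int) -> bool:
    """Return True when reported available space exceeds the filesystem size.

    Hypothetical helper: a healthy filesystem can never have more
    available blocks than total blocks.
    """
    return avail_blocks > total_blocks


def check_mount(path: str) -> bool:
    """Apply the comparison to a mounted filesystem via statvfs(3)."""
    st = os.statvfs(path)
    # f_blocks: total size in blocks; f_bavail: blocks free for unprivileged users
    return avail_bigger_than_size(st.f_blocks, st.f_bavail)
```

Calling `check_mount("/srv")` on an affected host would be expected to return `True` under the assumption above; the mount point is only an example.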

Most of the reports on the internet about negative free space on XFS seem to occur around the same time frame. A commit was then made to 4.18 that addresses this: "xfs: don't trip over negative free space in xfs_reserve_blocks". I wasn't able to find any reports after this commit, so I think there is a decent chance that this commit, or another one from around the same time, fixed it. Since we still have the workaround in place, and since all buster and later servers are running kernels newer than 4.18, my vote is to leave this check in icinga until the swift nodes are upgraded off of stretch.
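
The kernel-version reasoning above can be sketched as a simple comparison. This is an illustrative sketch only: `FIX_VERSION`, `kernel_major_minor`, and `predates_fix` are hypothetical names, and the 4.18 threshold is taken from the description rather than verified independently.

```python
import re

# Assumption from the task description: the fix landed in kernel 4.18.
FIX_VERSION = (4, 18)


def kernel_major_minor(release: str) -> tuple:
    """Extract (major, minor) from a kernel release string such as '4.9.0-18-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2)))


def predates_fix(release: str) -> bool:
    """Return True when the kernel is older than the assumed fix version."""
    return kernel_major_minor(release) < FIX_VERSION
```

On a live host the release string could be read with `platform.release()`; a 4.9 kernel would be classified as predating the fix, while 4.19 and later would not.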

Event Timeline

Thank you for digging up the details/history for this! I'm +1 on leaving the check in icinga, and possibly behind a conditional based on the distribution

Putting it behind a conditional based on Debian release sounds like a great idea; I will do that.
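
As a rough illustration of such a conditional (the actual change was made in operations/puppet and is not reproduced here), the idea can be sketched in Python. The helper names `parse_os_release` and `check_enabled` are hypothetical; the sketch assumes the release is identified from os-release style fields, where stretch is Debian 9.

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in the os-release format into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def check_enabled(os_release_text: str) -> bool:
    """Enable the check only on Debian 9 (stretch); hypothetical gating logic."""
    fields = parse_os_release(os_release_text)
    return fields.get("ID") == "debian" and fields.get("VERSION_ID") == "9"
```

Passing the contents of `/etc/os-release` from a stretch host would enable the check, while a buster or later host would leave it disabled.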

Change 766871 had a related patch set uploaded (by JHathaway; author: JHathaway):

[operations/puppet@production] Restrict filesystem_avail_bigger_than_size check to Stretch

https://gerrit.wikimedia.org/r/766871

Change 766871 merged by JHathaway:

[operations/puppet@production] Restrict filesystem_avail_bigger_than_size check to Stretch

https://gerrit.wikimedia.org/r/766871

jhathaway claimed this task.