monitoring: Disk space check can fail to read fuse mounts
Closed, Resolved, Public

Description

Recently we had a cloudvirt host that had a fuse mount as follows

$ sudo mount | grep /mnt
/dev/fuse on /mnt type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0)
$ sudo lsof /dev/fuse
COMMAND       PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
guestmoun 1395696 root    3u   CHR 10,229      0t0  282 /dev/fuse
$ ps -ef | grep guestmoun
root     1395696       1  2 Feb23 ?        02:37:10 guestmount -a /var/lib/nova/instances/7498566a-c160-4d7d-90eb-03470e5d80f3/disk.1677097630 -i --ro /mnt

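Because NRPE runs as the nagios user, and a FUSE mount without the allow_other option is accessible only to the user that mounted it (root here), the disk check failed with: DISK CRITICAL - /mnt is not accessible: Permission denied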

I think we should either exclude fuse mounts from this check or update the nagios and nrpe checks so they can run as root. I'm leaning towards the former as the better solution: I don't think we need to alert when a fuse mount is running out of space, but perhaps I'm missing something.
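
To make the first option concrete, here is a minimal sketch of which mounts a type-based exclusion would skip. It is an illustration only, not the actual check: the function name non_fuse_mounts and the ext4 sample line are hypothetical, and it assumes mount-style input with no spaces in mount points. In practice the same effect would more likely come from the disk check plugin's own exclude-by-type option than from a custom filter.

```shell
#!/bin/sh
# Sketch: read `mount`-style lines on stdin and print the mount points
# that a disk check would still cover after excluding fuse filesystems.
# Expected line format: <device> on <mountpoint> type <fstype> (<options>)
# Skips type "fuse" and any "fuse.<subtype>" (for example fuse.sshfs).
non_fuse_mounts() {
    awk '$2 == "on" && $4 == "type" && $5 !~ /^fuse(\.|$)/ { print $3 }'
}

# Example: the /mnt entry from this task plus a hypothetical ext4 root.
printf '%s\n' \
    '/dev/fuse on /mnt type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0)' \
    '/dev/sda1 on / type ext4 (rw,relatime)' |
    non_fuse_mounts
```

With the two sample lines above, only the ext4 mount point is printed, so /mnt would no longer be read by the unprivileged user.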

Event Timeline

We can simply exclude the HDFS mount via profile::monitoring::nrpe_check_disk_options in Hiera; we already do that for the Hadoop cluster.

Change 905233 had a related patch set uploaded (by FNegri; author: FNegri):

[operations/puppet@production] Ignore "fuse" mounts when checking disk space

https://gerrit.wikimedia.org/r/905233

fnegri changed the task status from Open to In Progress. Apr 3 2023, 2:19 PM
fnegri claimed this task.
fnegri triaged this task as Medium priority.

Change 905233 merged by FNegri:

[operations/puppet@production] Ignore "fuse" mounts when checking disk space

https://gerrit.wikimedia.org/r/905233