We occasionally see the following error on systems with HDFS. We should update wmf-auto-restart with the ability to ignore specific file systems:
lsof: WARNING: can't stat() fuse.fuse_dfs file system /mnt/hdfs Output information may be incomplete.
Status | Assigned | Task
---|---|---
Open | None | T294906 Puppet Improvements
Duplicate | jbond | T265138 Work required to prepare for puppet 7
Resolved | SLyngshede-WMF | T273673 replace all puppet crons with systemd timers
Open | None | T132324 Tracking and Reducing cron-spam to root@
Resolved | jbond | T217646 wmf-auto-restart occasionally errors on fuse mounts
Is that reproducible? Otherwise this might be caused by the stability issues we see with hdfs/fuse in general.
This is reproducible, but not reliably. Some file operations taking place on the fuse mount, e.g. ls -la /mnt/hdfs/tmp, seem to cause lsof to fail. It is almost certainly related to the HDFS fuse stability issues. I think we could remove this noise with any of the following options:
lsof -w
lsof -bw
lsof -e /mnt/hdfs
I would probably vote for lsof -bw, as it's simple to implement and I'm not sure we are that bothered about warnings with this tool.
-w sounds good, but let's first check what kinds of errors lsof potentially warns about, so that we don't miss something important in the future.
Another possible angle (if lsof supports that, didn't check) would be to exclude some directories entirely from scanning. The HDFS mount point doesn't contain any executables which might have a library reference, so omitting it entirely is also a nice performance optimisation (the lsof runs take notably longer on e.g. the stat hosts compared to other production hosts).
The last option (lsof -e) excludes mount points, which would work for this case. As far as I can see, directories (as opposed to mount points) can only be removed from the output, which wouldn't stop the warning from triggering.
Also, a very crude list of warnings:
$ strings /usr/bin/lsof | grep -i warn
%s: WARNING: can't stat()
%s: WARNING: can't report offset; disregarding -o.
%s: WARNING: can't report file flags; disregarding +f.
%s: WARNING: unsupported format: %s
%s: WARNING: can't stat(
%s: WARNING: can't opendir(
%s: WARNING: can't lstat(
%s: WARNING: not a directory:
%s: WARNING: no files found in directory:
%s: WARNING: -S time (%d) changed to %d
%s: WARNING -- child process %d may be hung.
%s: WARNING: access %s: %s
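For reference, the three lsof variants discussed above (-w, -bw, -e /mnt/hdfs) could be assembled roughly as follows. This is only a sketch: build_lsof_command and its parameter names are made up for illustration and are not taken from wmf-auto-restart, and the function only builds the argument list without running lsof.

```python
from typing import Iterable, List


def build_lsof_command(
    exclude_mounts: Iterable[str] = (),
    suppress_warnings: bool = False,
    avoid_blocking: bool = False,
) -> List[str]:
    """Build an lsof argument list (hypothetical helper, illustration only)."""
    cmd = ["lsof"]
    if avoid_blocking:
        # -b: avoid kernel calls (stat, lstat, readlink) that might block
        cmd.append("-b")
    if suppress_warnings:
        # -w: suppress warning messages
        cmd.append("-w")
    for mount in exclude_mounts:
        # -e <path>: exempt this file system from potentially blocking kernel calls
        cmd.extend(["-e", mount])
    return cmd


# The "lsof -e /mnt/hdfs" variant from the discussion
print(build_lsof_command(exclude_mounts=["/mnt/hdfs"]))
# The "lsof -bw" variant
print(build_lsof_command(suppress_warnings=True, avoid_blocking=True))
```

The -e variant keeps other warnings visible, whereas -w hides all of them, including any from the list above that might matter later.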
Change 494764 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] Add config file and exclude_mounts options to debdeploy
Change 494765 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] Update wmf-auto-restarts to read exclude mounts from debdeploy config
Change 494764 merged by Jbond:
[operations/puppet@production] Add config file and exclude_mounts options to debdeploy
Change 494765 merged by Jbond:
[operations/puppet@production] Update wmf-auto-restarts to read exclude mounts from debdeploy config
Change 496223 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] Exclude /mnt/hdfs from lsof operations in wmf-auto-restarts
Change 496223 merged by Jbond:
[operations/puppet@production] Exclude /mnt/hdfs from lsof operations in wmf-auto-restarts
Change 496405 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] Exclude /mnt/hdfs from lsof operations in wmf-auto-restarts
Change 496405 merged by Jbond:
[operations/puppet@production] Exclude /mnt/hdfs from lsof operations in wmf-auto-restarts
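The merged patches make wmf-auto-restart read an exclude_mounts option from the debdeploy config. A minimal sketch of that idea is below, assuming an INI-style file; the [lsof] section name, the comma-separated value format, the read_exclude_mounts helper and the /mnt/other path are all assumptions for illustration, and the real config layout may differ.

```python
import configparser
from typing import List


def read_exclude_mounts(config_text: str) -> List[str]:
    """Parse a comma-separated exclude_mounts option (hypothetical layout)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # The fallback covers a missing section or a missing option
    raw = parser.get("lsof", "exclude_mounts", fallback="")
    return [mount.strip() for mount in raw.split(",") if mount.strip()]


# Sample config text; the section name and /mnt/other are illustrative only
SAMPLE_CONFIG = """
[lsof]
exclude_mounts = /mnt/hdfs, /mnt/other
"""

mounts = read_exclude_mounts(SAMPLE_CONFIG)
# Turn each excluded mount into an "-e <path>" pair for lsof
lsof_args = ["lsof"] + [arg for mount in mounts for arg in ("-e", mount)]
print(lsof_args)
```

Returning an empty list when the option is absent means hosts without HDFS would run plain lsof, unchanged.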