When investigating intermittent issues with the admin Tool, I noticed that the k8s node it runs on had a coinciding load spike:
Load is a pretty well known proxy for instance-local NFS overload, so I tried to pull up the client NFS stats in Graphite and noticed they are missing.
I looked on an older host and had to go back nearly a year to see actual data :)
We do have a collector that is supposed to push stats per NFS backend:
modules/diamond/files/collector/nfsiostat.py
This collector is a mess: it dates from when I was converting /usr/sbin/nfsiostat (from nfs-common) to a Diamond collector under duress, and I discovered some weirdness in how nfsiostat reports ongoing stats (i.e. not the same way as, say, iostat). There was a time when this was needed daily because the backend NFS setup was melting.
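To illustrate the weirdness: the NFS client counters in /proc/self/mountstats are cumulative since mount time, so a collector has to keep its previous sample and emit deltas itself, whereas iostat-style tools do that windowing for you. Below is a minimal, hypothetical sketch of that pattern (the parsing helpers and function names are mine, not from nfsiostat.py):

```python
# Hypothetical sketch, NOT the actual nfsiostat.py collector.
# /proc/self/mountstats counters are cumulative since mount, so a
# per-interval metric requires diffing against the previous sample.

def parse_mountstats(text):
    """Return {mount_point: {op_name: cumulative_op_count}}."""
    stats = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("device ") and " with fstype nfs" in line:
            # e.g. "device fs:/vol mounted on /mnt/nfs/x with fstype nfs4 ..."
            current = line.split(" mounted on ")[1].split(" with fstype")[0]
            stats[current] = {}
        elif current and ":" in line:
            op, _, rest = line.partition(":")
            fields = rest.split()
            # per-op lines look like "READ: <ops> <trans> <timeouts> ..."
            if op.isupper() and fields and fields[0].isdigit():
                stats[current][op] = int(fields[0])
    return stats


def deltas(prev, curr):
    """Operations performed since the previous sample, per mount."""
    return {
        mount: {op: n - prev.get(mount, {}).get(op, 0)
                for op, n in ops.items()}
        for mount, ops in curr.items()
    }
```

A real collector would read /proc/self/mountstats each interval, stash the parsed result, and publish only the deltas (optionally divided by the interval length to get ops/sec).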
Based on the timeline I /think/ this dates back to a change in how NFS mounts are presented to clients.
It used to be that /data/project and the other NFS mounts were the actual mount points; now we use symlinks and mount under /mnt/nfs:
/data/project -> /mnt/nfs/labstore-secondary-tools-project