We learned a lot from the small number of NFS changes made so far and in relation to T127561.
A few issues:
- mount points can go stale and puppet doesn't detect it, leaving the share unavailable
- mount points can go stale, causing any process that tries to access them to fall into D-wait (uninterruptible sleep) forever, spiking load
- the above means that even if no other process touches the mount, puppet tries to manage it on its next run and freezes. Load then slowly creeps up, eventually causing havoc, not to mention that puppet is now ineffectual
- puppet doesn't understand that only certain options can be changed on remount. When we change one that can't (an NFS-specific option such as mode), the NFS mounts it manages end up in a weird state. This causes fleeting failures for puppet runs and then inconsistent state for the mounts themselves.
- bugs in the NFS client(s) across multiple distros have caused bad and nondeterministic behavior when dealing with mounts during changes.
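As a rough sketch of the stale-mount problem in the list above (not the proposed script itself), a check can probe a mount point from a child process and give up after a timeout, so the caller never blocks on a hung mount. The function name `probe_mount`, the state names, and the default timeout are made up for illustration, and Python is only an example language.

```python
import subprocess
import time


def probe_mount(path, timeout=5.0, command=("stat", "--"), poll_interval=0.05):
    """Probe `path` from a child process without ever blocking on it.

    Returns one of three illustrative states:
      "ok"     - the probe command exited with status 0
      "failed" - the probe exited non-zero (missing path, stale handle, ...)
      "hung"   - the probe did not finish within `timeout` seconds
    """
    proc = subprocess.Popen(
        list(command) + [path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        returncode = proc.poll()  # non-blocking check on the child
        if returncode is not None:
            return "ok" if returncode == 0 else "failed"
        time.sleep(poll_interval)
    # Best-effort kill. Deliberately no wait() here: waiting on a child
    # stuck in uninterruptible sleep could block the caller as well.
    proc.kill()
    return "hung"
```

A caller could treat "hung" and "failed" as a signal to skip or repair the mount rather than touching it directly, for example `probe_mount("/home", timeout=10)`, where `/home` stands in for whatever NFS mount point is being checked.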
I am proposing a script that puppet can call out to, which handles the above better, to manage the state of NFS mounts. If we change the param for the mounts to ensure => present, puppet will do the right thing in managing /etc/fstab but will not manage the mount itself. I believe this is a prereq for T134896, among others, as the long tail of NFS failures and load issues from the last round of changes took many days to track down and shake out.
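To illustrate the split where puppet only manages /etc/fstab and a separate script manages the mounts, here is a minimal sketch that compares fstab-style text with /proc/mounts-style text and lists the NFS mount points that are declared but not currently mounted. The function names and the sample hosts and paths are hypothetical. It also assumes plain whitespace-separated fields and does not decode escapes such as `\040` in paths.

```python
NFS_TYPES = ("nfs", "nfs4")


def nfs_mount_points(table_text):
    """Return the set of NFS mount points found in fstab-style text.

    Both /etc/fstab and /proc/mounts use the field order
    "device mountpoint fstype options ...", so one parser covers both.
    """
    points = set()
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            points.add(fields[1])
    return points


def missing_nfs_mounts(fstab_text, mounts_text):
    """NFS mount points declared in fstab but absent from the mount table."""
    return sorted(nfs_mount_points(fstab_text) - nfs_mount_points(mounts_text))


# Hypothetical sample data, for illustration only.
sample_fstab = """\
# managed by puppet
nfs-server.example:/export/home /home nfs rw,hard 0 0
nfs-server.example:/export/data /data nfs4 rw 0 0
/dev/sda1 / ext4 defaults 0 1
"""
sample_mounts = """\
/dev/sda1 / ext4 rw 0 0
nfs-server.example:/export/home /home nfs rw,hard 0 0
"""
print(missing_nfs_mounts(sample_fstab, sample_mounts))  # prints ['/data']
```

On a real host the two inputs would come from reading `/etc/fstab` and `/proc/mounts`. What the script then does with the missing entries (mount them, alert, or leave them alone) is a separate decision not covered by this sketch.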