@dcaro found https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html, which explains why processes in D state (uninterruptible sleep) increase the load average on Linux servers.
If some NFS hiccup (otherwise harmless) results in D state processes on the exec nodes, the load average goes up as a result, and the grid schedules jobs based on load average (just a theory at this point), then the failure mode is clear:
Any NFS hiccup (otherwise harmless) can result in the Grid becoming unavailable and/or unreliable.
We may consider creating a cookbook that scans the grid for D state procs and reboots affected nodes as an automated healing mechanism.
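As a rough idea of what the per-node detection step could look like, here is a minimal sketch that lists PIDs in D state by reading /proc/<pid>/stat. The function names (parse_state, find_d_state_pids) are made up for illustration and are not part of any existing cookbook. How it would be run across the grid, and the reboot step itself, are left out.

```python
import os


def parse_state(stat_text):
    """Return the one-letter process state from the contents of /proc/<pid>/stat."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the last ')' instead of splitting on whitespace.
    _, _, rest = stat_text.rpartition(")")
    return rest.split()[0]


def find_d_state_pids(proc_root="/proc"):
    """Return the sorted PIDs under proc_root that are in uninterruptible sleep (D)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                state = parse_state(f.read())
        except (OSError, IndexError):
            # The process exited between listdir() and open(), or the
            # stat file was unreadable or empty; ignore it.
            continue
        if state == "D":
            pids.append(int(entry))
    return sorted(pids)
```

Calling find_d_state_pids() on an exec node returns the PIDs that are in D state at that instant. Processes pass through D state briefly during normal I/O, so a single sample is probably not enough to justify a reboot; sampling a few times and only acting on PIDs that stay stuck would be safer, but that threshold is an assumption that would need tuning.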