It would be very helpful to have a DaemonSet or other mechanism which deployed a Pod on every Kubernetes worker node which then did some internal health checking of NFS mounts, NSS integration, and network routing from inside that node. This could be considered similar to our OpenStack full-stack checks or the "canary" instances we put on each hypervisor.
Ideally it would catch things like T242559: Partialy setup tools-k8s-worker instances created by novaadmin causing problems, automatically mark the node as unschedulable, and alert Toolforge admins of the problem.