It's not showing up in Grafana or in the alerts, but that node is hung on NFS and has many processes stuck:
root@tools-k8s-worker-nfs-24:~# curl --silent http://127.0.0.1:9100/metrics | grep node_processes_state
# HELP node_processes_state Number of processes in each state.
# TYPE node_processes_state gauge
node_processes_state{state="D"} 31
node_processes_state{state="I"} 81
node_processes_state{state="R"} 1
node_processes_state{state="S"} 153

I suspect this has been happening for a while, and that it is the cause of the stuck pods during the upgrade (and of a few of the currently stuck ones).
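Processes blocked on a hung NFS mount usually sit in uninterruptible sleep, which is the `D` state in the output above. The following is a minimal sketch that does what the `curl | grep` does and flags the node. It fetches the same node_exporter endpoint, parses the `node_processes_state` samples, and checks the `D` count against a threshold. The function names and the threshold of 10 are hypothetical, chosen only for illustration. The sample text is the output pasted above.

```python
import re
import urllib.request

# Matches a sample line such as: node_processes_state{state="D"} 31
# Assumes "state" is the only label on this metric, as in the output above.
_SAMPLE_LINE = re.compile(r'^node_processes_state\{state="([^"]+)"\}\s+(\S+)')

# The output pasted above, used here as example input.
SAMPLE = """# HELP node_processes_state Number of processes in each state.
# TYPE node_processes_state gauge
node_processes_state{state="D"} 31
node_processes_state{state="I"} 81
node_processes_state{state="R"} 1
node_processes_state{state="S"} 153
"""


def fetch_metrics(url: str = "http://127.0.0.1:9100/metrics", timeout: float = 5.0) -> str:
    """Fetch the node_exporter text output (same endpoint the curl above hits)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def process_states(metrics_text: str) -> dict:
    """Return {state: count} from node_processes_state samples, skipping # comment lines."""
    states = {}
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue
        match = _SAMPLE_LINE.match(line)
        if match:
            states[match.group(1)] = float(match.group(2))
    return states


def too_many_uninterruptible(metrics_text: str, threshold: float = 10) -> bool:
    """True if the D-state count exceeds the (hypothetical) threshold."""
    return process_states(metrics_text).get("D", 0.0) > threshold


print(process_states(SAMPLE))            # {'D': 31.0, 'I': 81.0, 'R': 1.0, 'S': 153.0}
print(too_many_uninterruptible(SAMPLE))  # True
```

On a live node you would call `too_many_uninterruptible(fetch_metrics())` instead of passing `SAMPLE`. A persistently high `D` count is a symptom rather than proof of an NFS hang. It is still a cheap signal that could back an alert for cases like this one, which currently go unnoticed.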




