Somehow, the PAWS k8s master lost contact with NFS from labstore1004 on March 22nd and cannot remount the home or project directories.
So far, it looks like a reboot is the only fix. Puppet has been at least partially broken since then.
Mar 22 14:04:17 tools-paws-master-01 kernel: [2170172.772597] nfs: server nfs-tools-project.svc.eqiad.wmnet not responding, timed out
Since then, Puppet has been unable to mount the shares, and mounting them manually as root fails too.
I was able to unmount the shares, but something is blocking the remount (most likely some of the containers). The load average is currently 12; it seems to run high in general, though it fluctuates.
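One plausible contributor to the high load is that Linux counts tasks in uninterruptible sleep (D state), such as tasks waiting on an unresponsive NFS server, in the load average. The sketch below is a hypothetical diagnostic helper, not part of the PAWS or Toolforge tooling, that lists D-state tasks and the PIDs whose cwd, root, or open file descriptors point under a given path (roughly what `fuser -m` reports). The default paths /home and /data/project are assumptions about where the shares are mounted. The script should be run as root to see other users' descriptors, and results for processes inside containers may not map cleanly onto host paths.

```python
import os
import sys


def processes_in_d_state():
    """Return (pid, comm) pairs for tasks in uninterruptible sleep ('D')."""
    result = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # process exited or is not readable
        # comm is wrapped in parentheses and may contain spaces, so locate
        # the last ')' and read the state character that follows it.
        lpar = data.find("(")
        rpar = data.rfind(")")
        comm = data[lpar + 1:rpar]
        state = data[rpar + 2:rpar + 3]
        if state == "D":
            result.append((int(entry), comm))
    return result


def processes_using_path(prefix):
    """Return sorted PIDs whose cwd, root, or open fds resolve under prefix."""
    prefix = prefix.rstrip("/") + "/"
    pids = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        base = f"/proc/{entry}"
        links = [f"{base}/cwd", f"{base}/root"]
        try:
            links += [f"{base}/fd/{fd}" for fd in os.listdir(f"{base}/fd")]
        except OSError:
            pass  # no permission, or the process exited
        for link in links:
            try:
                # readlink only asks the kernel for the recorded path; it does
                # not stat the target, so it should not hang on a dead mount.
                target = os.readlink(link)
            except OSError:
                continue
            if (target + "/").startswith(prefix):
                pids.add(int(entry))
                break
    return sorted(pids)


if __name__ == "__main__":
    # Hypothetical defaults: pass the real mount points as arguments instead.
    mounts = sys.argv[1:] or ["/home", "/data/project"]
    stuck = processes_in_d_state()
    print(f"{len(stuck)} task(s) in uninterruptible sleep (D state):")
    for pid, comm in stuck:
        print(f"  {pid} {comm}")
    for mount in mounts:
        print(f"PIDs holding references under {mount}: {processes_using_path(mount)}")
```

If many D-state tasks reference the NFS paths, that would fit both the elevated load and a remount that stays blocked until those tasks are gone, which a reboot clears.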
Mentioned in SAL (#wikimedia-cloud) [2019-03-27T22:10:04Z] <chicocvenancio> moving paws host in paws-proxy-02 to tools-paws-worker-1005 T219460
Mentioned in SAL (#wikimedia-cloud) [2019-03-27T23:35:56Z] <bstorm_> rebooted tools-paws-master-01 for NFS issue T219460
Mentioned in SAL (#wikimedia-cloud) [2019-03-27T23:46:13Z] <chicocvenancio> moving paws host in paws-proxy-02 back to tools-paws-master-01 T219460