
paws-master-01 high load and NFS client issues
Closed, Resolved (Public)

Description

For reasons not yet known, the PAWS k8s master lost contact with NFS on labstore1004 on March 22nd and cannot remount the home or project directories.

So far, a reboot looks like the only way to recover. Puppet has been at least partially broken since then.

Event Timeline

Bstorm triaged this task as High priority. Mar 27 2019, 9:51 PM
Bstorm created this task.

Mar 22 14:04:17 tools-paws-master-01 kernel: [2170172.772597] nfs: server nfs-tools-project.svc.eqiad.wmnet not responding, timed out

Since then, Puppet has been unable to mount the NFS shares, and root cannot mount them manually either.
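For anyone wanting to spot this kind of event earlier, here is a minimal sketch (not something that was run for this task) that scans syslog-style lines for the kernel's NFS "not responding" message quoted above. The function name `find_nfs_timeouts` and the regular expression are illustrative assumptions; the pattern is based only on the single log line in this task and may need adjusting for other log formats.

```python
import re

# Assumed pattern, derived from the one kernel log line quoted in this task.
NFS_TIMEOUT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"\[\s*(?P<uptime>[\d.]+)\]\snfs: server (?P<server>\S+) not responding"
)


def find_nfs_timeouts(lines):
    """Return one dict per line that reports an unresponsive NFS server."""
    events = []
    for line in lines:
        match = NFS_TIMEOUT.match(line)
        if match:
            events.append(match.groupdict())
    return events


# The sample is the log line from this task, unchanged.
sample = [
    "Mar 22 14:04:17 tools-paws-master-01 kernel: [2170172.772597] "
    "nfs: server nfs-tools-project.svc.eqiad.wmnet not responding, timed out",
]

for event in find_nfs_timeouts(sample):
    print(event["stamp"], event["host"], event["server"])
```

In practice the lines could come from a syslog file or from `journalctl -k` output, provided the format matches the sample.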

I was able to umount the shares, but something is blocking mount, most likely some containers. Load is currently at 12. It seems to run high in general, but it fluctuates.

Mentioned in SAL (#wikimedia-cloud) [2019-03-27T22:10:04Z] <chicocvenancio> moving paws host in paws-proxy-02 to tools-paws-worker-1005 T219460

Mentioned in SAL (#wikimedia-cloud) [2019-03-27T23:35:56Z] <bstorm_> rebooted tools-paws-master-01 for NFS issue T219460

That resolved the issues. PAWS seems healthy now.

Mentioned in SAL (#wikimedia-cloud) [2019-03-27T23:46:13Z] <chicocvenancio> moving paws host in paws-proxy-02 back to tools-paws-master-01 T219460

Bstorm closed this task as Resolved. Mar 27 2019, 11:47 PM
Bstorm claimed this task.