
tools-k8s-haproxy-2 NFS problems
Closed, Resolved · Public

Description

<Krenair> is there a problem with the disks / mounts on tools-k8s-haproxy-2 ?
<Krenair> just noticed df returned IO errors for NFS mounts when I ran it across a load of instances earlier, and it's hanging when I log in and run it myself
<Krenair> not sure why this instance even has NFS enabled if it's just haproxy
<Krenair> Jan  5 00:14:20 tools-k8s-haproxy-2 kernel: [5055850.525005] nfs: server labstore1006.wikimedia.org not responding, timed out
<Krenair> Jan  5 00:14:20 tools-k8s-haproxy-2 kernel: [5055850.525013] nfs: server cloudstore1009.wikimedia.org not responding, timed out
<Krenair> instance has existed since november so should have an IP that's allowed to connect
<bd808> probably not rebooted since the last round of nfs server outages
<bd808> it probably does not need nfs. I don't remember how we mark things in tools as not needing nfs but I know it's possible
<Krenair> shall I leave a ticket for someone to take a look?
<bd808> sure. I bet j.ason or a.rturo can fix it up

I'd deal with it but I'm worried about breaking something, especially at this time. These can probably have NFS disabled?
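df hangs here because it calls statfs() on every mounted filesystem, and that call blocks when an NFS server stops responding. Reading the kernel's mount table instead lists the NFS mounts without contacting the servers. The following is a minimal sketch; the function name list_nfs_mounts is hypothetical and not part of any Toolforge tooling:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list NFS mounts without running df.
# df stats every mount point and blocks on an unresponsive NFS server;
# reading /proc/mounts only reads the kernel's mount table.
list_nfs_mounts() {
    local mounts_file="${1:-/proc/mounts}"
    # /proc/mounts fields: device mountpoint fstype options dump pass
    # Print "server:/export mountpoint" for each nfs or nfs4 mount.
    awk '$3 == "nfs" || $3 == "nfs4" { print $1, $2 }' "$mounts_file"
}
```

Usage: source the file, then run list_nfs_mounts on the instance. With no argument it reads /proc/mounts.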

Event Timeline

Krenair created this task. Sun, Jan 5, 12:28 AM
Restricted Application added a subscriber: Aklapper. Sun, Jan 5, 12:28 AM

Maybe we just need to add mount_nfs: false to a tools-k8s-haproxy puppet prefix in Horizon, as other prefixes in the tools project already do:

alex@alex-laptop:~/Development/Wikimedia/instance-puppet/tools (master)$ git grep -i mount_nfs:
tools-docker-builder.yaml:mount_nfs: false
tools-elastic.yaml:mount_nfs: false
tools-flannel-etcd.yaml:mount_nfs: false
tools-k8s-etcd.yaml:mount_nfs: false
tools-logs.yaml:mount_nfs: false
tools-prometheus.yaml:mount_nfs: false
tools-proxy.yaml:mount_nfs: false
tools-puppetmaster-01.tools.eqiad.wmflabs.yaml:mount_nfs: false
tools-redis.yaml:mount_nfs: false
alex@alex-laptop:~/Development/Wikimedia/instance-puppet/tools (master)$
Peachey88 updated the task description. Sun, Jan 5, 1:28 AM
Bstorm added a subscriber: Bstorm. Mon, Jan 6, 3:29 AM

I'm pretty sure these just need a reboot. They likely weren't rebooted after the OpenStack upgrade a while back, when most NFS-connected instances ended up in exactly this condition with exactly these servers. That said, I like the idea of them NOT mounting NFS. They don't need it, and therefore shouldn't have it.

Even after the hiera change, they will likely still need the fstab edited and/or a reboot. I do think they should have that hiera change, though.
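Editing the fstab stops the NFS entries from being mounted again at boot. The sketch below comments out nfs and nfs4 entries and writes the result to stdout, so it can be reviewed before it replaces /etc/fstab. The function name disable_nfs_in_fstab is hypothetical. The sketch assumes the standard six-field fstab layout, with the filesystem type in the third field:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a copy of an fstab with every NFS entry
# commented out. It does not edit the file in place.
disable_nfs_in_fstab() {
    local fstab="${1:-/etc/fstab}"
    # fstab fields: spec file vfstype mntops freq passno
    awk '
        /^[[:space:]]*#/ { print; next }                    # keep existing comments as-is
        $3 == "nfs" || $3 == "nfs4" { print "#" $0; next }  # comment out NFS entries
        { print }                                           # pass everything else through
    ' "$fstab"
}
```

Usage: run disable_nfs_in_fstab /etc/fstab > /tmp/fstab.new, diff the two files, and then copy the new file into place as root. This step does not touch mounts that are already stale. Clearing those still takes an unmount or a reboot.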

Mentioned in SAL (#wikimedia-cloud) [2020-01-06T18:47:01Z] <bstorm_> added mount_nfs=false to tools-k8s-haproxy puppet prefix T241908

Mentioned in SAL (#wikimedia-cloud) [2020-01-06T18:49:09Z] <bstorm_> edited /etc/fstab to remove NFS and rebooted to clear stale mounts on tools-k8s-haproxy-2 T241908

Mentioned in SAL (#wikimedia-cloud) [2020-01-06T18:54:44Z] <bstorm_> edited /etc/fstab to remove NFS and unmounted the nfs volumes tools-k8s-haproxy-1 T241908
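On tools-k8s-haproxy-1 the NFS volumes were unmounted rather than rebooted away. One way to do that for stale mounts uses umount -f, which forces the unmount of an unreachable NFS server, and umount -l, which detaches the mount point right away even if processes are still blocked on it. The sketch below is hypothetical: the function name and the DRY_RUN switch are illustrative only, and a real run needs root:

```bash
#!/usr/bin/env bash
# Hypothetical helper: unmount every NFS mount in the kernel mount table
# without rebooting. Set DRY_RUN=1 to print the commands instead of running them.
unmount_stale_nfs() {
    local mounts_file="${1:-/proc/mounts}"
    local mountpoint
    # Reverse-sort so a nested mount (e.g. /data/project) is unmounted
    # before its parent (/data).
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "$mounts_file" | sort -r |
    while read -r mountpoint; do
        if [[ "${DRY_RUN:-0}" == "1" ]]; then
            echo "umount -f -l $mountpoint"
        else
            umount -f -l "$mountpoint"   # force + lazy: works even if the server is gone
        fi
    done
}
```

Usage: run DRY_RUN=1 unmount_stale_nfs first to see what would be unmounted. Then run it without DRY_RUN, as root, after the fstab has been fixed so the mounts do not come back at the next boot.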

Bstorm closed this task as Resolved. Mon, Jan 6, 6:56 PM
Bstorm claimed this task.