
Figure out how NFS failovers will work for the dumps servers - labstore1006|7
Closed, ResolvedPublic


We can't do a cluster IP based failover in this setup, which we can do in labstore-secondary because drbd makes the volumes look the exact same. What are the alternatives to be able to fail over NFS clients between the two boxes for maint/outages?

Event Timeline

Change 403767 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] WIP: nfsclient: Setup dumps mounts from new servers

Here's a draft of the failover plan for the dumps distribution servers:


  • Have the dumps shares exported by both servers (labstore1006 & 7) mounted on all instances that need them. The mounts would look like:
 on /mnt/nfs/labstore1007-dumps type nfs4 (ro,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=300,retrans=3,sec=sys,clientaddr=,local_lock=none,addr=
 on /mnt/nfs/labstore1006-dumps type nfs4 (ro,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=300,retrans=3,sec=sys,clientaddr=,local_lock=none,addr=
  • Symlink /public/dumps to the mount of the designated NFS distribution server (1006 or 7). This would end up looking like:
root@newdumps-test:/# ls -al /public/
total 12
drwxr-xr-x  3 root root 4096 Jan 11 22:09 .
drwxr-xr-x 23 root root 4096 Jan 10 06:48 ..
lrwxrwxrwx  1 root root   27 Jan 11 22:09 dumps -> /mnt/nfs/labstore1006-dumps
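
With both shares mounted permanently, a failover itself is just a symlink flip. In /etc/fstab terms the client side might look like the following sketch (the hostnames and export path are illustrative, not confirmed; the real options are the ones shown in the mount output above):

```
# Both dumps shares stay mounted read-only on every instance.
# "soft" plus timeo/retrans keeps a dead server from hanging clients forever.
labstore1006.wikimedia.org:/dumps  /mnt/nfs/labstore1006-dumps  nfs4  ro,noatime,soft,timeo=300,retrans=3  0  0
labstore1007.wikimedia.org:/dumps  /mnt/nfs/labstore1007-dumps  nfs4  ro,noatime,soft,timeo=300,retrans=3  0  0
```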

Failover and fail back plan:

Assuming we are failing over from 1006 to 1007

  • Switch symlink target for /public/dumps to /mnt/nfs/labstore1007-dumps in puppet (nfsclient.pp)
  • Roll out the puppet change to all instances. This just switches the symlinks; open filehandles to labstore1006-dumps will continue to read from it.
  • Lazy unmount /mnt/nfs/labstore1006-dumps across all instances. We can do this with nfs-mount-manager umount /mnt/nfs/labstore1006-dumps, which detaches the filesystem from the file hierarchy right away and cleans up all references to it as soon as it is no longer busy.
  • At this point we can wait a while for current open connections to die, or proceed if we have no choice. If the backend NFS server goes away, the lazy unmount ensures the instance itself still behaves fine and stays responsive to commands like df and lsof, but processes reading from the old location may go into uninterruptible sleep (D state) and need to be cleaned up. Hopefully we can wait for the connections to close in most planned-maintenance cases.
  • The next step would be to run service nfs-kernel-server stop on labstore1006.
  • Perform maintenance/reboot
  • Bring nfs-kernel-server back up and make sure all the shares are exporting okay.
  • Run puppet across instances to ensure that the labstore1006-dumps mount is mounted again. Use nfs-mount-manager check /mnt/nfs/labstore1006-dumps to verify.
  • To fail back, switch symlinks back to the labstore1006 mount in nfsclient.pp, run puppet to apply across instances.
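
The client-side core of this sequence (repoint the symlink, then lazily unmount the old share) can be sketched as a small shell helper. The function name is hypothetical; the real logic lives in nfsclient.pp and nfs-mount-manager:

```shell
#!/bin/bash
# Sketch of the per-instance failover step: repoint the /public/dumps
# symlink, then lazy-unmount the previously active share. The helper
# name is made up for illustration.
set -e

failover_dumps() {
    local link="$1"        # e.g. /public/dumps
    local new_target="$2"  # e.g. /mnt/nfs/labstore1007-dumps
    local old_target
    old_target=$(readlink "$link")

    # ln -sfn replaces the symlink in place (-n so the link itself is
    # replaced rather than a link being created inside the target dir),
    # so readers never see /public/dumps missing during the switch.
    ln -sfn "$new_target" "$link"

    # Lazy unmount: detach from the hierarchy now, clean up once the
    # last open filehandle goes away. Ignore errors if not mounted.
    umount -l "$old_target" 2>/dev/null || true
    echo "switched $link: $old_target -> $new_target"
}
```

Using ln -sfn keeps the switch to a single step per client, and the lazy unmount means long-running readers fail in isolation instead of blocking the cutover.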

Small note: we talked yesterday about the failover plan and practicalities. Let's codify something, either baked into nfs-mount-manager or as an external tool, to kill any process accessing a particular mount path for the cases where gracefulness won't be possible. Hopefully that's a small percentage of the time, but it's definitely non-zero, and we really want to have thought through how to prevent a scattering of D-state processes across exec/worker nodes before the moment arrives :)
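
A sketch of what such a kill tool could do, assuming a Linux /proc filesystem; the actual option merged into nfs-mount-manager may well be implemented differently (e.g. with fuser or lsof):

```shell
#!/bin/bash
# Hypothetical helper: kill any process whose working directory or open
# file descriptors point under the given mount path. Walks /proc rather
# than relying on fuser/lsof being installed.
kill_mount_users() {
    local path="$1" pid link fd
    for pid in /proc/[0-9]*; do
        pid="${pid#/proc/}"
        [ "$pid" = "$$" ] && continue   # never kill ourselves
        # Check the process's current working directory.
        link=$(readlink "/proc/$pid/cwd" 2>/dev/null)
        case "$link" in
            "$path"|"$path"/*) kill -9 "$pid" 2>/dev/null; continue ;;
        esac
        # Check each open file descriptor.
        for fd in "/proc/$pid/fd"/*; do
            link=$(readlink "$fd" 2>/dev/null)
            case "$link" in
                "$path"|"$path"/*) kill -9 "$pid" 2>/dev/null; break ;;
            esac
        done
    done
}
```

SIGKILL is deliberate here: a process already blocked in uninterruptible NFS I/O won't respond to anything gentler, and after the lazy unmount there is nothing left for it to finish cleanly.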

Change 408864 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] nfs-mount-manager: Add option to kill process accessing a mount

Change 408864 merged by Madhuvishy:
[operations/puppet@production] nfs-mount-manager: Add option to kill process accessing a mount

madhuvishy claimed this task.
madhuvishy added a subscriber: Ottomata.

Chatted with @Ottomata today in #wikimedia-analytics, and we decided to use a similar strategy for the stat/notebook mounts. We'll mount shares from labstore1006/7 in /mnt, and symlink the active NFS one to /mnt/data (which is the current access point for stat users).

Resolving this since we know the strategy now; patches for the setup are coming in T188643 and T188644.