We can't do a cluster-IP-based failover in this setup. That works in labstore-secondary only because DRBD makes the volumes look exactly the same on both nodes. What are the alternatives for failing over NFS clients between the two boxes for maintenance or outages?
|operations/puppet||production||+16 -1||nfs-mount-manager: Add option to kill process accessing a mount|
|Resolved||bd808||T166402 Program 7 Outcome 3: data services|
|Resolved||ArielGlenn||T182540 get datset1001, ms1001 ready for decommission|
|Resolved||• madhuvishy||T168486 Migrate customer-facing Dumps endpoints to Cloud Services|
|Resolved||• madhuvishy||T181431 Setup NFS on dumps servers|
|Resolved||• madhuvishy||T171540 Figure out how NFS failovers will work for the dumps servers - labstore1006|7|
Here's a draft of the failover plan for the dumps distribution servers:
- Have the dumps shares exported by both servers (labstore1006 & 7) mounted on all instances that need them. This would look like:
labstore1007.wikimedia.org:/public on /mnt/nfs/labstore1007-dumps type nfs4 (ro,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=300,retrans=3,sec=sys,clientaddr=10.68.18.113,local_lock=none,addr=22.214.171.124)
labstore1006.wikimedia.org:/public on /mnt/nfs/labstore1006-dumps type nfs4 (ro,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=300,retrans=3,sec=sys,clientaddr=10.68.18.113,local_lock=none,addr=126.96.36.199)
- Symlink /public/dumps to the mount of the designated NFS distribution server (1006 or 7). This would end up looking like:
root@newdumps-test:/# ls -al /public/
total 12
drwxr-xr-x 3 root root 4096 Jan 11 22:09 .
drwxr-xr-x 23 root root 4096 Jan 10 06:48 ..
lrwxrwxrwx 1 root root 27 Jan 11 22:09 dumps -> /mnt/nfs/labstore1006-dumps
- A proof of concept for how this definition would look in nfsclient.pp is at https://gerrit.wikimedia.org/r/#/c/403767/
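As a rough way to confirm that an instance really has both shares mounted, here is a minimal Python sketch that parses `mount`-style output in the format pasted above. The function names `parse_mounts` and `missing_dumps_mounts` are hypothetical and not part of nfs-mount-manager or the puppet change; the sketch only assumes the `<source> on <target> type <fstype> (<options>)` line layout.

```python
import re

# One line of `mount` output: <source> on <target> type <fstype> (<options>)
MOUNT_LINE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<options>[^)]*)\)$"
)


def parse_mounts(mount_output):
    """Return {target: {"source", "fstype", "options"}} for each parsable line."""
    mounts = {}
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognised lines
        mounts[match.group("target")] = {
            "source": match.group("source"),
            "fstype": match.group("fstype"),
            "options": match.group("options").split(","),
        }
    return mounts


def missing_dumps_mounts(mount_output, expected_targets):
    """List the expected mount points that are absent or not NFS mounts."""
    mounts = parse_mounts(mount_output)
    return [
        target
        for target in expected_targets
        if target not in mounts or not mounts[target]["fstype"].startswith("nfs")
    ]
```

Feeding it the output of `mount` together with the two expected paths (/mnt/nfs/labstore1006-dumps and /mnt/nfs/labstore1007-dumps) should return an empty list on a healthy instance.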
Failover and fail back plan:
Assuming we are failing over from 1006 to 1007:
- Switch symlink target for /public/dumps to /mnt/nfs/labstore1007-dumps in puppet (nfsclient.pp)
- Roll out the puppet change to all instances. This only switches the symlinks; filehandles already open on labstore1006-dumps will continue to read from it.
- Lazy unmount /mnt/nfs/labstore1006-dumps across all instances. We can do this using nfs-mount-manager umount /mnt/nfs/labstore1006-dumps, which will detach the filesystem from the file hierarchy right away and clean up all references to it as soon as it is no longer busy.
- At this point we can wait for a while for current open connections to die, or proceed if we don't have a choice. If the backend nfs server goes away, our lazy unmount will ensure that the instance itself still behaves fine and is responsive to commands like df and lsof, but the processes that are reading from the old location may go into uninterruptible sleep (D state) and may need to be cleaned up. Hopefully we can wait for the connections to close in most planned maintenance cases.
- The next step is to run service nfs-kernel-server stop on labstore1006.
- Perform maintenance/reboot
- Bring back nfs-kernel-server, make sure all the shares are exporting okay
- Run puppet across instances to ensure that the labstore1006-dumps mount is mounted again. Run nfs-mount-manager check /mnt/nfs/labstore1006-dumps to verify.
- To fail back, switch symlinks back to the labstore1006 mount in nfsclient.pp, run puppet to apply across instances.
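To illustrate why switching the symlink is safe for readers that already have files open, here is a small Python sketch of an atomic symlink swap. In the plan above puppet manages the link, so this is only an illustration of the underlying mechanism; `switch_symlink` is a hypothetical helper, not something from nfsclient.pp or nfs-mount-manager.

```python
import os


def switch_symlink(link_path, new_target):
    """Point link_path at new_target without a window where the link is missing.

    A temporary symlink is created next to the real one and then renamed over
    it. rename(2) replaces the old link in a single step, so new opens see
    either the old target or the new one. Files that were already open keep
    reading from the old target.
    """
    tmp_path = f"{link_path}.tmp-{os.getpid()}"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)  # clear a stale temporary link from an earlier run
    os.symlink(new_target, tmp_path)
    os.replace(tmp_path, link_path)
```

For a failover from 1006 to 1007 the call would be `switch_symlink("/public/dumps", "/mnt/nfs/labstore1007-dumps")`, and the reverse for the fail back.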
Small note: we talked yesterday about the failover plan and practicalities. Let's codify something, either baked into nfs-mount-manager or as an external tool, to kill any process accessing a particular mount path for the cases where a graceful failover is not possible. Hopefully that's a small percentage of the time, but it is definitely non-zero, and we really want to have thought through how to prevent a scattering of D-state processes across exec/workers before the moment arrives :)
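One possible shape for the "find what is using this mount" half of such a tool, sketched in Python. This is not how nfs-mount-manager does it; `is_under` and `pids_using_mount` are hypothetical names. It assumes a Linux /proc layout and only uses readlink on /proc/<pid>/cwd, /proc/<pid>/root and /proc/<pid>/fd/*, which reads kernel bookkeeping rather than touching the NFS share itself. Memory-mapped files are not covered, and the scan is assumed to run before the lazy unmount, since paths on a detached filesystem may no longer carry the mount point prefix.

```python
import os


def is_under(path, mount_point):
    """True if path is the mount point itself or lies below it."""
    mount_point = mount_point.rstrip("/")
    return path == mount_point or path.startswith(mount_point + "/")


def pids_using_mount(mount_point, proc_root="/proc"):
    """Return sorted PIDs whose cwd, root, or open fds point below mount_point."""
    pids = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join(proc_root, entry)
        links = [os.path.join(base, "cwd"), os.path.join(base, "root")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited or we lack permission to list its fds
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # link vanished or is unreadable
            if is_under(target, mount_point):
                pids.add(int(entry))
                break
    return sorted(pids)
```

The returned PIDs could then be passed to `os.kill` with the signal of choice. That step is left out here because the right escalation policy (for example SIGTERM first, then SIGKILL) is a decision for whoever owns the tool.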
Chatted with @Ottomata today in #wikimedia-analytics, and we decided to use a similar strategy for the stat/notebook mounts. We'll mount the shares from labstore1006/7 under /mnt, and symlink /mnt/data (the current access point for stat users) to the active one.