We recently executed a few failovers between the labstore1004 and labstore1005 hosts and ran into issues with the nfs-manage script. It seemed so great a year ago :)
- Add documentation to wikitech for the script's usage and purpose
- Add better inline documentation for the up/down steps
- Test under more load. The principal problem is that we cannot umount portions of the bind mount tree, which means we cannot umount the underlying devices either. The short-term resolution is to reboot the server to release those resources so the other node in the pair sees them as free. That's not a very good place to be.
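One way to make the teardown more predictable is to compute the unmount order from `/proc/self/mountinfo`. Nested bind mounts come off before their parents, and the underlying device's mount point comes last. This is a minimal sketch under stated assumptions, not what nfs-manage currently does. The `unmount_order` helper and the `/srv` prefix are hypothetical. It only plans the order and prints the commands as a dry run. If a process still holds files open under a mount point, the umount will still fail, and something like `fuser -m` is needed to find the holder. util-linux's `umount -R` does a similar recursive unmount.

```python
import re
import shlex


def _unescape(path: str) -> str:
    # mountinfo escapes space, tab, newline and backslash as \ooo octal sequences
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), path)


def unmount_order(mountinfo: str, prefix: str) -> list[str]:
    """Return mount points at or below `prefix`, deepest (children) first.

    Hypothetical helper: parses /proc/self/mountinfo text, where field 5
    (index 4) is the mount point.
    """
    base = prefix.rstrip("/")
    points = []
    for line in mountinfo.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        mp = _unescape(fields[4])
        if mp == (base or "/") or mp.startswith(base + "/"):
            points.append(mp)

    def depth(path: str) -> int:
        return len([part for part in path.split("/") if part])

    # Deeper paths first; among equal depth (including stacked mounts on the
    # same path), unmount the most recently listed mount first.
    ordered = sorted(enumerate(points), key=lambda t: (depth(t[1]), t[0]), reverse=True)
    return [mp for _, mp in ordered]


if __name__ == "__main__":
    import os

    PREFIX = "/srv"  # hypothetical export root; adjust for the real layout
    # Dry run only: print the commands instead of executing them.
    if os.path.exists("/proc/self/mountinfo"):
        with open("/proc/self/mountinfo") as f:
            for mp in unmount_order(f.read(), PREFIX):
                print("umount " + shlex.quote(mp))
```

Printing rather than running the commands keeps this usable for the same "step through it line by line" style of failover, while still removing the guesswork about which bind mount has to come off first.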
I did the last 2 migrations by cat'ing the contents of the script and stepping through the process line by line. In broad strokes the procedure is solid, but it could use improvements and further testing for high load / usage scenarios.