
Improve the failover mechanism for maps on cloudstore1008/9
Open, Stalled, Medium, Public

Description

On reviewing the setup obsessively, I realized that a locking issue will prevent smooth failover for maps in the event of an NFS server failover. As mentioned in T203469, NFS failover does not work well without shared cluster filesystems, so this isn't the only place where that is true. For now, I will document the failover procedure as recommending that the maps servers be rebooted afterward to clean up.

Other approaches are possible: for example, an active/active DRBD volume, with another data migration, could be set up using the spare disk on the server.

Event Timeline

Bstorm triaged this task as Medium priority. May 31 2019, 6:46 PM
Bstorm created this task.
Bstorm added a comment. Jun 5 2019, 6:38 PM

Active/active DRBD requires detached storage. Overall, failover with NFS attached is never going to look very good. A clever solution may be possible once Ceph is deployed.

Bstorm changed the task status from Open to Stalled. Aug 6 2019, 1:57 PM

Waiting on this in the hope that one of the two volumes could be replaced with something else, such as CephFS.

Bstorm removed Bstorm as the assignee of this task. Sep 25 2019, 3:54 PM

Removing myself because cookie-licking is bad when I'm not working on it.