
Allow labstores to hot or warm swap in case of failure
Closed, DeclinedPublic


Currently, thanks to the saner behaviour of recent kernels, it is possible for both labstore1001 and labstore1002 to be up at the same time while attached to the storage shelves. The raid arrays are constructed on both servers, but remain readonly until an actual write to the block device is performed.
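As an aside, this "read-only until first write" state can be observed on a live host through the md driver's sysfs interface, where `array_state` reads `readonly` or `read-auto` until a write promotes the array. The helper below is purely illustrative (the device paths in the comment are generic, not the labstore layout):

```shell
# Illustrative helper: classify an md array_state value as still
# read-only-safe or not. On a real host the value would come from
# /sys/block/mdX/md/array_state; "readonly" and "read-auto" both mean
# the kernel has not yet permitted a write to the array.
is_array_readonly() {
    case "$1" in
        readonly|read-auto) return 0 ;;
        *) return 1 ;;
    esac
}

# On an actual server one might loop over all arrays, e.g.:
#   for s in /sys/block/md*/md/array_state; do
#       is_array_readonly "$(cat "$s")" || echo "WARNING: $s is writable"
#   done
```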

This allows the server to be live even if it is not actively serving NFS so long as the filesystems atop the logical volumes constructed from the raid arrays are not mounted read-write.

The NFS server start script (which is not invoked automatically) does exactly this: mount the filesystems, populate the exports from the current LDAP config by starting the manage-nfs-volumes-daemon, assign the floating NFS server IP to the interface, and start the NFS server proper. The entire process takes 5-10 seconds at most and is (essentially*) invisible to the clients because the FSID remains identical and the backing filesystem is the same.
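The sequence could be sketched roughly as follows. Note this is not the actual start-nfs script: the volume group, mount point, floating IP, and interface names are placeholder values, and it defaults to dry-run so the ordering can be inspected harmlessly.

```shell
# Sketch of the start-nfs sequence described above. All names
# (volume group, mount point, floating IP, interface) are placeholders,
# not the production configuration. DRY_RUN defaults to on, so the
# commands are printed rather than executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

start_nfs() {
    # 1. Mount the filesystems on the logical volumes (the first write
    #    promotes the md arrays from read-only to read-write).
    run mount /dev/examplevg/project /srv/project

    # 2. Populate the NFS exports from the current LDAP config.
    run service manage-nfs-volumes-daemon start

    # 3. Bring up the floating NFS service IP on this host.
    run ip addr add 192.0.2.10/24 dev eth0

    # 4. Start the NFS server itself; the FSID and backing filesystem
    #    are unchanged, so clients reconnect transparently.
    run service nfs-kernel-server start
}
```

Running `start_nfs` with `DRY_RUN=1` simply prints the four steps in order, which is all the sketch is meant to convey.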

This obviously can only be done safely if the other server does not have any of the filesystems mounted. In practice, it is only absolutely safe to run start-nfs on one of the servers if the other is (a) powered off, or (b) freshly rebooted. It can be safe to do so if the filesystems have been manually unmounted and flushed, but because of the risk of a stray mounted filesystem being corrupted this should normally not be done.

The servers come up ready to start the NFS service; if one dies, recovering is as simple as running the start-nfs script on the other - provided the failed server is guaranteed to not return. Any automated system needs the certitude that the other server will not try to write to the filesystems again. Possible ways to ensure this may be to (a) check that the server is actually powered down or halted, (b) have a server report on boot that it is verifiably idle, and withdraw that acknowledgement before it starts trying to serve NFS, (c) ... others? Needs investigation.
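Option (a) might look roughly like the guard below. The management hostname and the suggested `ipmitool` invocation in the comments are assumptions for illustration; the testable part is just the classifier, which deliberately treats anything other than a verified "off" (including an unreachable peer) as unsafe:

```shell
# Illustrative guard for option (a): refuse to start NFS unless the
# peer's chassis power state is verifiably off. This helper only
# classifies a power-status string; on a real host the string would
# come from something like:
#   ipmitool -H labstore1002.mgmt -U <user> chassis power status
# (the hostname and credentials above are placeholders).
peer_is_safely_down() {
    case "$1" in
        *"Power is off"*) return 0 ;;  # peer verifiably halted: safe
        *) return 1 ;;                 # on, unknown, or unreachable: unsafe
    esac
}

# Example use before invoking start-nfs:
#   status=$(ipmitool -H labstore1002.mgmt chassis power status)
#   peer_is_safely_down "$status" || { echo "peer not down; aborting"; exit 1; }
```

Failing closed on an unreachable peer is the important design choice here: a dead management interface must not be mistaken for a dead server.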

Alternately, we can satisfy ourselves with a warm standby model where turning on service on the alternate server requires manual intervention, and ensure that the process includes halting or powering down the currently active server first.

* Only "essentially" because the loss of what little state NFS has means that most file operations will stall for up to a minute while the clients reconnect and get their bearings back.

Event Timeline

coren created this task.Mar 23 2015, 3:43 PM
coren raised the priority of this task from to High.
coren updated the task description.
coren added a project: Cloud-Services.
coren added subscribers: coren, yuvipanda.
Restricted Application added a subscriber: Aklapper. Mar 23 2015, 3:43 PM
coren added a comment.Mar 23 2015, 4:12 PM

Another caveat to note is that this does not provide standby against a shelf failing: that would still require the disks to be physically moved to the backup shelf, and the backup shelf to be wired in to replace the failed one.

@mark @coren are we still going to do this? Afaik attempts at this might have been responsible for parts of the big NFS outage, and we decided to not have both machines connected at the same time? Is that correct?

chasemp assigned this task to coren.Nov 30 2015, 6:45 PM
chasemp added a subscriber: chasemp.

seems to come with the other tasks discussed re: labstore in the meeting today so I'm tossing your way @coren

coren closed this task as Declined.Nov 30 2015, 6:48 PM

> @mark @coren are we still going to do this? Afaik attempts at this might have been responsible for parts of the big NFS outage, and we decided to not have both machines connected at the same time? Is that correct?

No, we are not - at least not for the foreseeable future because, regardless of whether that was the ultimate cause of the NFS outage or not, it is not possible to make hard guarantees that the disks are strictly reserved to only one of the servers at a time. The role of the secondary server will remain that of a cold swap.