Spin up virtualized NFS server strictly for Grid Engine database and management
Closed, Declined · Public

Description

In the current arrangement, the Grid Engine database lives on the same NFS share as the project and tools data. It is possible to remove the dependence on NFS, but without a shared storage platform, shadow master functionality becomes worse or impossible, even with an external DB server.

However, this means that a malfunctioning tool can corrupt the gridengine database, or cause the entire grid to collapse and stay down for some time even after the NFS problem has been resolved.

The only thing that really needs to be preserved separately is the spooling database and related files in the /var/spool/gridengine directory. Most of Toolforge will collapse if NFS is in bad enough shape anyway, but the database needs to be kept in order.

Since this would encompass only files in the .system_sge directory, or even a subdirectory of it, the server really could be a VM in the tools project.
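As a rough way to verify the separation once such a VM exists, one could check whether the spool directory and the tool data sit on different mounted filesystems. The sketch below is only an illustration: `same_filesystem` is a hypothetical helper, and `/data/project` is an assumed stand-in for the shared tools mount, not a path taken from this task. Comparing `st_dev` tells you whether two paths share a mounted filesystem; it does not by itself prove that they are served by different NFS servers.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths are on the same mounted filesystem.

    Compares the device IDs reported by stat(); this is a heuristic,
    not proof that two NFS mounts come from different servers.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    spool_dir = "/var/spool/gridengine"  # spool location named in the task
    tools_dir = "/data/project"          # ASSUMPTION: placeholder for the tools NFS mount
    try:
        if same_filesystem(spool_dir, tools_dir):
            print("spool shares a filesystem with tool data")
        else:
            print("spool is on a separate filesystem")
    except FileNotFoundError as exc:
        # One of the paths does not exist on this host.
        print(f"cannot check: {exc}")
```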

NOTE: It is also possible to spin up a BerkeleyDB Spooling Server, but the packaging scheme in our OS doesn't make it terribly easy. Besides, that may be an even more unstable method of preserving the database than NFS.

Event Timeline

Bstorm created this task.
bd808 lowered the priority of this task from High to Low. Aug 8 2019, 4:03 AM
Bstorm lowered the priority of this task from Low to Lowest. Jun 11 2020, 10:46 PM

This may not even be the approach we take in the end. The grid may end up outside of tools first. Beyond that, we are likely to rebuild NFS servers to work differently first as well. As we are stabilizing our NFS design, this whole issue is much less scary.

We may think about this when the next iteration of NFS rebuild happens.