Document a general process on Wikitech that Clouddb admins can follow to fail over ToolsDB in an emergency when DBAs are not available to help.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Bstorm | T216208 ToolsDB overload and cleanup
Resolved | | Bstorm | T216753 Document ToolsDB failover process for Clouddb Admins
Event Timeline
Testing the failover is also worth considering. It would be very valuable, but it would require coordinating an outage with the four users who have non-replicated tables.
For now, the doc is in good shape. I got help from the DBAs, who also reminded us that we should be doing regular failover testing so that the process isn't quite so hard.
I'd love to, except that we haven't set up a viable solution for the four non-replicated tables. Once we contact them, their owners may be able to dump those tables in advance so that they can restore them after the failover. That may be the way forward, now that I have moved wikilabels to its own replicated pair.