(This was put together by @jcrespo and myself after a chat with @faidon) - feel free to edit if you see something wrong
Databases present:
** es1016 - static (ro) external storage, can be depooled at any time, and it *should* depool automatically when unavailable
*** Plan: depool indefinitely
** db1060 - s2 api "SPOF" because db1054 is on the same rack. Technically, API traffic goes to the other main servers automatically, but the performance may not be ideal
*** Plan: pool an s2 api server somewhere else (maybe moving just the chassis somewhere else?)
** db1059 - s4 api - should be covered by db1068, can be depooled at any time, and it *should* depool automatically when unavailable
*** Plan: depool indefinitely
** db1057 - s1 MASTER SPOF. MediaWiki goes read-only when unavailable.
*** Plan: switch over to another server
** db1056 - s4 rc - should be covered by db1063, can be depooled at any time, and it *should* depool automatically when unavailable
*** Plan: depool indefinitely
** db1055 - s1 rc "SPOF" because db1051 is on the same rack. Technically, RC queries go to the other main servers automatically, but the performance may not be ideal
*** Plan: pool an s1 rc server somewhere else (requires partitioning - copy from codfw? move the server physically?)
** db1054 - s2 api "SPOF" because db1060 is on the same rack. Technically, API traffic goes to the other main servers automatically, but the performance may not be ideal
*** Plan: pool an s2 api server somewhere else
** db1052 - s1 old master (db1095's master) - it should only affect new labsdb servers, not a priority
*** Plan: do nothing
** db1051 - s1 rc "SPOF" because db1055 is on the same rack. Technically, RC queries go to the other main servers automatically, but the performance may not be ideal
*** Plan: pool an s1 rc server somewhere else (requires partitioning - copy from codfw? move the chassis somewhere else?)
** db1088 - s6 - should be covered by db1085 and db1093, but due to its large weight, losing it abruptly could impact rebalancing negatively - it should be depooled or have its weight lowered in advance to avoid large flapping
*** Plan: depool or lower weight
** db1087 - s5 - should be covered by db1082 and db1092, but due to its large weight, losing it abruptly could impact rebalancing negatively - it should be depooled or have its weight lowered in advance to avoid large flapping
*** Plan: depool or lower weight
** labstore1004 (only regarding db stuff): it handles labsdb accounting - according to Yuvi, when it fails, new account creation is paused, but it should return to normal without account loss once back up
*** Plan: do nothing
** es1015 - es2 slave. Can be depooled at any time, and it *should* depool automatically when unavailable
*** Plan: depool indefinitely
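To illustrate the weight concern for db1087/db1088, a minimal sketch - the weights below are made up for illustration only; the real values live in db-eqiad.php in mediawiki-config:

```python
# Hypothetical weights for illustration; the authoritative values are
# in mediawiki-config (db-eqiad.php) and will differ.
def load_shares(weights):
    """Fraction of traffic each pooled host receives (weight / total weight)."""
    total = sum(weights.values())
    return {host: w / total for host, w in weights.items()}

s6 = {"db1088": 400, "db1085": 100, "db1093": 100}

before = load_shares(s6)  # db1088 takes ~2/3 of s6 traffic
# Depooling db1088 outright dumps its whole share on the others at once:
after_depool = load_shares({h: w for h, w in s6.items() if h != "db1088"})
# Lowering its weight first shifts traffic gradually instead:
after_lower = load_shares({**s6, "db1088": 100})
```

With these made-up numbers, db1085 would jump from ~17% to 50% of s6 traffic on an abrupt depool - that sudden shift is the flapping risk the plan tries to avoid by lowering the weight first.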
Impact if the switch goes down:
* s1 would run out of rc hosts, which means recentchanges stops working for enwiki
** Quickest way of solving it: move one of the servers to another rack?
* s2 would run out of api hosts
** Quickest way of solving it: move one of the servers to another rack?
* s1 replication thread for the new labsdb servers will be out of date until the network is back (not a priority)
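The per-rack SPOF reasoning above can be mechanised. A minimal sketch - the (role, rack) placements below are illustrative, loosely reconstructed from the notes; check the actual racking data before acting on it:

```python
from collections import defaultdict

# Illustrative placements only, loosely based on the notes above;
# the racking database is authoritative.
hosts = [
    ("db1051", "s1-rc",  "D1"),
    ("db1055", "s1-rc",  "D1"),
    ("db1054", "s2-api", "D1"),
    ("db1060", "s2-api", "D1"),
    ("db1056", "s4-rc",  "D1"),
    ("db1063", "s4-rc",  "C2"),
]

def single_rack_roles(hosts):
    """Roles whose pooled hosts all sit in one rack, i.e. that rack's switch is a SPOF."""
    racks = defaultdict(set)
    for _host, role, rack in hosts:
        racks[role].add(rack)
    return sorted(role for role, rackset in racks.items() if len(rackset) == 1)

print(single_rack_roles(hosts))  # ['s1-rc', 's2-api']
```

s4 rc survives here because db1056 and db1063 are in different racks; s1 rc and s2 api do not, which matches the two "quickest way" items above.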
Roadmap:
* s1 - move db1051 to another rack (high priority) -> maybe to B3 - T156004
* s1 - move db1052 to another rack (high priority) -> maybe to B3 - T156006
* s1 - reimage db1065/db1066 to 10.0.28 - T156005
* s1 - switchover master: db1057 -> db1052 - T156008
* s1 - move db1073 to another rack (they are all on D1) - T156126
* s2 - move db1054 to another rack -> maybe to C3
We will create subtasks soon.
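For the db1057 -> db1052 switchover, the usual sequence is: set MediaWiki read-only, wait for the candidate to execute everything the old master wrote, then repoint replicas and pool the new master. A rough sketch of the catch-up wait only - `get_master_pos` and `get_replica_pos` are hypothetical callables (e.g. thin wrappers around SHOW MASTER STATUS / SHOW SLAVE STATUS), not an existing API:

```python
import time

def wait_for_catch_up(get_master_pos, get_replica_pos, timeout=60.0):
    """Poll until the replica has executed up to the master's current binlog
    position, captured once after writes stop (MediaWiki read-only).
    Positions are (binlog_file, offset) tuples; Python's tuple ordering
    compares them. Both callables are hypothetical helpers."""
    target = get_master_pos()  # snapshot while no new writes arrive
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_replica_pos() >= target:
            return True
        time.sleep(0.5)
    return False
```

If this returns False, the switchover should be aborted and read-only lifted rather than repointing to a lagging candidate.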