(This was put together by @jcrespo and myself after a chat with @faidon) - feel free to edit if you see something wrong
Databases present:
** es1016 - static (ro) external storage, can be depooled at any time, and it *should* depool automatically when unavailable
*** Plan: depool indefinitely
** db1060 - s2 api "SPOF" because db1054 is on the same rack. Technically, API traffic goes to the other main servers automatically, but the performance may not be ideal
*** Plan: pool an s2 api server somewhere else (maybe moving just the chassis somewhere else?)
** db1059 - s4 api - should be covered by db1068, can be depooled at any time, and it *should* depool automatically when unavailable
*** Plan: depool indefinitely
** db1057 - s1 MASTER SPOF. MediaWiki goes read-only when unavailable.
*** Plan: switch over to another server
** db1056 - s4 rc - should be covered by db1063, can be depooled at any time, and it *should* depool automatically when unavailable
*** Plan: depool indefinitely
** db1055 - s1 rc "SPOF" because db1051 is on the same rack. Technically, RC queries go to the other main servers automatically, but the performance may not be ideal
*** Plan: pool an s1 rc server somewhere else (requires partitioning - copy from codfw? move the server physically?)
** db1054 - s2 api "SPOF" because db1060 is on the same rack. Technically, API traffic goes to the other main servers automatically, but the performance may not be ideal
*** Plan: pool an s2 api server somewhere else
** db1052 - s1 old master (db1095's master) - it should only affect new labsdb servers, not a priority
*** Plan: do nothing
** db1051 - s1 rc "SPOF" because db1055 is on the same rack. Technically, RC queries go to the other main servers automatically, but the performance may not be ideal
*** Plan: pool an s1 rc server somewhere else (requires partitioning - copy from codfw? move the chassis somewhere else?)
** db1088 - s6 - should be covered by db1085 and db1093, but due to its large weight, losing it abruptly could impact rebalancing negatively - it should be depooled or have its weight lowered in advance to avoid large flapping
*** Plan: depool or lower weight
** db1087 - s5 - should be covered by db1082 and db1092, but due to its large weight, losing it abruptly could impact rebalancing negatively - it should be depooled or have its weight lowered in advance to avoid large flapping
*** Plan: depool or lower weight
** labstore1004 (only regarding db stuff): it handles labsdb accounting - according to Yuvi, when it fails, new account creation is paused, but it should return to normal without account loss once back up
*** Plan: do nothing
** es1015 - es2 slave. Can be depooled at any time, and it *should* depool automatically when unavailable
*** Plan: depool indefinitely
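To illustrate the weight concern for db1087/db1088, a minimal sketch - the weights below are made up for illustration only; the real values live in db-eqiad.php in mediawiki-config:

```python
# Hypothetical weights for illustration; the authoritative values are
# in mediawiki-config (db-eqiad.php) and will differ.
def load_shares(weights):
    """Fraction of traffic each pooled host receives (weight / total weight)."""
    total = sum(weights.values())
    return {host: w / total for host, w in weights.items()}

s6 = {"db1088": 400, "db1085": 100, "db1093": 100}

before = load_shares(s6)  # db1088 takes ~2/3 of s6 traffic
# Depooling db1088 outright dumps its whole share on the others at once:
after_depool = load_shares({h: w for h, w in s6.items() if h != "db1088"})
# Lowering its weight first shifts traffic gradually instead:
after_lower = load_shares({**s6, "db1088": 100})
```

With these made-up numbers, db1085 would jump from ~17% to 50% of s6 traffic on an abrupt depool - that sudden shift is the flapping risk the plan tries to avoid by lowering the weight first.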
Impact if the switch goes down:
* s1 would run out of rc hosts, which means recentchanges stops working for enwiki
** Quickest way of solving it: move one of the servers to another rack?
* s2 would run out of api hosts
** Quickest way of solving it: move one of the servers to another rack?
* s1 replication thread for the new labsdb servers will be out of date until the network is back (not a priority)
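The per-rack SPOF reasoning above can be mechanised. A minimal sketch - the (role, rack) placements below are illustrative, loosely reconstructed from the notes; check the actual racking data before acting on it:

```python
from collections import defaultdict

# Illustrative placements only, loosely based on the notes above;
# the racking database is authoritative.
hosts = [
    ("db1051", "s1-rc",  "D1"),
    ("db1055", "s1-rc",  "D1"),
    ("db1054", "s2-api", "D1"),
    ("db1060", "s2-api", "D1"),
    ("db1056", "s4-rc",  "D1"),
    ("db1063", "s4-rc",  "C2"),
]

def single_rack_roles(hosts):
    """Roles whose pooled hosts all sit in one rack, i.e. that rack's switch is a SPOF."""
    racks = defaultdict(set)
    for _host, role, rack in hosts:
        racks[role].add(rack)
    return sorted(role for role, rackset in racks.items() if len(rackset) == 1)

print(single_rack_roles(hosts))  # ['s1-rc', 's2-api']
```

s4 rc survives here because db1056 and db1063 are in different racks; s1 rc and s2 api do not, which matches the two "quickest way" items above.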
Roadmap:
* s1 - move db1051 to another rack (high priority) -> maybe to B3 - T156004
* s1 - move db1052 to another rack (high priority) -> maybe to B3 - T156006
* s1 - reimage db1065/db1066 to 10.0.28 - T156005
* s1 - switchover master: db1057 -> db1052 - T156008
* s1 - move db1073 to another rack (they are all on D1) - T156126
* s2 - move db1054 to another rack -> maybe to C3
We will create subtasks soon.
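For the db1057 -> db1052 switchover, the usual sequence is: set MediaWiki read-only, wait for the candidate to execute everything the old master wrote, then repoint replicas and pool the new master. A rough sketch of the catch-up wait only - `get_master_pos` and `get_replica_pos` are hypothetical callables (e.g. thin wrappers around SHOW MASTER STATUS / SHOW SLAVE STATUS), not an existing API:

```python
import time

def wait_for_catch_up(get_master_pos, get_replica_pos, timeout=60.0):
    """Poll until the replica has executed up to the master's current binlog
    position, captured once after writes stop (MediaWiki read-only).
    Positions are (binlog_file, offset) tuples; Python's tuple ordering
    compares them. Both callables are hypothetical helpers."""
    target = get_master_pos()  # snapshot while no new writes arrive
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_replica_pos() >= target:
            return True
        time.sleep(0.5)
    return False
```

If this returns False, the switchover should be aborted and read-only lifted rather than repointing to a lagging candidate.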