switchdc services cookbook should allow pooling services in both DCs (active/active)
Closed, Resolved, Public

Description

During a switchover (eqiad -> codfw) we want to have services running in an active/passive setup, with codfw pooled, and eqiad depooled. But when switching back (codfw -> eqiad), we want to be active/active, with both DCs pooled.

The cookbook currently forces one DC to be pooled and the other depooled. This means that to go active/active, you have to manually pool the second DC with conftool afterwards, which shifts traffic a second time.

Ideally the cookbook would have a flag that lets us go to an active/active state.

snippets from IRC today:

08:52:41 <legoktm> the only two notes I have are: 1) have the cookbook allow keeping the (now) passive DC pooled, 2) fix the !log line splitting
08:53:44 <volans> for (1) seems that a simple option might do wit
08:53:48 <volans> *do it
08:55:41 <rzl> yeah, either something like `sre.switchdc.services --pool-both` to indicate what it does, or something like `sre.switchdc.services --switchback` to indicate why you'd want it

Personally I like --pool-both.
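
A rough sketch of how such a flag could behave, written with plain argparse rather than the cookbook's real code. The function names `build_parser` and `plan_pool_actions`, the positional arguments, and the returned dict shape are all hypothetical and only illustrate the idea: the target DC is always pooled, and the DC being switched away from stays pooled only when `--pool-both` is given.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical --pool-both flag (not the real cookbook CLI)."""
    parser = argparse.ArgumentParser(description="Sketch of a services switchover CLI")
    parser.add_argument("dc_from", help="datacenter being switched away from")
    parser.add_argument("dc_to", help="datacenter being switched to")
    parser.add_argument(
        "--pool-both",
        action="store_true",
        help="keep dc_from pooled as well, ending in an active/active state",
    )
    return parser


def plan_pool_actions(dc_from: str, dc_to: str, pool_both: bool) -> dict:
    """Return the desired pooled state per datacenter (True means pooled)."""
    # dc_to is always pooled; dc_from stays pooled only for active/active.
    return {dc_to: True, dc_from: pool_both}


# Switchback (codfw -> eqiad) ending in active/active.
args = build_parser().parse_args(["codfw", "eqiad", "--pool-both"])
print(plan_pool_actions(args.dc_from, args.dc_to, args.pool_both))
```

Without the flag, the same call would leave `dc_from` depooled, which matches the current active/passive behaviour.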

Event Timeline

Something like --restore is also a possibility, as sort of a middle ground.

When writing this we'll also have to decide whether to tackle T285711 at the same time, since it would need special care in the implementation.

The sre.discovery.datacenter cookbook allows depooling or repooling a full datacenter (excluding A/A MediaWiki services) at will with one command, as described in https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Switchback