
Document how to disable x2 per DC
Closed, Resolved, Public

Description

A follow-up to T315271 would be to document what is needed to depool a whole DC in MW in case all the databases in a given DC go down (or the master fails).
I have assigned this to @tstarling but also pinged @Krinkle at T315271#8177397 so feel free to re-assign if needed.
If possible, could you please add that to https://wikitech.wikimedia.org/wiki/MariaDB#x2

Thanks

Related Objects

Status    Assigned
Resolved  aaron
Open      jijiki
Open      None
Resolved  Krinkle
Resolved  Krinkle
Resolved  aaron
Resolved  tstarling
Declined  aaron
Resolved  aaron
Resolved  Eevans
Resolved  aaron
Resolved  Krinkle
Resolved  Papaul
Resolved  Marostegui
Resolved  aaron
Resolved  Krinkle
Resolved  tstarling
Resolved  tstarling
Resolved  tstarling

Event Timeline

Krinkle triaged this task as Medium priority. Aug 29 2022, 7:23 PM
Krinkle moved this task from Inbox to Doing: Goal-oriented on the Performance-Team board.

Moving to current, as this is a follow-up for the Multi-DC goal.

From irc:

[09:45:42]  <volans> is there already some documentation on how depool MW from codfw? I was thinking to add a note in the dns's admin_state with a link as a reminder to consider if people needs to depool MW too when depooling one of the core DCs in the DNS
[09:46:34]  <volans> (this ofc applies mostly for the RO DC, is not meant to replace the switchdc workflow, whose cookbooks needs to be refactored to take into account the new structure)
[09:47:15]  <@marostegui> volans: I created https://phabricator.wikimedia.org/T315995 but we can probably expand it to be: how to disable the RO DC from MW

@Krinkle @tstarling can we document the above too? How to disable the RO DC entirely from MW in case of issues?

You're pinging the wrong people. Where would you like the docs to be, besides the documentation for our dnsdisc system that is already in place?

@Joe who should I ping about x2 then?

Change 830607 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/cookbooks@master] Add cookbook to easily route mediawiki traffic

https://gerrit.wikimedia.org/r/830607

Change 830607 merged by jenkins-bot:

[operations/cookbooks@master] Add cookbook to easily route mediawiki traffic

https://gerrit.wikimedia.org/r/830607

@Marostegui we now have a cookbook to depool the non-primary datacenters from traffic, using:

sudo cookbook sre.mediawiki.route-traffic primary

And restore active-active traffic using

sudo cookbook sre.mediawiki.route-traffic all

Keep in mind that all these records take ~5 minutes to expire, and by design we do not wipe resolver caches, so you will most likely need to depool well in advance of your change, or wait 5 minutes after depooling before proceeding.
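The depool/restore steps and the wait for record expiry can be wrapped in a small shell helper. This is a minimal sketch, not an existing tool: the `route_traffic` function and the `DRY_RUN` and `RECORD_TTL_SECONDS` variables are hypothetical, and the 300-second default simply mirrors the ~5 minute expiry mentioned above. Only the `sudo cookbook sre.mediawiki.route-traffic primary|all` invocations come from this task.

```shell
#!/usr/bin/env bash
# Minimal sketch of a HYPOTHETICAL wrapper around the route-traffic cookbook.
# route_traffic, DRY_RUN and RECORD_TTL_SECONDS are illustrative assumptions,
# not part of operations/cookbooks.
set -eu

# route_traffic TARGET
#   TARGET: "primary" depools the non-primary datacenters,
#           "all" restores active-active traffic.
#   DRY_RUN=1 prints the cookbook command instead of running it.
#   RECORD_TTL_SECONDS overrides the assumed ~5 minute expiry (default 300).
route_traffic() {
    local target="${1:-}"
    case "$target" in
        primary|all) ;;
        *)
            echo "usage: route_traffic primary|all" >&2
            return 2
            ;;
    esac

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo cookbook sre.mediawiki.route-traffic $target"
    else
        sudo cookbook sre.mediawiki.route-traffic "$target"
    fi

    # Resolver caches are not wiped, so after depooling wait for the
    # cached records to expire before starting the maintenance.
    if [ "$target" = "primary" ]; then
        local ttl="${RECORD_TTL_SECONDS:-300}"
        echo "waiting ${ttl}s for cached records to expire" >&2
        sleep "$ttl"
    fi
}
```

For example, `DRY_RUN=1 route_traffic primary` would print the cookbook command and then wait the assumed 300 seconds; without `DRY_RUN` it runs the real cookbook. The helper only waits after depooling, matching the note about waiting 5 minutes after depooling.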