
Figure out and document the datacenter switchover process
Closed, Resolved · Public

Description

We need to figure out the process for switching from eqiad to codfw and back in our current hot/cold setup. We previously did something similar for the pmtpa->eqiad switchover, but this is wildly out of date by now. We'll need a new checklist, and one we should keep up-to-date going forward.

Details

Related Gerrit Patches:
operations/puppet : production · parsoid::testing: use master_dc variables

Related Objects

Status      Assigned
Invalid     None
Resolved    jcrespo
Resolved    Krinkle
Resolved    None
Resolved    jcrespo
Resolved    jcrespo
Resolved    jcrespo
Resolved    Joe
Resolved    Cmjohnson
Resolved    Cmjohnson
Resolved    Joe
Resolved    Joe
Resolved    RobH
Resolved    elukey
Resolved    Joe
Resolved    Krinkle
Resolved    aaron
Resolved    Krinkle
Resolved    elukey
Resolved    elukey
Resolved    Joe
Resolved    Joe
Resolved    jcrespo

Event Timeline

faidon created this task. Jan 25 2016, 4:42 PM
faidon raised the priority of this task to Medium.
faidon updated the task description.
faidon added a subscriber: faidon.
Restricted Application added a subscriber: Aklapper. Jan 25 2016, 4:42 PM

Change 275814 had a related patch set uploaded (by Giuseppe Lavagetto):
parsoid::testing: use master_dc variables

https://gerrit.wikimedia.org/r/275814

Krinkle removed a subscriber: Krinkle. Mar 8 2016, 9:44 PM

Change 275814 merged by Giuseppe Lavagetto:
parsoid::testing: use master_dc variables

https://gerrit.wikimedia.org/r/275814

This is a duplicate, but I would merge T114398 into it, as this one has activity.

Krinkle added a subscriber: aaron. Apr 21 2016, 3:21 PM
In T114398, @aaron wrote:

See also T114271.
We need scripts and processes to do a planned switch from master datacenter A to B:
a) Go read-only on the app level (mostly MediaWiki)
b) Make sure write traffic stops
c) Go read-only for all data stores
d) Wait for all data stores in the B datacenter to catch up and be in sync with A
e) Make the B datacenter the new master datacenter (systems and app level)
f) End read-only mode
Read-only mode should be as short as possible so we can actually test this.
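
The steps a) to f) above could be sketched roughly as follows. This is only an in-memory simulation to show the ordering and where the read-only window starts and ends; `DataStore`, `Site`, `switch_master` and the replication positions are hypothetical names made up for illustration, not existing tooling.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """Hypothetical stand-in for a data store replicated across datacenters."""
    name: str
    read_only: bool = False
    # Replication position per datacenter, e.g. {"eqiad": 120, "codfw": 117}.
    positions: dict = field(default_factory=dict)

    def in_sync(self, source: str, target: str) -> bool:
        return self.positions[target] >= self.positions[source]

    def replicate_once(self, source: str, target: str) -> None:
        # Simulate one round of replication: the target moves one step closer.
        if self.positions[target] < self.positions[source]:
            self.positions[target] += 1


@dataclass
class Site:
    """Hypothetical view of the whole setup: app layer plus data stores."""
    master_dc: str
    stores: list
    app_read_only: bool = False
    pending_writes: int = 0


def switch_master(site: Site, target: str, max_rounds: int = 1000) -> list:
    """Run steps a) to f) in order and return a log of the steps performed."""
    source = site.master_dc
    log = []

    # a) Go read-only on the app level.
    site.app_read_only = True
    log.append("a: app read-only")

    # b) Make sure write traffic stops (simulated: in-flight writes finish).
    while site.pending_writes > 0:
        site.pending_writes -= 1
    log.append("b: writes stopped")

    # c) Go read-only for all data stores.
    for store in site.stores:
        store.read_only = True
    log.append("c: data stores read-only")

    # d) Wait for the target datacenter to catch up with the source.
    rounds = 0
    while not all(s.in_sync(source, target) for s in site.stores):
        if rounds >= max_rounds:
            raise RuntimeError("target did not catch up; not promoting it")
        for store in site.stores:
            store.replicate_once(source, target)
        rounds += 1
    log.append("d: target in sync")

    # e) Make the target the new master datacenter.
    site.master_dc = target
    log.append("e: master switched")

    # f) End read-only mode, data stores first, then the app.
    for store in site.stores:
        store.read_only = False
    site.app_read_only = False
    log.append("f: read-only ended")
    return log
```

In this sketch the read-only window runs from step a) to step f), so the wait in step d) is what mostly determines how long it lasts. The sketch refuses to promote the target if it never catches up, which leaves the site read-only for an operator to deal with; that is one possible design choice, not something decided in this task.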

"we should keep up-to-date going forward" is not really a finite, actionable task, so I would consider this Resolved. There are multiple things to fix, but in the process itself (e.g. better database orchestration), not only in the documentation.

jcrespo closed this task as Resolved. Apr 28 2016, 9:53 AM
jcrespo claimed this task.