
Prepare a disaster recovery plan for failing over from phab1001 to phab2001 (or phab2001 to phab1001)
Open, Normal, Public

Description

I split this off from T137928: Deploy phabricator to phab2001.codfw.wmnet to unblock that task because it doesn't actually depend on this one being completed.

See T164810: Switch phabricator production to codfw for a high-level outline of the steps. We should document the process more formally, and somewhere other than Phabricator, since it won't be available when Phabricator is offline.

Phabricator isn't considered the highest-priority system to bring back online in the event of a disaster. Nonetheless, it should be possible to recover it relatively quickly so that we can use Phabricator when coordinating the recovery of other systems.

Wikitech page for Disaster Recovery plan: https://wikitech.wikimedia.org/wiki/Phabricator/Disaster_Recovery

Event Timeline

mmodell created this task. Mar 23 2018, 8:22 PM
mmodell triaged this task as Normal priority.
mmodell updated the task description. Mar 23 2018, 8:31 PM
Dzahn added a comment. Mar 23 2018, 9:41 PM

Consider storing the information on the wikitech wiki. There is wikitech-static, a copy of wikitech that is kept completely outside normal WMF infrastructure for this very reason: to be available in the event of a disaster.

https://wikitech-static.wikimedia.org/wiki/Main_Page

jcrespo moved this task from Triage to Blocked external/Not db team on the DBA board.

Note: the steps for failing over between data centers are a bit different from those for failing over within a single data center.

From @Dzahn via IRC:

07:48:42	<mutante>	for eqiad/codfw parts of it are all prepared in hiera
07:48:47	<mutante>	and applied per dc
07:49:02	<mutante>	for eqiad we have IPs applied via hostname
07:49:08	<mutante>	for codfw by role
07:49:23	<mutante>	this inconsistency was actually nice for a switch to phab1002 in this case
07:49:35	<mutante>	i could just set other IPs for phab1002 also by host
mmodell updated the task description. Jun 8 2018, 12:57 PM
mmodell moved this task from Backlog to Soon on the User-MModell board. Sep 10 2018, 4:58 PM
Alroilim closed this task as Declined. Feb 2 2019, 7:17 PM
Alroilim removed mmodell as the assignee of this task.
Alroilim set Due Date to Feb 1 2019, 9:00 PM.
Alroilim updated the task description.
Alroilim removed subscribers: Ladsgroup, jcrespo, Aklapper and 5 others.
Restricted Application changed the subtype of this task from "Task" to "Deadline". Feb 2 2019, 7:17 PM
Gopavasanth reopened this task as Open. Feb 2 2019, 7:43 PM
Gopavasanth assigned this task to mmodell.
Gopavasanth added subscribers: Ladsgroup, jcrespo, Aklapper.
Restricted Application changed the subtype of this task from "Deadline" to "Task". Feb 23 2019, 6:15 AM