- Due Date
- Fri, Feb 1, 9:00 PM
|Open||None||T182832 Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state|
|Open||None||T190568 Reimage both phab1001 and phab2001 to stretch|
|Resolved||Joe||T154658 Prepare and improve the datacenter switchover procedure|
|Open||None||T156937 Provide cross-dc redundancy (active-active or active-passive) to all important misc services|
|Open||None||T164810 Switch phabricator production to codfw|
|Resolved||mmodell||T152129 reinstall iridium (phabricator) as phab1001 with jessie|
|Stalled||mmodell||T137928 Deploy phabricator to phab2001.codfw.wmnet|
|Resolved||mmodell||T168699 Verify that the codfw lvs is configured correctly for Phabricator|
|Open||mmodell||T190572 Prepare a disaster recovery plan for failing over from phab1001 to phab2001 (or phab2001 to 1001)|
Consider storing the information on the Wikitech wiki. A copy of it, wikitech-static, is kept completely outside normal WMF infrastructure for this very reason, so that it remains available in the event of a disaster.
Note: the steps differ somewhat between failing over across data centers and failing over within a single data center.
From @Dzahn via IRC:
07:48:42 <mutante> for eqiad/codfw parts of it are all prepared in hiera
07:48:47 <mutante> and applied per dc
07:49:02 <mutante> for eqiad we have IPs applied via hostname
07:49:08 <mutante> for codfw by role
07:49:23 <mutante> this inconsistency was actually nice for a switch to phab1002 in this case
07:49:35 <mutante> i could just set other IPs for phab1002 also by host
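
To illustrate the host-versus-role distinction from the IRC log above, here is a minimal Python sketch of a layered lookup. It is not the actual hiera configuration. The key name `vhost_ip`, the role name `phabricator`, and all addresses (taken from documentation-only ranges) are hypothetical. It also assumes that host-level data takes precedence over role-level data.

```python
# Hypothetical data: the key name, role name, and addresses are illustrative
# only and do not reflect the real hiera contents.
HOST_DATA = {
    "phab1001": {"vhost_ip": "192.0.2.10"},
    "phab1002": {"vhost_ip": "192.0.2.20"},  # set "by host" for the new machine
}
ROLE_DATA = {
    "phabricator": {"vhost_ip": "198.51.100.10"},  # applied "by role"
}


def lookup(key, host, role):
    """Return the first value found, checking host-level data before role-level data."""
    for layer in (HOST_DATA.get(host, {}), ROLE_DATA.get(role, {})):
        if key in layer:
            return layer[key]
    raise KeyError(key)


if __name__ == "__main__":
    # A host with its own entry gets the host-level value.
    print(lookup("vhost_ip", "phab1002", "phabricator"))
    # A host without an entry falls back to the role-level value.
    print(lookup("vhost_ip", "phab2001", "phabricator"))
```

Under these assumptions, giving a replacement host its own entry changes that host's IPs without touching the role-level data, which appears to be what made the switch to phab1002 straightforward.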