
Switchover of the application servers to codfw
Closed, Resolved, Public

Description

For switching over the appservers, we need to do the following:

  1. Make it easy to switch the configuration of mediawiki to have a different primary datacenter (see T114273)
  2. Correct all the things that are eqiad-only in mediawiki-config (there are many, introduced after we got it on par...)
  3. Perform a quick sanity check of performance of a sample of pages in the two DCs
  4. Check the amount of resources available in the various clusters - codfw was sized as a slightly reduced eqiad at the time, but we did change some ratios in eqiad in the meantime.
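Item 3 is described only as a quick sanity check. Below is a minimal sketch of one way to do it. It assumes you already have a few response-time samples (in milliseconds) per page from each datacenter. The function name, the data layout and the 1.2 slowdown threshold are all hypothetical and are not part of the task:

```python
from statistics import median


def flag_slow_pages(timings, baseline_dc="eqiad", candidate_dc="codfw", max_ratio=1.2):
    """Hypothetical helper: compare median response times per page between two DCs.

    timings maps page name -> {datacenter: [positive response times in ms]}.
    Returns {page: candidate/baseline ratio} for every page whose median
    time in candidate_dc exceeds max_ratio times its median in baseline_dc.
    """
    slow = {}
    for page, by_dc in timings.items():
        base = median(by_dc[baseline_dc])
        cand = median(by_dc[candidate_dc])
        if cand > max_ratio * base:
            slow[page] = round(cand / base, 2)
    return slow


if __name__ == "__main__":
    # Made-up numbers, only to show the expected input shape.
    sample = {
        "Page_A": {"eqiad": [100, 110, 120], "codfw": [105, 115, 125]},
        "Page_B": {"eqiad": [200, 210, 220], "codfw": [300, 310, 320]},
    }
    print(flag_slow_pages(sample))
```

Using medians rather than means keeps a single slow outlier request from flagging a page. Anything the sketch flags is a candidate for closer investigation, not a verdict.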

See also, on why this is being done: https://meta.wikimedia.org/wiki/Tech/Server_switch_2016

Event Timeline

Joe created this task. Jan 25 2016, 4:52 PM
Joe updated the task description.
Joe raised the priority of this task to Normal.
Joe added subscribers: faidon, Aklapper, Joe.
Joe updated the task description. Jan 25 2016, 5:01 PM
Joe set Security to None.
Joe updated the task description. Jan 26 2016, 10:32 AM

Change 266481 had a related patch set uploaded (by Giuseppe Lavagetto):
Use the logical redis definition for GettingStarted.

https://gerrit.wikimedia.org/r/266481

Change 266481 merged by jenkins-bot:
Use the logical redis definition for GettingStarted.

https://gerrit.wikimedia.org/r/266481
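The patch above moves GettingStarted to a logical redis definition. Item 1 of the description relies on the same pattern. The Python sketch below is a rough illustration only: mediawiki-config itself is PHP, and none of the names, hosts or the `SERVICES` layout come from it. It shows code asking for a logical service name while a single primary-datacenter setting picks the physical host. It also includes a check for eqiad-only leftovers of the kind item 2 mentions:

```python
# Hypothetical sketch: callers ask for a logical service name and the
# primary datacenter setting decides which physical host they get.
PRIMARY_DC = "eqiad"  # assumed single switch; "codfw" would retarget all default lookups

# Illustrative service table; host names are placeholders, not real servers.
SERVICES = {
    "redis_getting_started": {
        "eqiad": "redis-gs.eqiad.example:6379",
        "codfw": "redis-gs.codfw.example:6379",
    },
    # An eqiad-only leftover, the kind of entry item 2 says must be fixed.
    "legacy_service": {
        "eqiad": "legacy.eqiad.example:8080",
    },
}


def resolve(service, dc=None):
    """Return the host for a logical service in dc (default: PRIMARY_DC)."""
    dc = dc or PRIMARY_DC
    try:
        return SERVICES[service][dc]
    except KeyError:
        # Fail loudly instead of silently falling back to another datacenter.
        raise KeyError(f"{service!r} has no definition for datacenter {dc!r}") from None


def missing_in(dc):
    """List logical services with no definition for dc, sorted by name."""
    return sorted(name for name, by_dc in SERVICES.items() if dc not in by_dc)


if __name__ == "__main__":
    print(resolve("redis_getting_started"))           # host in the primary DC
    print(resolve("redis_getting_started", "codfw"))  # host after a switchover
    print(missing_in("codfw"))                        # services that would block a switch
```

Raising an error for a missing datacenter entry, rather than falling back to eqiad, is a deliberate choice in this sketch. It makes eqiad-only configuration visible before a switchover instead of during one.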

JJMC89 added a subscriber: JJMC89. Mar 1 2016, 7:28 PM
Elitre added a subscriber: Elitre. Mar 2 2016, 4:29 PM

(Possibly silly question alert: in case things went wrong, would this be reflected on http://status.wikimedia.org/ , or would that site be down as well? )

tomasz added a subscriber: tomasz. Mar 2 2016, 7:31 PM

status.wikimedia.org is hosted outside of our servers and as such will not go down while this takes place.

Neither, really. That page will stay up, but as a monitor it's quite naive and won't catch many of the things that could go wrong with this switchover. For instance, if we stayed read-only for a prolonged period, that would be a failure on our part, but it would not be reflected on that status page.

Trizek added a subscriber: Trizek. Mar 8 2016, 3:31 PM
Trizek removed a subscriber: Trizek.
Akeron added a subscriber: Akeron. Mar 10 2016, 6:34 PM
IKhitron removed a subscriber: IKhitron. Mar 10 2016, 8:58 PM

#3 (and, partially, #4, for MediaWiki "simple" user HTTP requests and databases) is being done here at "careful speed", but more extensively in T124697. That ticket does not cover testing things like the job queue, special page updates, search, Parsoid, etc.

tomasz removed a subscriber: tomasz. Mar 18 2016, 5:27 PM
Krinkle closed this task as Resolved.
Krinkle claimed this task.