
Switchover of the application servers to codfw
Closed, Resolved, Public

Description

For switching over the appservers, we need to do the following:

  1. Make it easy to switch the configuration of mediawiki to have a different primary datacenter (see T114273)
  2. Fix everything in mediawiki-config that is eqiad-only (there are many such settings, introduced after the two datacenters were brought on par).
  3. Perform a quick sanity check of the performance of a sample of pages in both DCs.
  4. Check the resources available in the various clusters: codfw was originally sized as a slightly scaled-down eqiad, but some ratios in eqiad have changed since then.
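Items 1 and 2 amount to routing every datacenter-specific setting through a single "primary datacenter" switch, and to finding settings that exist only for eqiad. The following is a minimal Python sketch of that idea. It assumes a simple per-DC service map. The service names, placeholder hostnames, and the `resolve` and `eqiad_only` helpers are hypothetical illustrations, not the real mediawiki-config layout, which is PHP and is discussed in T114273.

```python
# Hypothetical per-datacenter service map. Hostnames are placeholders,
# not real Wikimedia hosts.
SERVICES = {
    "eqiad": {"redis_sessions": "redis1.eqiad.example", "db_master": "db1.eqiad.example"},
    "codfw": {"redis_sessions": "redis1.codfw.example", "db_master": "db1.codfw.example"},
}


def resolve(service: str, primary_dc: str, services: dict = SERVICES) -> str:
    """Return the host backing a logical service name in the chosen primary DC.

    Switching the primary DC then means changing one value (primary_dc)
    instead of editing every setting by hand (item 1).
    """
    if primary_dc not in services:
        raise ValueError(f"unknown datacenter: {primary_dc}")
    return services[primary_dc][service]


def eqiad_only(services: dict = SERVICES) -> list:
    """List logical services defined for eqiad but missing for codfw (item 2)."""
    return sorted(set(services["eqiad"]) - set(services.get("codfw", {})))
```

For example, `resolve("db_master", "codfw")` yields the codfw host. Adding an eqiad-only entry to the map makes `eqiad_only()` report it, so an empty result is a cheap precondition check before a switchover.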

See also, for why this switchover is being done: https://meta.wikimedia.org/wiki/Tech/Server_switch_2016

Event Timeline

Joe raised the priority of this task to Medium.
Joe updated the task description.
Joe added subscribers: faidon, Aklapper, Joe.
Joe set Security to None.

Change 266481 had a related patch set uploaded (by Giuseppe Lavagetto):
Use the logical redis definition for GettingStarted.

https://gerrit.wikimedia.org/r/266481

Change 266481 merged by jenkins-bot:
Use the logical redis definition for GettingStarted.

https://gerrit.wikimedia.org/r/266481

(Possibly silly question alert: in case things went wrong, would this be reflected on http://status.wikimedia.org/ , or would that site be down as well? )

status.wikimedia.org is hosted outside of our servers and as such will not go down while this takes place.

Neither, really. That page will stay up, but as a monitor it is quite naive and won't catch many of the things that could go wrong during this switchover. For instance, if we stayed read-only for a prolonged period, that would be a failure on our part, but it would not be reflected on that status page.
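The point above is that "the site answers HTTP" and "the site accepts edits" are different checks. The following is a minimal Python sketch of a monitor that distinguishes them. It assumes the probe has already fetched the `general` block of a MediaWiki `action=query&meta=siteinfo` response. It also assumes that a `readonly` field is set when the wiki is read-only, either as a boolean (formatversion=2) or as an empty-string flag (formatversion=1). The `classify_site` helper is hypothetical and is not part of status.wikimedia.org.

```python
from typing import Optional


def classify_site(http_ok: bool, general: Optional[dict]) -> str:
    """Classify wiki health from an HTTP probe plus parsed siteinfo 'general' data.

    Assumption: 'readonly' is True (formatversion=2) or "" (formatversion=1)
    when the wiki is read-only, and False or absent otherwise.
    """
    if not http_ok or general is None:
        return "down"
    # Use 'is not False' rather than truthiness, because the
    # formatversion=1 flag value "" is falsy but still means read-only.
    if general.get("readonly", False) is not False:
        return "up, read-only"
    return "up, writable"
```

A naive uptime check would report both `"up, read-only"` and `"up, writable"` as healthy. Only the second means the switchover has fully completed.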

Items #3 and #4 (the latter only partially: for simple MediaWiki user HTTP requests and for the databases) are being done at a careful pace, and more extensively, in T124697. That task does not cover testing things like the job queue, special page updates, search, Parsoid, etc.
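A quick sanity check for item #3 can be as simple as timing the same sample of pages against each DC and flagging pages whose median latency in codfw is noticeably worse than in eqiad. The following is a minimal Python sketch. The input shape (page → DC → list of latencies), the `slow_in_codfw` helper, and the 1.2× default tolerance are hypothetical choices for illustration, not what T124697 actually uses.

```python
from statistics import median


def slow_in_codfw(timings: dict, tolerance: float = 1.2) -> list:
    """Return pages whose median codfw latency exceeds eqiad's by more than `tolerance` times.

    `timings` maps page title -> {"eqiad": [ms, ...], "codfw": [ms, ...]}.
    Medians are used so that a single outlier request does not flag a page.
    """
    slow = []
    for page, by_dc in timings.items():
        eqiad_ms = median(by_dc["eqiad"])
        codfw_ms = median(by_dc["codfw"])
        if codfw_ms > eqiad_ms * tolerance:
            slow.append(page)
    return sorted(slow)
```

For example, with `{"Main_Page": {"eqiad": [100, 110, 120], "codfw": [105, 115, 125]}, "Special:Random": {"eqiad": [200, 210, 220], "codfw": [300, 310, 320]}}`, only `Special:Random` is flagged (a median of 310 ms against 210 ms).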

Krinkle claimed this task.
Krinkle moved this task from In Progress to Done on the codfw-rollout-Jan-Mar-2016 board.