
Perform a datacenter switchover (2018-19 Q1)
Closed, Resolved, Public

Description

This is the tracking task for one of the goals in Wikimedia Technology/Goals/2018-19 Q1.

Perform a datacenter switchover:

  • Successfully switch backend traffic (MediaWiki, Swift, RESTBase, and Parsoid) to be served from codfw with no downtime and reduced read-only time.
  • Serve the site from codfw for at least 3 weeks. Tentative dates: switch over to codfw in the second half of September and switch back to eqiad after 3 to 4 weeks.
  • Refactor the switchdc script into a more reusable automation library and update it to meet the new switchover requirements.

Switchover

Services: Tuesday, September 11th 2018 14:30 UTC
Media storage/Swift: Tuesday, September 11th 2018 15:00 UTC
Traffic: Tuesday, September 11th 2018 19:00 UTC
MediaWiki: Wednesday, September 12th 2018 14:00 UTC

Switchback

Traffic: Wednesday, October 10th 2018 09:00 UTC (possibly with some preparatory work on Monday)
MediaWiki: Wednesday, October 10th 2018 14:00 UTC
Services: Thursday, October 11th 2018 14:30 UTC
Media storage/Swift: Thursday, October 11th 2018 15:00 UTC
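As a quick check against the "at least 3 weeks" goal, the two schedules above can be turned into durations. The following is a small Python sketch that uses only the dates and times listed here; it computes the planned time each component spends in codfw, not the actual times, which may have shifted on the day.

```python
from datetime import datetime, timedelta

# Planned times copied from the switchover/switchback schedule (all UTC):
# component -> (switch to codfw, switch back to eqiad)
SCHEDULE = {
    "Services": (datetime(2018, 9, 11, 14, 30), datetime(2018, 10, 11, 14, 30)),
    "Media storage/Swift": (datetime(2018, 9, 11, 15, 0), datetime(2018, 10, 11, 15, 0)),
    "Traffic": (datetime(2018, 9, 11, 19, 0), datetime(2018, 10, 10, 9, 0)),
    "MediaWiki": (datetime(2018, 9, 12, 14, 0), datetime(2018, 10, 10, 14, 0)),
}

# "Serve the site from codfw for at least 3 weeks"
MIN_DURATION = timedelta(weeks=3)


def time_in_codfw(schedule):
    """Return the planned time each component is served from codfw."""
    return {name: back - over for name, (over, back) in schedule.items()}


if __name__ == "__main__":
    for name, duration in time_in_codfw(SCHEDULE).items():
        status = "ok" if duration >= MIN_DURATION else "too short"
        print(f"{name}: {duration} ({status})")
```

With these planned times MediaWiki spends exactly 28 days (4 weeks) in codfw, Traffic 28 days and 14 hours, and Services and Swift 30 days each, so every component clears the 3-week minimum.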

Related Objects

Status      Assigned
Resolved    akosiaris
Resolved    Marostegui
Declined    Marostegui
Resolved    Marostegui (Sep 9 2018)
Resolved    Marostegui
Resolved    jcrespo
Resolved    Marostegui
Resolved    Marostegui
Resolved    Marostegui
Resolved    Marostegui
Resolved    akosiaris
Resolved    Marostegui
Resolved    jcrespo
Resolved    Cmjohnson
Resolved    Cmjohnson
Resolved    jcrespo
Resolved    Volans
Resolved    Joe
Resolved    Marostegui
Resolved    None
Resolved    Marostegui
Resolved    None
Resolved    Krinkle
Resolved    jcrespo
Resolved    hoo
Resolved    akosiaris

Event Timeline

Krinkle renamed this task from "Perform a datacenter switchover" to "Perform a datacenter switchover (2018-19 Q1)". (Jul 12 2018, 5:14 AM)

Change 455553 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] graphite: alert when eqiad and codfw drift in number of thumbnails

https://gerrit.wikimedia.org/r/455553

Change 455553 merged by Filippo Giunchedi:
[operations/puppet@production] graphite: alert when eqiad and codfw drift in number of thumbnails

https://gerrit.wikimedia.org/r/455553

Change 455811 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] graphite: use keepLastValue for thumbs drift alert

https://gerrit.wikimedia.org/r/455811

Change 455811 merged by Filippo Giunchedi:
[operations/puppet@production] graphite: use keepLastValue for thumbs drift alert

https://gerrit.wikimedia.org/r/455811
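The two graphite changes above add an alert for when eqiad and codfw drift in their number of thumbnails, and then wrap the check in Graphite's keepLastValue(), which fills gaps (None datapoints) with the last received value, presumably so that a missing datapoint in one datacenter does not break the comparison. The Python sketch below only illustrates that idea: the sample series, the threshold, and the alert rule (latest drift above a fixed threshold) are hypothetical and are not the actual Puppet/Icinga check. Graphite's keepLastValue() also accepts an optional limit on gap length, which this sketch omits.

```python
from typing import List, Optional

Series = List[Optional[float]]


def keep_last_value(series: Series) -> Series:
    """Emulate Graphite's keepLastValue() without a limit: replace each None
    with the most recent non-None value. Leading Nones stay None."""
    filled: Series = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def thumbnail_drift(eqiad: Series, codfw: Series) -> Series:
    """Absolute difference in thumbnail counts per datapoint, after gap filling."""
    drift: Series = []
    for a, b in zip(keep_last_value(eqiad), keep_last_value(codfw)):
        drift.append(None if a is None or b is None else abs(a - b))
    return drift


def should_alert(eqiad: Series, codfw: Series, threshold: float) -> bool:
    """Hypothetical rule: alert when the most recent known drift exceeds threshold."""
    drift = [d for d in thumbnail_drift(eqiad, codfw) if d is not None]
    return bool(drift) and drift[-1] > threshold


if __name__ == "__main__":
    # Hypothetical thumbnail counts with gaps in each datacenter's series.
    eqiad = [100, 120, None, 130]
    codfw = [100, None, 90, 95]
    print(thumbnail_drift(eqiad, codfw))
    print(should_alert(eqiad, codfw, threshold=25))
```

Without the gap filling, the datapoints where either series is None would have no defined drift; with it, the comparison stays continuous across short gaps.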

Change 456175 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mediawiki::maintenance: use mw_primary to enable/disable crons

https://gerrit.wikimedia.org/r/456175

Change 456175 abandoned by Dzahn:
mediawiki::maintenance: use mw_primary to enable/disable crons

https://gerrit.wikimedia.org/r/456175
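Change 456175 proposed using mw_primary (presumably the name of the primary MediaWiki datacenter) to decide whether the MediaWiki maintenance crons are enabled; the change was abandoned. The Python sketch below only illustrates that idea. The Host type, the host names, and the "present"/"absent" values (modelled on Puppet's ensure parameter) are hypothetical and do not reflect the contents of the patch.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical maintenance host and the datacenter it lives in."""
    name: str
    site: str  # e.g. "eqiad" or "codfw"


def cron_ensure(host: Host, mw_primary: str) -> str:
    """Return a Puppet-style ensure value for maintenance crons:
    enabled only in the primary MediaWiki datacenter."""
    return "present" if host.site == mw_primary else "absent"


if __name__ == "__main__":
    hosts = [Host("maint-host-eqiad", "eqiad"), Host("maint-host-codfw", "codfw")]
    # Flipping mw_primary during a switchover moves the crons with it.
    for primary in ("eqiad", "codfw"):
        print(primary, {h.name: cron_ensure(h, primary) for h in hosts})
```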

Mentioned in SAL (#wikimedia-operations) [2018-09-10T09:30:32Z] <volans> starting execution of "cookbook sre.switchdc.mediawiki --live-test codfw eqiad" - T199073

Mentioned in SAL (#wikimedia-operations) [2018-09-10T11:44:09Z] <volans> completed execution of "cookbook sre.switchdc.mediawiki --live-test codfw eqiad" - T199073

akosiaris claimed this task.
akosiaris updated the task description.

We successfully switched to codfw and back, as tracked in the subtasks. There was some aftermath and a few follow-up action items, but the switchover succeeded nevertheless, so I am resolving this.