
Prepare and improve the datacenter switchover procedure
Closed, Resolved · Public

Description

We are planning a datacenter switchover, tentatively scheduled for the beginning of next quarter. To make that possible, several things are needed as preparation work:

  • Check MediaWiki for new/old "eqiad-only" dependencies
  • Check all services on scb
  • Install any missing service in codfw
  • Install a secondary etcd cluster in codfw
  • Modify MediaWiki config so that new services in codfw are correctly configured there
  • Add TLS wherever feasible so that cross-dc calls are encrypted
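
As an illustration of the first item, a check for "eqiad-only" dependencies could start as a plain text scan of the config. This is only a minimal sketch: the function name `find_dc_only_hosts` is hypothetical, and it assumes that hostnames embed the datacenter name as a DNS label and that the codfw counterpart of a host has the same name with only that label changed, which may not match the real naming scheme.

```python
import re

# Assumption: internal hostnames embed the datacenter name as a DNS label,
# e.g. "service.eqiad.wmnet". Adjust the pattern to the real naming scheme.
HOST_RE = re.compile(r"\b[\w.-]+\.(eqiad|codfw)\.[\w.-]+\b")


def find_dc_only_hosts(config_text, dc="eqiad", other_dc="codfw"):
    """Return (line_number, hostname) pairs for hosts in `dc` that have no
    same-named counterpart in `other_dc` anywhere in the config text."""
    found = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        for match in HOST_RE.finditer(line):
            found.append((lineno, match.group(0), match.group(1)))

    all_hosts = {host for _, host, _ in found}
    result = []
    for lineno, host, host_dc in found:
        if host_dc != dc:
            continue
        # Naive counterpart lookup: swap only the datacenter label.
        counterpart = host.replace(f".{dc}.", f".{other_dc}.")
        if counterpart not in all_hosts:
            result.append((lineno, host))
    return result
```

Anything it reports would still need a manual look, since a host can legitimately exist in one datacenter only.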

Apart from this, we need to improve the switchover procedure itself. Last time, it required a lot of coordination and a long series of commands that we had to execute in sequence. Another goal is therefore to simplify the procedure by reducing the number of manual steps and the number of code commits needed for the switchover.

  • Deploy the new cluster orchestration tool
  • Create a script based on said tool to automate most of the procedures
  • Make the switchover depend less on code deploys by integrating conftool with the configuration of services
  • Create a real ES-memcached warmup tool that is not just a list of URLs for apache-fast-test, or modify apache-fast-test to work more in parallel.
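
For the last item, the "more in parallel" part could look roughly like the sketch below. It is not the actual apache-fast-test or the eventual warmup tool; `fetch_status` and `warm_up` are hypothetical names, and the worker count and timeout are arbitrary defaults. It only shows a list of URLs being requested concurrently from a thread pool instead of one after another.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_status(url, timeout=10):
    """Fetch one URL and return its HTTP status code, or None on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except (OSError, ValueError):
        # Network errors, HTTP errors and malformed URLs all count as a miss.
        return None


def warm_up(urls, fetch=fetch_status, workers=8):
    """Request every URL concurrently and return a {url: result} mapping."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch, urls))
    return dict(zip(urls, results))
```

The `fetch` parameter is injectable so that the same loop can be pointed at a specific backend or exercised without network access.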

Event Timeline

Joe created this task. Jan 5 2017, 11:51 AM
Restricted Application added a subscriber: Aklapper. Jan 5 2017, 11:51 AM
elukey added a subscriber: elukey. Jan 5 2017, 11:53 AM
mark added a project: Epic. Jan 11 2017, 5:27 PM
MoritzMuehlenhoff triaged this task as High priority. Jan 17 2017, 9:52 AM
Gilles added a subscriber: Gilles. Jan 18 2017, 5:08 PM

Can you make a subtask for the warmup tool with details about what you need and add the Performance-Team tag to it?

Joe added a comment. Jan 23 2017, 4:49 PM

@Gilles will do today or tomorrow

fgiunchedi updated the task description. Jan 23 2017, 5:36 PM
Joe updated the task description. Mar 27 2017, 6:44 AM
Joe added a comment. Edited Mar 27 2017, 7:09 AM

What still needs to be done:

  • Integrate the discovery system extensively into the puppet/MediaWiki config. The puppet patches are just waiting to be merged.
  • Check MediaWiki config for eqiad-only entries
  • TLS is mostly unimplemented: only MediaWiki exposes its services via TLS.
  • Etcd querying in MediaWiki is not ready, but our goals can be reached nonetheless.
Joe updated the task description. Apr 3 2017, 6:23 AM

Change 346544 had a related patch set uploaded (by Jcrespo):
[operations/mediawiki-config@master] Make mediawiki-eqiad dc read-only before switchover to codfw

https://gerrit.wikimedia.org/r/346544

Change 346547 had a related patch set uploaded (by Jcrespo):
[operations/mediawiki-config@master] Make mediawiki codfw dc read-write after switchover to codfw

https://gerrit.wikimedia.org/r/346547

Change 346547 abandoned by Jcrespo:
Make mediawiki codfw dc read-write after switchover to codfw

Reason:
duplicate of https://gerrit.wikimedia.org/r/346251

https://gerrit.wikimedia.org/r/346547

Change 346544 abandoned by Jcrespo:
Make mediawiki-eqiad dc read-only before switchover to codfw

Reason:
Duplicate of https://gerrit.wikimedia.org/r/346251

https://gerrit.wikimedia.org/r/346544

jcrespo closed this task as Resolved. Jul 25 2017, 4:44 PM
jcrespo assigned this task to Joe.
jcrespo added a subscriber: jcrespo.

I would close this as resolved. The only unchecked part is tracked in T134809, and I think it was out of the scope of the switchover from the beginning: it was already part of the scope of active-active, correctly tracked as a child of T88445. MySQL has not sent unencrypted queries or traffic cross-dc for a long time now. Reopen if you disagree with my assessment.