
2017/18 Annual Plan Program 8: Multi-datacenter support
Closed, Resolved (Public)

Description

Annual plan link: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2017-2018/Draft/Programs/Technology#Program_8._Multi-datacenter_support
Teams: TechOps, MediaWiki, Services, Performance
Strategic Priorities: Reach, Communities
Time Frame: 12 months

Summary

Wikimedia currently operates two data centers, each independently capable of serving our core sites and services. However, many of our services, including our most important core platform component (MediaWiki), are active in only one data center at any point in time, with the other on standby. Switching between the two data centers is currently a very involved manual process that significantly affects the availability of our services for our users and carries a substantial risk of failure. Extending existing services (MediaWiki in particular) to serve requests from multiple data centers concurrently would minimize this impact and unlock performance benefits that currently go unused.

Goal

We will improve availability and performance for our users, while also minimizing the impact from fail-over testing and catastrophes. We will do this by expanding our multi-data center capabilities to serve requests from multiple data centers simultaneously.

Outcomes and Objectives

  • Outcome 1: Our audiences enjoy improved MediaWiki and REST API availability and reduced wiki read-only impact from data center fail-overs.
    • Objective 1: MediaWiki supports routing read-only requests (GET/HEAD) to other data centers
    • Objective 2: Test an active/active deployment for read-only requests of the MediaWiki application platform and REST APIs
    • Objective 3: Integrate MediaWiki with dynamic configuration or service discovery to reduce the time required for a master switch from one data center to another
  • Outcome 2: Backend infrastructure works reliably across data centers.
    • Objective 1: Set up a robust multi-data center event & job processing infrastructure, and migrate all job queue use cases: T169937 (Q1)
    • Objective 2: Full support for serving REST API requests from both core data centers simultaneously
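
The routing idea behind Outcome 1, Objective 1 can be sketched as follows. This is a minimal illustration, not MediaWiki code or Wikimedia configuration: the function name, the data center labels, and the assumption that only one "primary" data center accepts writes are all hypothetical.

```python
# Illustrative sketch only: names below are hypothetical, not taken from
# MediaWiki or from Wikimedia's production configuration.

# HTTP methods treated as read-only, matching the GET/HEAD pair named above.
READ_ONLY_METHODS = frozenset({"GET", "HEAD"})


def choose_data_center(method: str, local_dc: str, primary_dc: str) -> str:
    """Return the data center that should handle a request.

    Read-only requests are served by the data center that received them.
    All other requests are sent to the primary data center, which is
    assumed here to be the only one that accepts writes.
    """
    if method.upper() in READ_ONLY_METHODS:
        return local_dc
    return primary_dc
```

For example, under these assumptions `choose_data_center("GET", "dc-b", "dc-a")` keeps the request in `dc-b`, while a `POST` arriving at `dc-b` is directed to `dc-a`.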

Event Timeline

GWicke renamed this task from "2017/18 annual plan program 8: Multi-datacenter support" to "2017/18 Annual Plan Program 8: Multi-datacenter support". Sep 7 2017, 4:20 PM