
Create a Master-master topology between datacenters for easier failover (setup circular replication dallas -> eqiad for mysql databases)
Closed, Resolved (Public)

Description

Relatively easy task, literally "executing CHANGE MASTER on the 7 masters" with the following pending issues:

  • Master-master breaks our monitoring "dbtree" (basically because the topology stops being a tree and becomes a general graph).
  • We have to make sure that writes are not done by accident on the passive master, breaking production. Specifically, writing to codfw because "it is not production". It needs some protection there.
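As a rough illustration of the two points above, here is a minimal Python sketch that only builds the SQL statements for one pair of masters. The host names, the `repl` user, and the binlog coordinates are hypothetical placeholders, and `SET GLOBAL read_only = 1` is just one possible guard for the passive master, not necessarily what was deployed here. Note that `read_only` does not stop replication threads or users with the SUPER privilege from writing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Master:
    host: str          # hypothetical host name
    binlog_file: str   # would come from SHOW MASTER STATUS on that host
    binlog_pos: int


def change_master_sql(source: Master, user: str = "repl") -> str:
    """Build the statement that makes another server replicate from `source`.

    MASTER_PASSWORD is deliberately left out of this sketch.
    """
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST='{source.host}', "
        f"MASTER_USER='{user}', "
        f"MASTER_LOG_FILE='{source.binlog_file}', "
        f"MASTER_LOG_POS={source.binlog_pos};"
    )


def circular_setup(active: Master, passive: Master) -> Dict[str, List[str]]:
    """Return the statements to run on each host, keyed by host name."""
    return {
        # The passive master is assumed to replicate from the active one
        # already; the new link is the return path (passive -> active).
        active.host: [change_master_sql(passive), "START SLAVE;"],
        # Assumed guard against accidental writes on the passive side.
        passive.host: ["SET GLOBAL read_only = 1;"],
    }
```

For example, `circular_setup(Master("eqiad-master.example", "bin.000001", 4), Master("codfw-master.example", "bin.000042", 120))` returns one statement list per host, which would be repeated for each of the 7 master pairs.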

Event Timeline

jcrespo created this task. Nov 25 2015, 6:17 PM
jcrespo raised the priority of this task from to Needs Triage.
jcrespo updated the task description.
jcrespo added projects: DBA, Operations.
jcrespo added a subscriber: jcrespo.
Restricted Application added subscribers: StudiesWorld, Aklapper. Nov 25 2015, 6:17 PM
fgiunchedi triaged this task as High priority. Dec 1 2015, 4:00 PM
fgiunchedi added a subscriber: fgiunchedi.
jcrespo renamed this task from Create a Master-master topology between datacenters for easier failover to Create a Master-master topology between datacenters for easier failover (setup circular replication dallas -> eqiad for mysql databases). Feb 4 2016, 9:02 PM
jcrespo set Security to None.
jcrespo added a subscriber: faidon.
Restricted Application added a project: codfw-rollout. Feb 4 2016, 9:03 PM
jcrespo claimed this task. Feb 4 2016, 9:03 PM
jcrespo moved this task from Triage to Backlog on the DBA board. Feb 21 2016, 5:50 PM

Change 276127 had a related patch set uploaded (by Jcrespo):
Avoid infinite loops when using circular replication

https://gerrit.wikimedia.org/r/276127

Change 276127 merged by Jcrespo:
Avoid infinite loops when using circular replication

https://gerrit.wikimedia.org/r/276127

Change 276134 had a related patch set uploaded (by Jcrespo):
Avoid infinite loops when using circular replication

https://gerrit.wikimedia.org/r/276134

Change 276134 merged by Jcrespo:
Avoid infinite loops when using circular replication

https://gerrit.wikimedia.org/r/276134

Tendril and dbtree have been "fixed" (I just applied a simple patch to avoid infinite loops when constructing the tree). We will need a proper replacement for plotting more complex graphs (return links are currently not represented on the graph), but that is enough to unblock circular replication between the datacenter masters.
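The general idea of such a loop guard can be sketched in Python as follows. This is not the actual Tendril/dbtree patch, only an illustration under assumptions: the `replicas` mapping and the host names are hypothetical. A visited set keeps the walk from following the return link of a circular pair forever, and already-visited hosts are skipped with `continue` so that the remaining replicas of the same master are still examined.

```python
from typing import Dict, List, Set, Tuple


def build_tree(root: str, replicas: Dict[str, List[str]]) -> List[Tuple[int, str]]:
    """Depth-first walk returning (depth, host) rows, visiting each host once."""
    rows: List[Tuple[int, str]] = []
    visited: Set[str] = set()

    def walk(host: str, depth: int) -> None:
        visited.add(host)
        rows.append((depth, host))
        for child in replicas.get(host, []):
            if child in visited:
                # Return link of a circular pair: skip it, but keep
                # iterating so the other replicas are still examined.
                continue
            walk(child, depth + 1)

    walk(root, 0)
    return rows


# Hypothetical topology with a circular link between the two masters.
topology = {
    "eqiad-master": ["codfw-master", "eqiad-replica1"],
    "codfw-master": ["eqiad-master", "codfw-replica1"],
}
rows = build_tree("eqiad-master", topology)
```

As in the current dbtree output, the return link (codfw-master back to eqiad-master) is simply dropped rather than drawn.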

jcrespo moved this task from Backlog to In progress on the DBA board. Mar 9 2016, 11:08 AM
jcrespo moved this task from Backlog to In Progress on the codfw-rollout-Jan-Mar-2016 board.

Change 276137 had a related patch set uploaded (by Jcrespo):
Fix other slaves not being examined when one was already visited

https://gerrit.wikimedia.org/r/276137

Change 276137 merged by Jcrespo:
Fix other slaves not being examined when one was already visited

https://gerrit.wikimedia.org/r/276137

Change 276140 had a related patch set uploaded (by Jcrespo):
Fix other slaves not being examined when one was already visited

https://gerrit.wikimedia.org/r/276140

Change 276140 merged by Jcrespo:
Fix other slaves not being examined when one was already visited

https://gerrit.wikimedia.org/r/276140

jcrespo closed this task as Resolved. Mar 16 2016, 11:36 AM

All production machines that have cross-datacenter replication (x1, s*, es2, es3) have been set up with circular replication.

Not setting up the m* shards (misc), as they have a different setup (a proxy in front of them).