
Create a Master-master topology between datacenters for easier failover (setup circular replication dallas -> eqiad for mysql databases)
Closed, Resolved (Public)

Description

Relatively easy task, literally "executing CHANGE MASTER on the 7 masters" with the following pending issues:

  • Master-master replication breaks our "dbtree" monitoring (basically because the topology stops being a tree and becomes a general graph)
  • We have to make sure that writes are not done by accident on the passive master, breaking production. Specifically, writing to codfw because "it is not production". It needs some protection there.
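
As a rough illustration of the two points above, the sketch below builds the statements an operator might run for one section: a `CHANGE MASTER TO` on the active master so that it replicates from the passive one, and `SET GLOBAL read_only = 1` on the passive master as one possible guard against accidental writes. The hostnames, the `repl` user and the helper names are hypothetical, and binlog coordinates, GTID options and credentials are deliberately left out. Note that `read_only` does not stop accounts with the SUPER privilege, so it is only a partial protection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    name: str
    active_master: str   # hypothetical hostname in the active datacenter
    passive_master: str  # hypothetical hostname in the passive datacenter


def change_master_sql(source_host: str, repl_user: str = "repl") -> str:
    """Statement to run on a master so that it replicates from source_host.

    Binlog coordinates, GTID options and the password are omitted on purpose.
    """
    return (
        f"CHANGE MASTER TO MASTER_HOST='{source_host}', "
        f"MASTER_USER='{repl_user}';"
    )


def plan(section: Section) -> dict[str, list[str]]:
    """Map each host of a section to the statements to run on it."""
    return {
        # Close the circle: the active master starts replicating
        # from the passive master.
        section.active_master: [
            change_master_sql(section.passive_master),
            "START SLAVE;",
        ],
        # Guard the passive master against accidental writes.
        section.passive_master: ["SET GLOBAL read_only = 1;"],
    }
```

Calling `plan(Section("s1", "db-active.example", "db-passive.example"))` returns one list of statements per host, which can be reviewed before anything is executed.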

Event Timeline

jcrespo raised the priority of this task from to Needs Triage.
jcrespo updated the task description.
jcrespo added projects: DBA, SRE.
jcrespo subscribed.
fgiunchedi subscribed.
jcrespo renamed this task from Create a Master-master topology between datacenters for easier failover to Create a Master-master topology between datacenters for easier failover (setup circular replication dallas -> eqiad for mysql databases). Feb 4 2016, 9:02 PM
jcrespo set Security to None.
jcrespo added a subscriber: faidon.

Change 276127 had a related patch set uploaded (by Jcrespo):
Avoid infinite loops when using circular replication

https://gerrit.wikimedia.org/r/276127

Change 276127 merged by Jcrespo:
Avoid infinite loops when using circular replication

https://gerrit.wikimedia.org/r/276127

Change 276134 had a related patch set uploaded (by Jcrespo):
Avoid infinite loops when using circular replication

https://gerrit.wikimedia.org/r/276134

Change 276134 merged by Jcrespo:
Avoid infinite loops when using circular replication

https://gerrit.wikimedia.org/r/276134

Tendril and dbtree have been "fixed" (I just applied a simple patch to avoid infinite loops when constructing the tree). We will need a proper replacement for plotting more complex graphs (return links are currently not represented on the graph), but that is enough to unblock the circular replication between datacenter masters.
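
For readers unfamiliar with the problem, here is a minimal sketch of cycle-safe tree construction over a replication graph. It is not the actual Tendril/dbtree patch; the function name, the input format and the hostnames are assumptions made for illustration. A visited set stops the walk from looping forever once two masters replicate from each other, and the edge pointing back to an already visited host is simply not drawn.

```python
from collections import deque


def build_tree(root: str, replicas_of: dict[str, list[str]]) -> dict[str, list[str]]:
    """Breadth-first walk of a replication graph that tolerates cycles.

    Returns a tree (parent -> children) that contains each host once.
    Edges leading back to an already visited host are skipped, not drawn.
    """
    tree: dict[str, list[str]] = {root: []}
    visited = {root}
    queue = deque([root])
    while queue:
        host = queue.popleft()
        for replica in replicas_of.get(host, []):
            if replica in visited:
                # Skip only this edge; the remaining replicas of the
                # same host must still be examined.
                continue
            visited.add(replica)
            tree[host].append(replica)
            tree[replica] = []
            queue.append(replica)
    return tree


# Hypothetical topology: the two masters replicate from each other.
topology = {
    "eqiad-master": ["eqiad-replica1", "codfw-master"],
    "codfw-master": ["eqiad-master", "codfw-replica1"],
}
tree = build_tree("eqiad-master", topology)
```

With this input, `tree` lists `codfw-master` under `eqiad-master` and `codfw-replica1` under `codfw-master`, while the return link from `codfw-master` to `eqiad-master` is dropped.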

Change 276137 had a related patch set uploaded (by Jcrespo):
Fix other slaves not being examined when one was already visited

https://gerrit.wikimedia.org/r/276137

Change 276137 merged by Jcrespo:
Fix other slaves not being examined when one was already visited

https://gerrit.wikimedia.org/r/276137

Change 276140 had a related patch set uploaded (by Jcrespo):
Fix other slaves not being examined when one was already visited

https://gerrit.wikimedia.org/r/276140

Change 276140 merged by Jcrespo:
Fix other slaves not being examined when one was already visited

https://gerrit.wikimedia.org/r/276140

All production machines that have cross-datacenter replication (x1, s*, es2, es3) have been set up with circular replication.

Not setting up the m* shards (misc), as they have a different setup (a proxy in front of them).