MASTER_POS_WAIT() alternative that works cross-DC
Closed, Resolved (Public)

Description

MASTER_POS_WAIT() does not work across our datacenters because the binlogs in each datacenter have no direct relationship to one another.

We need an option to use a different method, such as:
a) MASTER_GTID_WAIT()
b) Checking the heartbeat table and sleeping a bit until timeout/sync (in non-transaction mode to avoid repeatable-read staleness)
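To make option (b) concrete, here is a minimal sketch of the polling loop in Python. It is not the MediaWiki implementation: `wait_for_heartbeat` and the `read_heartbeat` callback are hypothetical names, and the callback is assumed to return the latest heartbeat timestamp seen on the replica (or None if no row exists), read in autocommit mode so that each poll sees fresh data.

```python
import time
from typing import Callable, Optional


def wait_for_heartbeat(
    read_heartbeat: Callable[[], Optional[float]],
    target_ts: float,
    timeout: float = 10.0,
    poll_interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the replica heartbeat reaches target_ts or the timeout elapses.

    read_heartbeat is a hypothetical callback; it must not run inside a
    long-lived REPEATABLE READ transaction, or every poll would return
    the same stale snapshot.
    """
    deadline = clock() + timeout
    while True:
        ts = read_heartbeat()
        if ts is not None and ts >= target_ts:
            return True  # replica has caught up to the requested point
        if clock() >= deadline:
            return False  # gave up: replica is still behind
        sleep(poll_interval)
```

A caller would record the heartbeat timestamp on the master right after a write and pass it as `target_ts`. A False return means the replica did not catch up in time, and the caller has to decide whether to fall back to the master or serve possibly stale data.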

Event Timeline

Other restrictions noted by Giuseppe: the default must remain compatible with regular, simple MediaWiki replication.

I have to recheck, but I am not sure we only sleep. We also compare positions directly after doing a write (is posA > posB?), so it is slightly more complex than just changing that one function. We would have to overload the class we patched slightly last time.
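For illustration only, the "is posA > posB" comparison could look roughly like this for GTID positions. The sketch assumes MariaDB-style GTIDs of the form domain-server_id-sequence, comma-separated per replication domain; `parse_gtid_pos` and `has_reached` are hypothetical helpers, not functions from the patched class.

```python
from typing import Dict


def parse_gtid_pos(pos: str) -> Dict[int, int]:
    """Map each replication domain to its highest sequence number.

    Assumes MariaDB-style positions such as "0-1-100,1-2-7".
    """
    result: Dict[int, int] = {}
    for gtid in filter(None, (part.strip() for part in pos.split(","))):
        domain, _server_id, seq = (int(x) for x in gtid.split("-"))
        result[domain] = max(seq, result.get(domain, -1))
    return result


def has_reached(replica_pos: str, target_pos: str) -> bool:
    """True if the replica is at or past the target in every target domain."""
    replica = parse_gtid_pos(replica_pos)
    target = parse_gtid_pos(target_pos)
    return all(replica.get(d, -1) >= seq for d, seq in target.items())
```

Unlike binlog file/offset pairs, this comparison ignores the server ID and looks only at the per-domain sequence number, which is what lets it make sense across datacenters.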

Change 289985 had a related patch set uploaded (by Aaron Schulz):
[WIP] Added GTID support to slave lag methods

https://gerrit.wikimedia.org/r/289985

Thank you for this. These checks keep getting more and more complex, which means they will need extensive testing. GTID will take some time to be fully deployed on the cluster, and it has some rough edges on the ops side, but it is in progress.

ori subscribed.

@jcrespo, will you be adding this to the beta cluster as well?

I cannot, they use 5.5 on beta.

Change 289985 merged by jenkins-bot:
Added GTID support to slave lag methods

https://gerrit.wikimedia.org/r/289985

Change 302635 had a related patch set uploaded (by Aaron Schulz):
Enable MASTER_GTID_WAIT() on s6

https://gerrit.wikimedia.org/r/302635

Change 302635 merged by jenkins-bot:
Enable MASTER_GTID_WAIT() on s6

https://gerrit.wikimedia.org/r/302635