Eventlogging databases (m4 shard): db1046 (m4-master), db1047 (analytics slave 1), dbstore1002 (analytics slave 2), and dbstore2002 (Dallas backup) use a custom replication mechanism for several reasons:
* Regular MySQL replication is too slow and unsuitable for large batches of data
* Purging is inefficient over the network
* Over WAN especially, things get very slow
* If replication stops, it is almost impossible to get the slaves back in sync
* Analytics slaves are IO-saturated due to the large amount of long-running queries, combined with having data from 8+ shards on a single physical machine (needed to run JOINs)
The current solution uses a script (https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/files/mariadb/eventlogging_sync.sh) that lacks several features it could benefit from, namely:
* Parallel replication of several tables at the same time
* Import and export using LOAD DATA, which is faster than parsing SQL commands
* Using a third server to offload the process, minimizing the time spent on the MySQL servers
* Using actual temporary files for batches instead of unnamed OS pipes, which eventually fail because they lock the master for too long
* Configurable purging
* Monitoring of the process
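
As a rough illustration of how some of the missing features listed above could fit together (parallel per-table sync, batches written to real temporary files), here is a minimal Python sketch. It is not how eventlogging_sync.sh works, and every name in it is hypothetical: `fetch_rows` and `load_file` are placeholder callables that, in a real setup, might wrap an export query on the master and a LOAD DATA import on the slave. The sketch also assumes each table has a numeric, increasing id column that can be used to split the work into ranges.

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor


def batch_ranges(first_id, last_id, batch_size):
    """Yield inclusive (start, end) id ranges covering first_id..last_id."""
    start = first_id
    while start <= last_id:
        end = min(start + batch_size - 1, last_id)
        yield start, end
        start = end + 1


def export_batch(rows):
    """Write one batch to a named temporary TSV file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".tsv", delete=False, newline=""
    ) as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
        return handle.name


def sync_table(table, fetch_rows, load_file, first_id, last_id, batch_size=1000):
    """Copy one table batch by batch; return the number of batches copied.

    fetch_rows(table, start, end) and load_file(table, path) are
    hypothetical hooks supplied by the caller (assumption, not a real API).
    """
    batches = 0
    for start, end in batch_ranges(first_id, last_id, batch_size):
        path = export_batch(fetch_rows(table, start, end))
        load_file(table, path)  # the hook is expected to remove the file
        batches += 1
    return batches


def sync_all(tables, fetch_rows, load_file, workers=4):
    """Sync several tables in parallel.

    tables maps a table name to its (first_id, last_id) range to copy.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(sync_table, name, fetch_rows, load_file, lo, hi)
            for name, (lo, hi) in tables.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

Because each batch is a short-lived file rather than a long-running pipe, the master would only be read for the duration of one batch at a time. The per-table batch counts returned by `sync_all` could also feed a basic monitoring check.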