The Eventlogging databases (m4 shard), db1046 (m4-master), db1047 (analytics slave 1), dbstore1002 (analytics slave 2), and dbstore2002 (Dallas backup), use a custom replication mechanism for several reasons:
- Regular mysql replication is too slow and unsuitable for large batches of data
- Purging is inefficient over the network
- Over WAN especially, things get very slow
- If replication stops, it is almost impossible to get the slaves back in sync
- Analytics slaves are IO-saturated due to the large number of long-running queries, combined with having data from 8+ shards on a single physical machine (needed to run JOINs)
The current solution uses a script (https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/files/mariadb/eventlogging_sync.sh) that lacks several improvements it could benefit from, namely:
- parallel replication of several tables at the same time
- import and export using LOAD DATA, faster than parsing SQL commands
- Using a third server to offload the process, minimizing the time spent on the MySQL servers
- Using actual temporary files for batches instead of OS unnamed pipes, which eventually fail because they lock the master for too long
- Configurable purging
- Monitoring of the process (currently there is none)
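
As a rough illustration of how the missing features above could fit together, here is a minimal Python sketch. It is not the existing eventlogging_sync.sh and not a proposed implementation. Everything in it is an assumption made for the example: the function names, the idea that each table has an increasing `id` as its first column, and the `fetch_batch` / `load_file` callables, which stand in for whatever actually reads from the master and runs LOAD DATA on a slave. Batches are spooled to real temporary files, several tables are processed in parallel, and a per-table status is collected so the process can be monitored.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def load_data_sql(table, path):
    """Build the LOAD DATA statement a load_file callable could execute."""
    return "LOAD DATA LOCAL INFILE '{}' INTO TABLE `{}`".format(path, table)


def sync_table(table, fetch_batch, load_file, last_id=0, batch_size=1000):
    """Copy rows with id > last_id in batches; return (rows_copied, last_id).

    fetch_batch(table, last_id, batch_size) -> list of row tuples, id first
    load_file(table, path)                  -> loads one batch file on the slave
    Both callables are hypothetical placeholders for real database access.
    """
    copied = 0
    while True:
        rows = fetch_batch(table, last_id, batch_size)  # short read on the master
        if not rows:
            return copied, last_id
        # Spool the batch to a real temporary file so nothing stays open on
        # the master while the slave is loading (unlike an unnamed pipe).
        with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
            for row in rows:
                # Simplification: no escaping of tabs, newlines or NULLs.
                tmp.write("\t".join(str(value) for value in row) + "\n")
        try:
            load_file(table, tmp.name)
        finally:
            os.unlink(tmp.name)
        copied += len(rows)
        last_id = rows[-1][0]


def sync_all(tables, fetch_batch, load_file, workers=4):
    """Sync several tables in parallel and return a per-table status dict."""
    status = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            table: pool.submit(sync_table, table, fetch_batch, load_file)
            for table in tables
        }
        for table, future in futures.items():
            try:
                status[table] = ("ok", future.result()[0])
            except Exception as exc:  # report the failure instead of dying silently
                status[table] = ("failed", str(exc))
    return status
```

Run from a third host, a script along these lines would keep the heavy lifting off the database servers, and the status dictionary returned by `sync_all` could feed an alerting check. Configurable purging is left out of the sketch.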