Row-based replication has some important advantages:
- In most cases it reduces replication lag (which we sometimes suffer on some hosts)
- It minimizes slave drift, which we are suffering from now
- It is more prone to break replication when a schema or data difference is detected, so problems surface faster
- It usually results in reduced contention, better performance and less locking
It has some disadvantages:
- Increased binary log size, which affects both the disk space and the bandwidth needed. In edge cases (large blobs), it could impact performance when binary logs are written to disk (configuration should be tuned)
- It makes one-host-at-a-time schema changes difficult (which is the main way we do them for mediawiki core hosts right now)
- It is more prone to break replication when a schema or data difference is detected (fails faster); this could be a curse or a blessing
- Performance may be poor if tables on the slaves do not have proper primary keys; on the other hand, it can improve performance
- It requires an external tool (mysqlbinlog) to see the underlying ongoing queries (for example, if they are stuck); the changes applied by the system threads are not shown in SHOW PROCESSLIST
- Sanitarium requires statement-based binary logs in order to filter rows to labsdbs. Not anymore, since triggers can run on the replica side from 10.1+ (row-based replication triggers)
- It makes it impossible or more difficult to use some tools like pt-table-checksum, especially on multi-tiered setups (for codfw, for labs)
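Regarding the primary key concern above, one possible way to audit a host before switching is to list the base tables that have no PRIMARY KEY constraint. The following is only a rough sketch: the helper name `tables_without_primary_key` and the list of excluded system schemas are assumptions made for illustration, and the function expects any Python DB-API cursor (for example one from a MySQL client library) rather than a specific driver. Tables that only have a unique key are also reported, even though they may still replicate acceptably.

```python
# Sketch only: the query uses the standard information_schema views;
# the helper name and the excluded schema list are illustrative assumptions.
NO_PK_QUERY = """
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys')
  AND c.constraint_name IS NULL
ORDER BY t.table_schema, t.table_name
"""


def tables_without_primary_key(cursor):
    """Return (schema, table) pairs for base tables lacking a PRIMARY KEY.

    `cursor` is any DB-API 2.0 cursor already connected to the host to audit.
    """
    cursor.execute(NO_PK_QUERY)
    return [(row[0], row[1]) for row in cursor.fetchall()]
```

Running this against each candidate host would give a first list of tables to review as potential blockers; an empty result does not by itself guarantee good row-based performance.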
This ticket is to decide whether this change is worth it, how to do it, where (maybe not all servers require it), when, and what blockers there are.