On Oct 27 at 14:09:54 UTC, replication stopped due to a data drift error:
Could not execute Update_rows_v1 event on table s54518__mw.online; Can't find record in 'online', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log log.221984, end_log_pos 27719835
I filtered that table out of replication and restarted it, but replication then failed on another table. On further investigation, I found that the replica was in read-write mode. That could explain the drift, but we don't advertise the existence of that server, so it would be surprising if someone had configured their system to write to it.
This blocks failover (T263679) until we can fix or rebuild this server.
Coordination etherpad: https://etherpad.wikimedia.org/p/toolsdb-2020-replica-rebuild