Creating this task for tracking issues:
db2057 replication broke several times overnight on deletes against testwiki.echo_notification:
root@PRODUCTION s3[(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: db2018.codfw.wmnet
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: db2018-bin.003183
Read_Master_Log_Pos: 820062111
Relay_Log_File: db2057-relay-bin.000533
Relay_Log_Pos: 540151115
Relay_Master_Log_File: db2018-bin.003182
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1146
Last_Error: Error 'Table 'testwiki.echo_notification' doesn't exist' on query. Default database: 'testwiki'. Query: 'delete from echo_notification where notification_event =2293'
Skip_Counter: 0
Exec_Master_Log_Pos: 540150827
Relay_Log_Space: 1868643849
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: Yes
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1146
Last_SQL_Error: Error 'Table 'testwiki.echo_notification' doesn't exist' on query. Default database: 'testwiki'. Query: 'delete from echo_notification where notification_event =2293'
Replicate_Ignore_Server_Ids:
Master_Server_Id: 180359174
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 0-171966669-4032579870,180359174-180359174-94123433,171966669-171966669-215805333
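For anyone who wants to script this kind of check, here is a rough sketch that reads `show slave status\G` text and reports whether the SQL thread has stopped on an error. It assumes the usual one-field-per-line `\G` layout. The function names `parse_slave_status` and `sql_thread_broken` are made up for illustration and are not part of any existing tooling.

```python
def parse_slave_status(text):
    """Parse 'Key: value' lines from SHOW SLAVE STATUS\\G output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, since error messages contain colons.
        key, sep, value = line.strip().partition(":")
        if sep and not key.startswith("*"):
            fields[key.strip()] = value.strip()
    return fields


def sql_thread_broken(fields):
    """True when the SQL thread is stopped and a SQL error number is recorded."""
    return (
        fields.get("Slave_SQL_Running") == "No"
        and fields.get("Last_SQL_Errno", "0") != "0"
    )


# A few fields copied from the output above.
sample_status = """\
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_SQL_Errno: 1146
Last_SQL_Error: Error 'Table 'testwiki.echo_notification' doesn't exist' on query. Default database: 'testwiki'. Query: 'delete from echo_notification where notification_event =2293'
"""

status = parse_slave_status(sample_status)
if sql_thread_broken(status):
    print(status["Last_SQL_Errno"], status["Last_SQL_Error"])
```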
I skipped that query, and replication broke again on the same kind of query, this time against the echo_event table, which also doesn't exist on db2057.
I checked the binlogs from the master and found (starting at around 21:30:22 UTC):
delete from echo_notification where notification_event=53922
delete from echo_event where event_id=53922
delete from echo_notification where notification_event =2293
delete from echo_event where event_id=2293
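In case it helps with searching the rest of the binlogs, below is a minimal sketch of how the statements above could be picked out of binlog text. It assumes the binlog has already been decoded to plain text (for example with `mysqlbinlog`) and that the events are statement-based. The helper name `find_echo_deletes` and the regular expression are illustrative only. The lowercase flag is just a hint that a statement may have been typed by hand, not proof.

```python
import re

# Matches DELETE statements against the two Echo tables, in any letter case,
# capturing the keyword as written, the table name and the numeric id.
DELETE_RE = re.compile(
    r"^(delete)\s+from\s+(echo_notification|echo_event)\s+where\s+\w+\s*=\s*(\d+)",
    re.IGNORECASE,
)


def find_echo_deletes(binlog_text):
    """Return (table, id, keyword_is_lowercase) for each matching line."""
    hits = []
    for line in binlog_text.splitlines():
        match = DELETE_RE.match(line.strip())
        if match:
            keyword, table, row_id = match.groups()
            hits.append((table.lower(), int(row_id), keyword.islower()))
    return hits


# The four statements found on the master.
sample_binlog = """\
delete from echo_notification where notification_event=53922
delete from echo_event where event_id=53922
delete from echo_notification where notification_event =2293
delete from echo_event where event_id=2293
"""

for hit in find_echo_deletes(sample_binlog):
    print(hit)
```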
So what I have done is create those two tables on db2057 (they do exist on all the other hosts of s3).
Those lowercase deletes look like they may have been run manually. I have checked SAL, Gerrit and Phabricator but couldn't find anything related to them.