------------------------ LATEST DETECTED DEADLOCK ------------------------ 2017-08-09 09:13:13 7f742a17f700 *** (1) TRANSACTION: TRANSACTION 78008195605, ACTIVE 41 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 24 lock struct(s), heap size 2936, 24 row lock(s), undo log entries 23 MySQL thread id 165027767, OS thread handle 0x7f71e2ea8700, query id 16558091672 executing *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 723 page no 729779 n bits 200 index `PRIMARY` of table `enwiki`.`page` trx id 78008195605 lock_mode X locks rec *** (2) TRANSACTION: TRANSACTION 78008145646, ACTIVE 104 sec starting index read mysql tables in use 1, locked 1 117655 lock struct(s), heap size 12105256, 5973 row lock(s) MySQL thread id 165151888, OS thread handle 0x7f742a17f700, query id 16558035348 10.68.18.18 s51290 Sending data DELETE FROM ch_pl WHERE NOT EXISTS ( SELECT 1 FROM enwiki_p.page WHERE pl_from = page_id AND page_namespace = 0 AND page_is_redirect = 0 ) *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 723 page no 729779 n bits 200 index `PRIMARY` of table `enwiki`.`page` trx id 78008145646 lock mode S locks rec *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 723 page no 12786 n bits 208 index `PRIMARY` of table `enwiki`.`page` trx id 78008145646 lock mode S locks rec *** WE ROLL BACK TRANSACTION (1) `
These are the main tools involved:
- Xtools:
INSERT HIGH_PRIORITY IGNORE INTO `s51187__xtools_tmp`.`20170809125445e77252c210e090f34d0d1308 SELECT page_namespace , a.rev_page , a.rev_timestamp , a.rev_minor_edit , IFNULL(a.rev_parent_id,0) , IFNULL(a.rev_len,0) , CONCAT( LEFT(SUBSTRING_INDEX( a.rev_comment, '*/', -1),20), RIGHT(a.rev_comment,20) FROM revision_userindex a JOIN page ON page_id = a.rev_page WHERE a.rev_user = '7181920' AND a.rev_timestamp > 20080526
- dplbot (I was told):
CREATE TABLE n_dab_pl ( pl_from int unsigned NOT NULL default '0', pl_title varchar(255) binary NOT NULL default '' ) ENGINE=MyISAM AS SELECT pl_from, oc_title AS pl_title FROM enwiki_p.pagelinks, n_dab_link_candidates WHERE pl_namespace = 0 AND pl_title = oc_title
Because they are using MyISAM (a heavy locking engine, unsupported and discouraged on replica hosts), combined with DDL/DMLs (INSERT/UPDATE/DELETE/CREATE ... SELECT) they are causing replication delay for All users. This has been happening for a while, but normally it only caused small delays, they have now increased to several hours of delay (maybe because the interaction between the 2?).
The easiest way to solve this is to create/convert the user tables with the InnoDB engine (ENGINE=InnoDB), which doesn't require locking the original replicated table. Alternatively, a SELECT can be done separated from the insertion to avoid blocking writes. Please correct this issue as soon if possible. As documented:
Caution: Writing (INSERT, UPDATE, DELETE) tables on the replica servers leads to the replication being stopped for all users on that server until the query finishes. So you need to make sure that such queries can be executed in a time frame that does not disrupt other users' work too much, for example by processing data in smaller batches or rethinking your data flow. As a rule of thumb, queries should finish in less than a minute. -- https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Steps_to_create_a_user_database_on_the_replica_servers
Because it impacts so badly to other users, I will either convert the tables or block these queries action is not taken reasonably soon (or we will get other users' complains of lagging).