Tried to discourage translation of https://meta.wikimedia.org/wiki/Abuse_filter_editors
[XWcXowpAICsAAL5HABMAAACQ] 2019-08-29 00:09:24: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"
Tried again, and it worked
DannyS712 | |
Aug 29 2019, 12:11 AM |
F30387932: image.png | |
Sep 18 2019, 7:01 AM |
Tried to discourage translation of https://meta.wikimedia.org/wiki/Abuse_filter_editors
[XWcXowpAICsAAL5HABMAAACQ] 2019-08-29 00:09:24: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"
Tried again, and it worked
Subject | Repo | Branch | Lines +/- | |
---|---|---|---|---|
Defer message group stats updating when (dis|en)couraging pages | mediawiki/extensions/Translate | master | +42 -8 |
Query: REPLACE INTO `translate_groupreviews` (tgr_group,tgr_lang,tgr_state) VALUES ('page-Abuse filter editors','*priority','discouraged') Function: MessageGroups::setPriority Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.48.15)
At the same time:
Sub-optimal transaction on DB(s) [10.64.48.15 (metawiki) (TRX#5b688c)]: 0 0.000550 query-m: REPLACE INTO `translate_groupreviews` (tgr_group,tgr_lang,tgr_state) VALUES ('X') [TRX#5b688c] 1 0.000658 query-m: INSERT INTO `logging` (log_type,log_action,log_timestamp,log_namespace,log_title,log_page,log_params,log_comment_id,log_actor) VALUES ('X') [TRX#5b688c] 2 0.324470 query-m: SELECT page_id,page_title,page_namespace,page_latest,MAX(rt_revision) AS rt_revision,rt_type FROM `page`,`revtag` WHERE (page_id=rt_page) AND rt_type IN ('X') GROUP BY page_id, rt_type ORDER BY page_namespace, page_title [TRX#5b688c] 3 61.251727 ...delay...
@Nikerabbit what is the IP address in your comment above? If that is mine, please redact it
The code is roughly:
MessageGroups::getPriority( $id ); MessageGroups::setPriority( $id, $new ); // First query new ManualLogEntry( 'pagetranslation', $action )->publish(); // Second query $this->listPages() // Third query MessageGroupStats::clearGroup( $shared );
$this->listPages() shouldn't take much time in any condition. Hence I believe it is the stats clearing that keeps the transaction open. My plan is to defer it to the job queue. It won't explain what was holding locks on the table, but it should at least avoid affecting web requests. Clearing the stats doesn't need to happen right away, and it isn't such a big deal if it fails. The stats usually self-heal over time due to other changes.
Change 533234 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/extensions/Translate@master] Defer message group stats updating when (dis|en)couraging pages
@abi_ This seems quite similar to the issue with the message index, just a different table.
Change 533234 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Defer message group stats updating when (dis|en)couraging pages
Last instance of this "Lock wait timeout exceeded" was at 2019-08-29T00:09:24. Recently there has only been one sub-optimal transaction:
Sub-optimal transaction on DB(s) [10.64.48.15 (metawiki) (TRX#57aa48)]: 0 3.636585 query-m: REPLACE INTO `translate_groupreviews` (tgr_group,tgr_lang,tgr_state) VALUES ('X') [TRX#57aa48] 1 0.000441 query-m: INSERT INTO `logging` (log_type,log_action,log_timestamp,log_namespace,log_title,log_page,log_params,log_comment_id,log_actor) VALUES ('X') [TRX#57aa48] 2 2.121573 ...delay...
The events are very sparse though, so it's hard to say whether this is fully fixed. Below plot of the error over time:
@Nikerabbit from your previous comment everything seems to be ok, can I move this to Done?
Seems okay for now. Maybe those updates could be wrapper in their own transaction to avoid keeping it open longer than necessary. But it might not help if there is lock contention (I see no reason why simple replace into would take seconds).