
MessageGroupStats::forItemInternal deadlocks
Closed, Resolved · Public · 1 Story Point

Description

Lots of these on wikidata:

mw1109 wikidatawiki MessageGroupStats::forItemInternal 10.64.32.28 1213 Deadlock found when trying to get lock; try restarting transaction (10.64.32.28) INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('page-Wikidata:Oversight','en','28','27','0','0')
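Error 1213 tells the client to rerun the whole transaction. A common client-side mitigation is to do exactly that a bounded number of times with jittered backoff. The sketch below is a generic Python illustration, not Translate's code. `DBError`, `run_with_retry` and `RETRYABLE_ERRNOS` are hypothetical names, and a real MySQL driver exposes the error number in its own way. For a pure cache table like translate_groupstats, skipping the write after a failed attempt may be cheaper than retrying.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical set of error numbers from this task; both messages say "try restarting transaction".
RETRYABLE_ERRNOS = {1213, 1205}  # 1213 = deadlock, 1205 = lock wait timeout


class DBError(Exception):
    """Hypothetical stand-in for a driver exception carrying the MySQL error number."""

    def __init__(self, errno: int, message: str) -> None:
        super().__init__(message)
        self.errno = errno


def run_with_retry(txn: Callable[[], T], attempts: int = 3, base_delay: float = 0.05) -> T:
    """Run `txn`, which must perform the whole transaction (BEGIN..COMMIT, rolling back
    on failure), re-running it from the start when MySQL reports a retryable lock error."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except DBError as err:
            if err.errno not in RETRYABLE_ERRNOS or attempt == attempts:
                raise
            # Jittered, growing backoff so competing writers do not collide again in lockstep.
            time.sleep(base_delay * attempt * random.random())
    raise AssertionError("unreachable")
```

Non-retryable errors (for example a duplicate key on a plain INSERT) are re-raised immediately, so only lock conflicts are retried.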


Version: unspecified
Severity: normal

Details

Reference: bz51410

Event Timeline

bzimport raised the priority of this task to Normal. Nov 22 2014, 2:00 AM
bzimport set Reference to bz51410.
bzimport added a subscriber: Unknown Object (MLST).
aaron created this task. Jul 15 2013, 11:12 PM

Any suggestions how to fix? PoolCounter?
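PoolCounter is MediaWiki's service for limiting how many workers perform the same expensive operation at once, coordinated across servers. As an in-process illustration of that idea only, the sketch below lets one caller per (group, language) key compute the stats while concurrent callers for the same key wait for its result or give up after a timeout. `SingleFlight` and `_Call` are hypothetical names; this is not PoolCounter's API.

```python
import threading
from typing import Any, Callable, Dict, Hashable, Optional


class _Call:
    """State shared between the caller computing a key and the callers waiting on it."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """In-process analogue of the PoolCounter idea: for each key, one caller runs
    the expensive function and concurrent callers for the same key wait for it."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._inflight: Dict[Hashable, _Call] = {}

    def do(self, key: Hashable, fn: Callable[[], Any], timeout: Optional[float] = None) -> Any:
        with self._mutex:
            call = self._inflight.get(key)
            is_leader = call is None
            if is_leader:
                call = _Call()
                self._inflight[key] = call
        if is_leader:
            try:
                call.result = fn()
            except Exception as exc:
                call.error = exc
            finally:
                # Unregister before waking waiters so later callers start a fresh computation.
                with self._mutex:
                    del self._inflight[key]
                call.done.set()
        elif not call.done.wait(timeout):
            return None  # gave up waiting; the caller can show "not available"
        if call.error is not None:
            raise call.error
        return call.result
```

With this shape only one request per key reaches the INSERT, instead of dozens of servers racing to insert the same row.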

There are also MessageGroupStats related locking issues on mediawiki.org, e.g.:

Mon Sep 9 18:24:16 UTC 2013 mw1194 mediawikiwiki MessageGroupStats::clear 10.64.16.8 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.8) DELETE FROM translate_groupstats WHERE tgs_group = 'page-Communication' AND tgs_lang = 'hu'

This may be a separate issue, but I'm posting it here for reference.

Bug 57374 has been marked as a duplicate of this bug.

Why is this suddenly high priority?

The L10N Eng Dev team sets its own priorities. Resetting.

aaron added a comment. Feb 26 2014, 6:47 AM

Dozens of servers trying to insert the same row on metawiki today:

INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('page-Terms of use/Paid contributions amendment','en','46','46','0','0')

(In reply to Niklas Laxström from comment #1)

Any suggestions how to fix? PoolCounter?

aaron added a comment. Jun 6 2014, 10:10 PM

Log snippet from today (truncated the middle since it was repetitive):

2014-06-06 20:22:22 mw1137 metawiki: Sub-optimal transaction on DB(s) 10.64.16.22 (metawiki) (000000003fecc80f00000000a293a52d):
0 30.377860 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
1 30.378035 DatabaseBase::query-master
2 0.000352 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
3 0.000396 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
4 0.000485 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
5 0.000379 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
6 0.000360 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
7 0.000310 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
8 0.000388 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
9 0.000373 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
10 0.000331 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
11 0.000379 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
12 0.000384 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
13 0.000325 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
...<snip>...
392 0.000357 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
393 0.000383 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
394 0.000424 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
395 0.000362 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
396 0.000470 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
397 0.000418 query-m: INSERT IGNORE INTO translate_groupstats (tgs_group,tgs_lang,tgs_total,tgs_translated,tgs_fuzzy,tgs_proofread) VALUES ('X')
398 0.002223 query-m: COMMIT

Are there mass updates triggered by user actions that could possibly use a job queue or something similar?

Probably someone viewing Special:LanguageStats or Special:MessageGroupStats. Those pages compute missing stats during the request, with a timeout of a couple of seconds; after that, the rest of the request shows "not available". The actual insertion into the DB could be deferred if that helps.
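Deferring the insertion could look roughly like this: compute stats on demand until a per-request time budget runs out, buffer the results in memory, and hand them to a single writer call once the page has been rendered. This is a minimal sketch under stated assumptions, not Translate's implementation. `RequestStats`, `budget_s` and `write_batch` are hypothetical names, and the clock is injectable only so the behaviour can be tested.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str]              # (tgs_group, tgs_lang)
Stats = Tuple[int, int, int, int]  # (total, translated, fuzzy, proofread)


class RequestStats:
    """Hypothetical per-request helper: compute missing stats until a time budget
    runs out, buffer the results, and write them once at the end of the request."""

    def __init__(self, compute: Callable[[Key], Stats], budget_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compute = compute
        self._clock = clock
        self._deadline = clock() + budget_s
        self._pending: Dict[Key, Stats] = {}

    def get(self, key: Key) -> Optional[Stats]:
        if key in self._pending:
            return self._pending[key]
        if self._clock() >= self._deadline:
            return None  # rendered as "not available" for the rest of the request
        stats = self._compute(key)
        self._pending[key] = stats
        return stats

    def flush(self, write_batch: Callable[[List[Tuple[Key, Stats]]], None]) -> int:
        """Call once after the response is built; passes all buffered rows, sorted by key,
        to one writer call so the DB write sits right before its COMMIT."""
        rows = sorted(self._pending.items())
        if rows:
            write_batch(rows)
        self._pending.clear()
        return len(rows)
```

No database work happens while the page is being built; the only write is the single `flush` call at the end.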

aaron added a comment. Jun 11 2014, 5:19 PM

It would probably help if the INSERT was:

a) As close to the COMMIT as possible (without much else interlaced)
b) Batched as one INSERT, or at least have the inserts in lexicographical order of (tgs_group, tgs_lang)
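Suggestion (b) can be sketched as a helper that merges all pending rows into one multi-row INSERT IGNORE, de-duplicated and sorted by (tgs_group, tgs_lang), so that concurrent writers request row locks in the same order. This assumes (tgs_group, tgs_lang) is the table's unique key, as the INSERT IGNORE pattern implies. `build_batched_insert` is a hypothetical name, and consistent ordering reduces InnoDB deadlocks rather than ruling them out.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, int, int, int, int]  # (group, lang, total, translated, fuzzy, proofread)

COLUMNS = ("tgs_group", "tgs_lang", "tgs_total", "tgs_translated", "tgs_fuzzy", "tgs_proofread")


def build_batched_insert(rows: Iterable[Row]) -> Tuple[str, List[object]]:
    """Return one multi-row INSERT IGNORE statement plus its parameters (%s paramstyle).

    Rows are de-duplicated and sorted by the (tgs_group, tgs_lang) key, so every
    writer touches the same keys in the same order."""
    unique: Dict[Tuple[str, str], Row] = {}
    for row in rows:
        unique[(row[0], row[1])] = row  # last value wins for a repeated key
    ordered = [unique[k] for k in sorted(unique)]
    if not ordered:
        raise ValueError("no rows to insert")
    placeholders = "(" + ",".join(["%s"] * len(COLUMNS)) + ")"
    sql = (
        f"INSERT IGNORE INTO translate_groupstats ({','.join(COLUMNS)}) VALUES "
        + ",".join([placeholders] * len(ordered))
    )
    params: List[object] = [value for row in ordered for value in row]
    return sql, params
```

With a DB-API driver that uses the `%s` paramstyle (for example PyMySQL), the result would be passed as `cursor.execute(sql, params)` immediately before the commit, which also covers suggestion (a).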

Just got that on Meta while trying to view [[m:Tech/News/2014/25]].

Change 148444 had a related patch set uploaded by Aaron Schulz:
Use GET_LOCK to try to reduce INSERT deadlocks

https://gerrit.wikimedia.org/r/148444

Change 148444 had a related patch set uploaded by Nikerabbit:
Use GET_LOCK to try to reduce INSERT deadlocks

https://gerrit.wikimedia.org/r/148444

Change 148444 merged by jenkins-bot:
Use GET_LOCK to try to reduce translate_groupstats deadlocks

https://gerrit.wikimedia.org/r/148444
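The merged change's code is not reproduced in this task. As an illustration of the general pattern only, the sketch below wraps MySQL's advisory locks (`GET_LOCK(name, timeout)` returns 1 on success, 0 on timeout and NULL on error; `RELEASE_LOCK(name)` frees it) in a Python context manager around a DB-API cursor. The lock name format per (group, language) pair is a hypothetical choice. A writer that fails to get the lock can simply skip its cache insert, since another request is already writing the same row.

```python
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def mysql_named_lock(cursor, name: str, timeout_s: int = 0) -> Iterator[bool]:
    """Hold a MySQL advisory lock around a block.

    Yields True if the lock was acquired, False if another session holds it and the
    timeout expired (or GET_LOCK returned NULL). `cursor` is any DB-API cursor on a
    MySQL connection using the %s paramstyle."""
    cursor.execute("SELECT GET_LOCK(%s, %s)", (name, timeout_s))
    (acquired,) = cursor.fetchone()
    try:
        yield acquired == 1
    finally:
        # Only release a lock this session actually holds.
        if acquired == 1:
            cursor.execute("SELECT RELEASE_LOCK(%s)", (name,))
            cursor.fetchone()
```

A typical (hypothetical) use is `with mysql_named_lock(cur, "translate_groupstats:page-Foo:en") as got:` followed by the INSERT only when `got` is true.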

Bug 56357 has been marked as a duplicate of this bug.

All patches mentioned in this report have been merged. Is there more work left to do here (if yes, please reset the bug report status to NEW or ASSIGNED), or can this ticket be closed as RESOLVED FIXED?

Nikerabbit raised the priority of this task from Normal to High. Feb 1 2015, 4:33 PM

It still seems to be happening.

Nemo_bis set Security to None. Feb 1 2015, 4:46 PM
Nemo_bis added a subscriber: Springle.
gerritbot added a subscriber: gerritbot.

Change 189319 had a related patch set uploaded (by Nikerabbit):
Avoid deadlocks in Special:(MessageGroup|Language)Stats

https://gerrit.wikimedia.org/r/189319

Patch-For-Review

Change 189319 merged by jenkins-bot:
Avoid deadlocks in Special:(MessageGroup|Language)Stats

https://gerrit.wikimedia.org/r/189319

Thanks! Can we assume this is fixed?

Nikerabbit closed this task as Resolved. Feb 12 2015, 2:47 PM
KartikMistry moved this task from Backlog to Done on the LE-Sprint-82 board. Feb 17 2015, 3:55 AM
Nemo_bis added a comment (edited). Feb 21 2015, 7:54 AM

https://www.mediawiki.org/wiki/Help:Magic_words still fails, now with

Function: MessageGroupStats::clearGroup
Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.27)

Pginer-WMF edited a custom field. Feb 24 2015, 2:21 PM

Change 192963 had a related patch set uploaded (by Aaron Schulz):
Made MessageGroupStatesUpdaterJob jobs be de-duplicated

https://gerrit.wikimedia.org/r/192963

Change 192963 merged by jenkins-bot:
Made MessageGroupStatesUpdaterJob jobs be de-duplicated

https://gerrit.wikimedia.org/r/192963
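De-duplicating jobs means that if many edits enqueue the same update job before it runs, it runs once instead of once per edit, so fewer workers contend for the same translate_groupstats rows. The sketch below is a generic, hypothetical Python illustration of the idea (a FIFO queue that drops a push when an identical job with the same type and parameters is already pending). It is not MediaWiki's JobQueue API, and the job parameters in the usage are invented.

```python
from collections import OrderedDict
from typing import Any, Dict, Hashable, Optional, Tuple

Job = Tuple[str, Dict[str, Any]]  # (job type, parameters)


def _freeze(params: Dict[str, Any]) -> Tuple[Tuple[str, Any], ...]:
    """Order-independent, hashable form of the parameters (values must be hashable)."""
    return tuple(sorted(params.items()))


class DedupJobQueue:
    """Toy FIFO job queue that ignores a push when an identical job is already waiting."""

    def __init__(self) -> None:
        self._pending: "OrderedDict[Hashable, Job]" = OrderedDict()

    def push(self, job_type: str, params: Dict[str, Any]) -> bool:
        key = (job_type, _freeze(params))
        if key in self._pending:
            return False  # duplicate of a queued job: running it twice adds nothing
        self._pending[key] = (job_type, dict(params))
        return True

    def pop(self) -> Optional[Job]:
        if not self._pending:
            return None
        _, job = self._pending.popitem(last=False)  # oldest job first
        return job

    def __len__(self) -> int:
        return len(self._pending)
```

Once a job has been popped, an identical push is accepted again, because later edits may have changed the stats since that job started.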