
SELECT /* CategoryMembershipChangeJob::run 127.0.0.1 */ GET_LOCK('CategoryMembershipUpdates:XXXX', 10) AS lockstatus
Closed, Resolved · Public · PRODUCTION ERROR

Description

fatalmonitor shows a lot of messages such as:

[10000ms] at runtime/ext_mysql: slow query: SELECT /* CategoryMembershipChangeJob::run 127.0.0.1 */ GET_LOCK('CategoryMembershipUpdates:#######', 10) AS lockstatus

No clue what they are though :(

Event Timeline

hashar created this task. Apr 27 2016, 7:13 PM
Restricted Application added a subscriber: Aklapper. Apr 27 2016, 7:13 PM

a lot

How many, link?

aude added a subscriber: aude. Edited · Apr 29 2016, 6:03 PM

this is 600 of the last 1000 hhvm.log entries (~42/min)

jcrespo added a comment. Edited · May 2 2016, 9:22 AM

While the bug may be valid, this will happen every time there is lag on a slave, so it is a consequence, not a cause. Without a real cause for the lag, this is just T95501, unless MediaWiki's locking model is changed. See, for example, T109943.

Restricted Application added a project: archived--TCB-Team. Sep 13 2016, 4:42 PM

Would it make sense not to log these as slow queries?
But perhaps something else?

Our Puppet manifest for HHVM has:

slow_query_threshold => to_milliseconds('10s')

And MediaWiki's CategoryMembershipChangeJob::run has a lock timeout hardcoded to 10 seconds, so a lock wait that times out triggers the SlowTimer notification.

So raising the threshold or lowering the lock timeout would hide the message.
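The interaction between the two settings can be sketched as follows. This is only an illustration in Python, not MediaWiki or HHVM code: the helper names are hypothetical, and it assumes the slow-query timer fires once the elapsed time reaches the threshold (the `[10000ms]` entries suggest this, but it is not confirmed here).

```python
def seconds_to_ms(seconds: float) -> int:
    """Hypothetical helper: convert seconds to whole milliseconds."""
    return int(seconds * 1000)


def worst_case_lock_wait_ms(lock_timeout_s: float) -> int:
    """GET_LOCK(name, timeout) blocks for at most `timeout` seconds,
    so a lock that is never released costs exactly the timeout."""
    return seconds_to_ms(lock_timeout_s)


def is_logged_as_slow(elapsed_ms: int, threshold_ms: int) -> bool:
    """Assumption: a query is logged once its elapsed time reaches the threshold."""
    return elapsed_ms >= threshold_ms


# Values from this task: 10 s lock timeout, 10 s slow_query_threshold.
threshold_ms = seconds_to_ms(10)
timed_out_wait_ms = worst_case_lock_wait_ms(10)

# A lock wait that times out lands right on the threshold and gets logged.
logged_today = is_logged_as_slow(timed_out_wait_ms, threshold_ms)

# A higher threshold (20 s is an arbitrary example) would hide the message.
logged_with_higher_threshold = is_logged_as_slow(timed_out_wait_ms, seconds_to_ms(20))

# So would a lock timeout below the threshold (5 s is an arbitrary example).
logged_with_shorter_timeout = is_logged_as_slow(worst_case_lock_wait_ms(5), threshold_ms)
```

Under these assumptions, only the current pairing of values produces a log entry for every timed-out lock; either adjustment silences the message without changing how often the lock is actually contended.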


@jcrespo For the list of messages, I looked in https://logstash.wikimedia.org/ and searched for "GET_LOCK('CategoryMembershipUpdates". Today we had a few hour-long events with 500-800 such messages per minute.

@hashar yesterday we had a crashed slave failover. Given the limitation that jobs continue hitting the same server (remember we discussed this limitation on an unrelated ticket), I wouldn't be surprised by jobs having issues.

Change 310514 had a related patch set uploaded (by Aaron Schulz):
Reduce CategoryMembershipChangeJob lock timeout

https://gerrit.wikimedia.org/r/310514

Change 310514 merged by jenkins-bot:
Reduce CategoryMembershipChangeJob lock timeout

https://gerrit.wikimedia.org/r/310514

https://gerrit.wikimedia.org/r/310514 changes the GET_LOCK() from 10 to 3 seconds. No idea about the impact for the db/job, but that will surely stop the SlowTimer notification.
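To make the effect of the shorter timeout concrete, here is a rough simulation in Python. It is not the MediaWiki implementation: `get_lock` is a hypothetical stand-in that mimics only the documented MySQL behaviour of `GET_LOCK()` returning 1 when the lock is obtained and 0 when the attempt times out, and time is scaled down (an assumption of the sketch) so that it runs quickly.

```python
import threading
import time
from typing import Tuple

# Assumption of this sketch: 1 simulated second = 20 ms of real time.
SCALE = 0.02


def get_lock(lock: threading.Lock, timeout_s: float) -> Tuple[int, float]:
    """Mimic GET_LOCK(name, timeout).

    Returns (status, waited_s): status is 1 if the lock was acquired and
    0 if the attempt timed out; waited_s is the wait in simulated seconds.
    """
    start = time.monotonic()
    acquired = lock.acquire(timeout=timeout_s * SCALE)
    waited_s = (time.monotonic() - start) / SCALE
    return (1 if acquired else 0), waited_s


# Another job already holds the lock and does not release it in time.
contended = threading.Lock()
contended.acquire()

status_old, waited_old = get_lock(contended, 10)  # timeout before the patch
status_new, waited_new = get_lock(contended, 3)   # timeout after the patch

SLOW_QUERY_THRESHOLD_S = 10  # from the Puppet manifest quoted in this task
print(f"10 s timeout: status={status_old}, waited ~{waited_old:.1f} s")
print(f" 3 s timeout: status={status_new}, waited ~{waited_new:.1f} s")
```

In both cases the lock attempt still fails. The patch only caps the wait at about 3 seconds, which stays below the 10-second slow-query threshold, so the contention itself is not addressed by this change.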

Addshore moved this task from Incoming to Other on the archived--TCB-Team board. Sep 19 2016, 8:34 AM
hashar closed this task as Resolved. Sep 26 2016, 3:30 PM
hashar assigned this task to aaron.

Fixed by https://gerrit.wikimedia.org/r/#/c/310514/, which reduces the lock timeout to 3 seconds. Deployed with MW-1.28-release (WMF-deploy-2016-09-20_(1.28.0-wmf.20)).

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:11 PM