fatalmonitor shows a lot of messages such as:
[10000ms] at runtime/ext_mysql: slow query: SELECT /* CategoryMembershipChangeJob::run 127.0.0.1 */ GET_LOCK('CategoryMembershipUpdates:#######', 10) AS lock status
No clue what they are though :(
Subject | Repo | Branch | Lines +/-
---|---|---|---
Reduce CategoryMembershipChangeJob lock timeout | mediawiki/core | master | +1 -1
Our Puppet manifest for HHVM has:
slow_query_threshold => to_milliseconds('10s')
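And MediaWiki CategoryMembershipChangeJob::run takes its lock with a hardcoded timeout of 10 seconds, so a job that waits the full timeout reaches the threshold and triggers the SlowTimer notification.
So raising the threshold or lowering the lock timeout would hide the message.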
@jcrespo For the list of messages, I looked in https://logstash.wikimedia.org/ and searched for "GET_LOCK('CategoryMembershipUpdates". Today we had a few hour-long events with 500-800 such messages per minute.
@hashar Yesterday we had a failover after a slave crashed. Given the limitation that jobs continue hitting the same server (remember we discussed this limitation on an unrelated ticket), I wouldn't be surprised if jobs were having issues.
Change 310514 had a related patch set uploaded (by Aaron Schulz):
Reduce CategoryMembershipChangeJob lock timeout
https://gerrit.wikimedia.org/r/310514 changes the GET_LOCK() from 10 to 3 seconds. No idea about the impact for the db/job, but that will surely stop the SlowTimer notification.
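To spell out why the shorter timeout silences the message, here is a rough Python sketch of the arithmetic. It is not HHVM or MediaWiki code: the function names are made up for illustration, and it assumes a query is reported once its elapsed time reaches the threshold (the "[10000ms]" in the log suggests this, but I have not checked the HHVM source).

```python
# Value from the Puppet manifest: slow_query_threshold => to_milliseconds('10s')
SLOW_QUERY_THRESHOLD_MS = 10_000


def worst_case_wait_ms(lock_timeout_s: int) -> int:
    """Longest time GET_LOCK(name, timeout) can block, in milliseconds.

    GET_LOCK waits up to `timeout` seconds when another session
    already holds the named lock, then gives up.
    """
    return lock_timeout_s * 1000


def is_reported_as_slow(elapsed_ms: int,
                        threshold_ms: int = SLOW_QUERY_THRESHOLD_MS) -> bool:
    """Hypothetical helper. Assumption: a query is flagged when its
    elapsed time reaches the configured threshold."""
    return elapsed_ms >= threshold_ms


# Old hardcoded timeout (10 s): a full wait lands exactly on the threshold.
old_flagged = is_reported_as_slow(worst_case_wait_ms(10))

# New timeout from change 310514 (3 s): even a full wait stays well below it.
new_flagged = is_reported_as_slow(worst_case_wait_ms(3))
```

With the 3 second timeout, a contended lock still fails to be acquired, it just fails sooner and without crossing the slow query threshold.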
Fixed by https://gerrit.wikimedia.org/r/#/c/310514/ which reduces the lock timeout to 3 seconds. Deployed with MW-1.28-release (WMF-deploy-2016-09-20_(1.28.0-wmf.20))