"Lock wait timeout exceeded" errors regarding module_deps
Closed, Duplicate | Public | PRODUCTION ERROR

Description

Lately (.10 still) there have been a lot of self-inflicted lock waits, with several application servers trying to execute the same query: https://logstash.wikimedia.org/#dashboard/temp/AVJ4kOSCptxhN1XaJd7F

{
  "_index": "logstash-2016.01.25",
  "_type": "mediawiki",
  "_id": "AVJ4hI7sMRv_gmyx6mr7",
  "_score": null,
  "_source": {
    "message": "DatabaseMysqlBase::replace\t10.64.32.22\t1205\tLock wait timeout exceeded; try restarting transaction (10.64.32.22)\tREPLACE INTO `module_deps` (md_module,md_skin,md_deps) VALUES ('ext.wikimediaBadges','vector|en','[\\\"extensions/Wikidata/extensions/WikimediaBadges/resources/skins/../images/badge-golden-star.png\\\",\\\"extensions/Wikidata/extensions/WikimediaBadges/resources/skins/../images/badge-problematic.png\\\",\\\"extensions/Wikidata/extensions/WikimediaBadges/resources/skins/../images/badge-proofread.png\\\",\\\"extensions/Wikidata/extensions/WikimediaBadges/resources/skins/../images/badge-silver-star.png\\\",\\\"extensions/Wikidata/extensions/WikimediaBadges/resources/skins/../images/badge-validated.png\\\"]')",
    "@version": 1,
    "@timestamp": "2016-01-25T11:21:38.000Z",
    "type": "mediawiki",
    "host": "mw1073",
    "level": "ERROR",
    "tags": [
      "syslog",
      "es",
      "es",
      "normalized_message_trimmed"
    ],
    "channel": "wfLogDBError",
    "url": "/w/load.php?debug=false&lang=en&modules=ext.cite.styles%7Cext.gadget.DRN-wizard%2CReferenceTooltips%2CWatchlistBase%2CWatchlistGreenIndicators%2Ccharinsert%2Cfeatured-articles-links%2CrefToolbar%2Cswitcher%2Cteahouse%7Cext.math.styles%7Cext.pygments%2CwikimediaBadges%7Cext.tmh.thumbnail.styles%7Cext.uls.nojs%7Cext.visualEditor.desktopArticleTarget.noscript%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.page.gallery.styles%7Cmediawiki.raggett%2CsectionAnchor%7Cmediawiki.skinning.interface%7Cskins.vector.styles%7Cwikibase.client.init&only=styles&skin=vector",
    "ip": "10.64.32.106",
    "http_method": "GET",
    "server": "en.wikipedia.org",
    "referrer": "https://en.wikipedia.org/wiki/Perceptron",
    "uid": "a4c98f4",
    "process_id": 8237,
    "wiki": "enwiki",
    "db_server": "10.64.32.22",
    "db_name": "enwiki",
    "db_user": "wikiuser",
    "method": "DatabaseBase::reportQueryError",
    "errno": 1205,
    "error": "Lock wait timeout exceeded; try restarting transaction (10.64.32.22)",
    "sql1line": "REPLACE INTO `module_deps` (md_module,md_skin,md_deps) VALUES ('ext.wikimediaBadges','vector|en','[\\\"extensions/Wikidata/extensions/WikimediaBadges/resources/skins/../images/badge-golden-star.png\\\",\\\"extensions/Wikidata/extensions/WikimediaBadges/resources/skins/../images/badge-problematic.png\\\",\\\"extensions/Wikidata/extensions/WikimediaBadges/resources/skins/../images/badge-proofread.png\\\",\\\"extensions/Wikidata/extensions/WikimediaBadges/resources/skins/../images/badge-silver-star.png\\\",\\\"extensions/Wikidata/extensions/WikimediaBadges/resources/skins/../images/badge-validated.png\\\"]')",
    "fname": "DatabaseMysqlBase::replace",
    "normalized_message": "DatabaseMysqlBase::replace\t10.64.32.22\t1205\tLock wait timeout exceeded; try restarting transaction (10.64.32.22)\tREPLACE INTO `module_deps` (md_module,md_skin,md_deps) VALUES ('ext.wikimediaBadges','vector|en','[\\\"extensions/Wikidata/extensions/WikimediaB"
  },
  "sort": [
    1453720898000
  ]
}
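
The `message` field in the entry above is tab-separated: calling function, database host, MySQL error number, error text, and the failing SQL. When triaging many such entries it can help to split those fields apart. The following is a minimal Python sketch, not part of MediaWiki or Logstash; the names `DbError` and `parse_db_error` are illustrative assumptions, and the sample SQL is abbreviated.

```python
from typing import NamedTuple


class DbError(NamedTuple):
    fname: str
    db_server: str
    errno: int
    error: str
    sql: str


def parse_db_error(message: str) -> DbError:
    """Split a wfLogDBError-style message into its five tab-separated fields."""
    # maxsplit=4 keeps any tab characters inside the SQL text intact.
    fname, db_server, errno, error, sql = message.split("\t", 4)
    return DbError(fname, db_server, int(errno), error, sql)


# Sample modelled on the logged message; the VALUES list is abbreviated here.
sample = (
    "DatabaseMysqlBase::replace\t10.64.32.22\t1205\t"
    "Lock wait timeout exceeded; try restarting transaction (10.64.32.22)\t"
    "REPLACE INTO `module_deps` (md_module,md_skin,md_deps) VALUES (...)"
)
entry = parse_db_error(sample)
print(entry.errno, entry.fname)
```

Grouping the parsed entries by `errno` and the start of `sql` would give a rough count of how many application servers hit the same statement.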

Event Timeline

jcrespo raised the priority of this task from to Needs Triage.
jcrespo updated the task description.
jcrespo added a subscriber: jcrespo.

This is already causing replication lag on s6.

Hits:    491
Tmax:    16
Tavg:    6
Tsum:    3,321
Hosts:   db1023, db1024, db1033, db1038, db1040, db1052, db1058
Users:   wikiuser
Schemas: arwiki, azwiki, commonswiki, dewiki, enwiki, enwikisource, eswiki, fiwiki, frwiki, hewiki, itwiki, jawiki, lvwiki, plwiki, rowiki, ruwiki, thwiki, ukwiki, zhwiki

Adding @aaron and @Krinkle per the recommendation on Scrum of Scrums.

Change 266987 had a related patch set uploaded (by Aaron Schulz):
Reduce module_deps write slams on deployments

https://gerrit.wikimedia.org/r/266987

Change 266987 merged by jenkins-bot:
resourceloader: Reduce module_deps write slams after deployments

https://gerrit.wikimedia.org/r/266987
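
The change title points at cutting down redundant concurrent writes to `module_deps`. The sketch below is not taken from change 266987; it only illustrates one generic way to reduce this kind of contention, assuming a writer may skip the write when the stored value is already current or when another writer is already handling the same key. All names (`DepsStore`, `save_deps`) are hypothetical, and an in-process dict and `threading.Lock` stand in for the database table and a shared lock.

```python
import threading
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (md_module, md_skin), mirroring the logged query's columns


class DepsStore:
    """Hypothetical in-memory stand-in for the module_deps table."""

    def __init__(self) -> None:
        self._rows: Dict[Key, List[str]] = {}
        self._key_locks: Dict[Key, threading.Lock] = {}
        self._guard = threading.Lock()  # protects _key_locks itself
        self.writes = 0

    def _lock_for(self, key: Key) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def save_deps(self, module: str, skin: str, deps: List[str]) -> bool:
        """Write deps unless they are unchanged or another writer is busy.

        Returns True only when a write was actually performed.
        """
        key = (module, skin)
        if self._rows.get(key) == deps:
            return False  # nothing changed, so skip the write entirely
        lock = self._lock_for(key)
        # Non-blocking acquire: give up at once instead of queueing behind
        # another writer that is storing the same row.
        if not lock.acquire(blocking=False):
            return False
        try:
            self._rows[key] = list(deps)
            self.writes += 1
            return True
        finally:
            lock.release()


store = DepsStore()
deps = ["images/badge-golden-star.png"]  # shortened illustrative path
first = store.save_deps("ext.wikimediaBadges", "vector|en", deps)
second = store.save_deps("ext.wikimediaBadges", "vector|en", deps)
print(first, second, store.writes)
```

The point of the non-blocking acquire is that a writer which loses the race returns immediately rather than waiting on a row lock until the timeout fires.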

Aaron's patch has been merged. Can this problem still be seen? If not, can we assume it is fixed and change this task's status to 'resolved'?

jcrespo claimed this task.

The last case (excluding labswiki) was a single occurrence on 2016-03-10T17:59:26.000Z.

I would say it is resolved.

Krinkle set Security to None.
mmodell changed the subtype of this task from "Task" to "Production Error". (Aug 28 2019, 11:11 PM)