Page MenuHomePhabricator

MediaWiki Resource Loader, upsert(), and PostgreSQL: deadlock issues
Closed, ResolvedPublicBUG REPORT

Description

Description

References: T167942 and cc0473766a53. I was advised to report this as a bug.

Every now and then (700+ occurrences over a single day with a moderately busy instance) PostgreSQL registers a deadlock, always with queries related to the module_deps table:

Apr 27 18:00:00 db newsyslog[60636]: logfile turned over due to size>1000K
Apr 27 18:00:00 db postgres[60412]: [8-1] azurlane_wiki ERROR:  deadlock detected
Apr 27 18:00:00 db postgres[60412]: [8-2] azurlane_wiki DETAIL:  Process 60412 waits for ShareLock on transaction 31624948; blocked by process 60420.
Apr 27 18:00:00 db postgres[60412]: [8-3]       Process 60420 waits for ShareLock on transaction 31624945; blocked by process 60412.
Apr 27 18:00:00 db postgres[60412]: [8-4]       Process 60412: UPDATE /* Wikimedia\Rdbms\Database::upsert  */  "module_deps" SET md_deps = '["resources/src/jquery/jquery.makeCollapsible.styles.less"]' WHERE ((md_module = 'jquery.makeCollapsible.styles') OR (md_skin = 'vector|en-gb'))
Apr 27 18:00:00 db postgres[60412]: [8-5]       Process 60420: UPDATE /* Wikimedia\Rdbms\Database::upsert  */  "module_deps" SET md_deps = '["extensions/MobileFrontend/mobile.less/mobile.mixins.less","extensions/MobileFrontend/mobile.less/mobile.variables.less","extensions/MobileFrontend/resources/mobile.startup/LoadingOverlay.less","extensions/MobileFrontend/resources/mobile.startup/Overlay.less","extensions/MobileFrontend/resources/mobile.startup/drawer.less","extensions/MobileFrontend/resources/mobile.startup/icon.less","extensions/MobileFrontend/resources/mobile.startup/panel.less","resources/src/mediawiki.less/mediawiki.mixins.animation.less","resources/src/mediawiki.less/mediawiki.mixins.less","resources/src/mediawiki.less/mediawiki.ui/variables.less","skins/MinervaNeue/minerva.less/minerva.mixins.less","skins/MinervaNeue/minerva.less/minerva.variables.less","skins/MinervaNeue/skinStyles/mobile.startup/Overlay.less","skins/MinervaNeue/skinStyles/mobile.startup/images/error.svg","skins/MinervaNeue/skinStyles/mobile.startup/skin.less","skins/MinervaNeue/skinStyles/m
Apr 27 18:00:00 db postgres[60412]: [8-8] azurlane_wiki HINT:  See server log for query details.
Apr 27 18:00:00 db postgres[60412]: [8-9] azurlane_wiki CONTEXT:  while updating tuple (213,16) in relation "module_deps"
Apr 27 18:00:00 db postgres[60412]: [8-10] azurlane_wiki STATEMENT:  UPDATE /* Wikimedia\Rdbms\Database::upsert  */  "module_deps" SET md_deps = '["resources/src/jquery/jquery.makeCollapsible.styles.less"]' WHERE ((md_module = 'jquery.makeCollapsible.styles') OR (md_skin = 'vector|en-gb'))

This leads me to think there is some faulty logic in the implementation. Race conditions are unavoidable as there are multiple threads running, but deadlocks, in my opinion, should not happen like this.

Steps to Reproduce:

Run MediaWiki 1.30 with the PostgreSQL upsert() patch, or 1.32.0 stable.

Actual Results:

Abundance of deadlock errors in the PostgreSQL error log.

Expected Results:

No deadlock errors in the PostgreSQL error log.

Event Timeline

IijimaYun created this task.May 2 2019, 5:39 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 2 2019, 5:39 PM

Update: this seems to trigger DB server overload. Deadlock detection has timeouts and this leads to PostgreSQL reaching its connection limit (my max_connections is set to 1024 and it gets exhausted during massive user influx).
These hanging connections seem to lead to ResourceLoader actually failing to load CSS and JavaScript.

db:~ # head -2 /var/log/messages
May 24 04:00:00 db newsyslog[94073]: logfile turned over due to size>1000K
May 24 04:00:01 db postgres[93462]: [12-1] azurlane_wiki ERROR:  deadlock detected

db:~ # grep "FATAL:  remaining" /var/log/messages | tail -1
May 24 04:08:08 db postgres[12409]: [7-1] azurlane_wiki FATAL:  remaining connection slots are reserved for non-replication superuser connections

db:~ # grep "FATAL:  remaining" /var/log/messages | wc -l
    8748

As you can see, 8748 requests were rejected in just 8 minutes due to 1024 connections already being used by MediaWiki (and MediaWiki alone); this is with less than 20 concurrent users (judging by HTTP/2 session count on the HTTP load balancer).

Restricted Application added a project: Performance-Team. · View Herald TranscriptMay 23 2019, 7:27 PM
Reedy added a subscriber: Anomie.May 23 2019, 10:33 PM
Reedy added a subscriber: Reedy.May 23 2019, 11:01 PM

MySQL is doing (using dummy values)...

INSERT INTO `mw_module_deps` (md_module,md_skin,md_deps) VALUES ('skin name','vary','deps') ON DUPLICATE KEY UPDATE md_deps = 'deps'

pg is doing the following and then an insert...

UPDATE /* Wikimedia\Rdbms\Database::upsert  */  "module_deps" SET md_deps = '["resources/src/jquery/jquery.makeCollapsible.styles.less"]' WHERE ((md_module = 'jquery.makeCollapsible.styles') OR (md_skin = 'vector|en-gb'))

Change 512301 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@master] Make md_module, md_skin a single unique index in ResourceLoaderModule::saveFileDependencies()

https://gerrit.wikimedia.org/r/512301

[11:17:22] <Remilia> Reedy: can confirm the patch totally fixes the deadlock issue, zero deadlock errors since I added it on the main wiki (11+ hours now) while the other, not patched yet, wikis get them (once a day since they are very low use)

Thanks for the report @IijimaYun. I believe this bug was caused by e7b57d881a, first released in MW 1.29.

Once confirmed by @aaron, I'll backport to all supported and affected versions (1.31 and 1.32).

Krinkle triaged this task as High priority.May 24 2019, 1:28 PM

Change 512301 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Use AND instead of OR for upsert conds in saveFileDependencies()

https://gerrit.wikimedia.org/r/512301

Change 512380 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@REL1_33] resourceloader: Use AND instead of OR for upsert conds in saveFileDependencies()

https://gerrit.wikimedia.org/r/512380

Change 512395 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@REL1_32] resourceloader: Use AND instead of OR for upsert conds in saveFileDependencies()

https://gerrit.wikimedia.org/r/512395

Change 512396 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@REL1_31] resourceloader: Use AND instead of OR for upsert conds in saveFileDependencies()

https://gerrit.wikimedia.org/r/512396

Change 512397 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@REL1_30] resourceloader: Use AND instead of OR for upsert conds in saveFileDependencies()

https://gerrit.wikimedia.org/r/512397

Change 512397 merged by Reedy:
[mediawiki/core@REL1_30] resourceloader: Use AND instead of OR for upsert conds in saveFileDependencies()

https://gerrit.wikimedia.org/r/512397

Change 512395 merged by jenkins-bot:
[mediawiki/core@REL1_32] resourceloader: Use AND instead of OR for upsert conds in saveFileDependencies()

https://gerrit.wikimedia.org/r/512395

Change 512396 merged by jenkins-bot:
[mediawiki/core@REL1_31] resourceloader: Use AND instead of OR for upsert conds in saveFileDependencies()

https://gerrit.wikimedia.org/r/512396

Reedy closed this task as Resolved.May 24 2019, 5:20 PM
Reedy claimed this task.
Reedy removed a project: Patch-For-Review.

Change 512380 merged by jenkins-bot:
[mediawiki/core@REL1_33] resourceloader: Use AND instead of OR for upsert conds in saveFileDependencies()

https://gerrit.wikimedia.org/r/512380