Page MenuHomePhabricator

Uncaught Wikimedia\Rdbms\DBUnexpectedError: Wikimedia\Rdbms\Database::close: mass commit/rollback of peer transaction required (DBO_TRX set)
Closed, ResolvedPublic

Description

After latest translatewiki.net code update: this started appearing in the logs frequently:

[28-Aug-2019 12:49:49 UTC] PHP Fatal error:  Uncaught Wikimedia\Rdbms\DBUnexpectedError: Wikimedia\Rdbms\Database::close: mass commit/rollback of peer transaction required (DBO_TRX set) in /srv/mediawiki/tags/2019-08-28_12:42:55/includes/libs/rdbms/database/Database.php:888
Stack trace:
#0 /srv/mediawiki/tags/2019-08-28_12:42:55/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1506): Wikimedia\Rdbms\Database->close()
#1 /srv/mediawiki/tags/2019-08-28_12:42:55/includes/libs/rdbms/loadbalancer/LoadBalancer.php(2049): Wikimedia\Rdbms\LoadBalancer->Wikimedia\Rdbms\{closure}(Object(Wikimedia\Rdbms\DatabaseMysqli))
#2 /srv/mediawiki/tags/2019-08-28_12:42:55/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1507): Wikimedia\Rdbms\LoadBalancer->forEachOpenConnection(Object(Closure))
#3 /srv/mediawiki/tags/2019-08-28_12:42:55/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1496): Wikimedia\Rdbms\LoadBalancer->closeAll()
#4 /srv/mediawiki/tags/2019-08-28_12:42:55/includes/libs/rdbms/lbfactory/LBFactory.php(229): Wikimedia\Rdbms\LoadBalancer->disable()
#5 /srv/ in /srv/mediawiki/tags/2019-08-28_12:42:55/includes/libs/rdbms/database/Database.php on line 888

T217819: Wikimedia\\Rdbms\\Database::close: mass commit/rollback of peer transaction required (DBO_TRX set) looks very similar.

At least the error message should be such that there is something to help debugging this.

Details

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 28 2019, 12:54 PM

With following debugging:

					if ( $this->writesOrCallbacksPending() ) {
						error_log( json_encode( [
							'MAOE',
							$this->trxLevel(),
							$this->trxDoneWrites,
							$this->trxIdleCallbacks,
							$this->trxPreCommitCallbacks,
							$this->trxEndCallbacks,
							$this->trxSectionCancelCallbacks
						] ) );
						$exception = new DBUnexpectedError(
							$this,
							__METHOD__ .
							": mass commit/rollback of peer transaction required (DBO_TRX set)"
						);
					}

I get:

[28-Aug-2019 15:51:03 UTC] ["MAOE",1,true,[],[],[[{},"ResourceLoaderModule::saveFileDependencies",null]],[]]

Unfortunately that doesn't help me at all to find out what could be the cause. Tips welcome!
Also wondering why the last line of the stack trace is so weird and there isn't more callers at the beginning.

Trying to correlate with access logs:

/w/load.php?lang=be-tarask&modules=ext.echo.styles.badge%7Cext.translate.statstable%2Ctabgroup%7Cext.uls.pt%7Cjquery.makeCollapsible.styles%7Cmediawiki.helplink%7Cmediawiki.htmlform.ooui.styles%7Cmediawiki.htmlform.styles%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.interface%7Cmediawiki.widgets.styles%7Coojs-ui-core.icons%2Cstyles%7Coojs-ui.styles.icons-alerts%2Cindicators%7Cskins.vector.styles&only=styles&skin=vector

/w/load.php?lang=hu&modules=ext.echo.styles.badge%7Cext.uls.pt%7Cmediawiki.diff.styles%7Cmediawiki.interface.helpers.styles%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.content.externallinks%7Coojs-ui.styles.icons-alerts%7Cskins.timeless&only=styles&skin=timeless

Both were HTTP 2.0 499 responses at the same time as the exception happened.

Were there other warnings or exceptions logged by the same HTTP requests? I suspect this may be a cascading issue due to DB idle callback coming back to life after another exception has left the DB in an uncommitted state.

Can you post the complete stacktrace? I'd be curious where the LoadBalancer->disable call originates. If it is from a __destruct then it might be related to T225103 or T230065.

aaron added a comment.Aug 29 2019, 4:04 AM

I wonder if some entry point lacks proper shutdown.

Change 533132 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] rdbms: expand on LoadBalancer ownership concept

https://gerrit.wikimedia.org/r/533132

aaron added a comment.Aug 29 2019, 5:49 AM

Client disconnects (HTTP 499) are interesting...before the ignore_user_abort() in doPostOutputShutdown(), I suppose it's possible to end up with stuff like this (and long has been). https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/519741/ would help this particular case by avoiding DB writes.

Perhaps the first write could trigger the ignore call...or better yet, connection_aborted() could be checked in these destructors to adjust the exception/logging behavior.

Change 533373 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] rdbms: add more ScopedCallback::newScopedIgnoreUserAbort() calls

https://gerrit.wikimedia.org/r/533373

aaron added a comment.Aug 30 2019, 5:36 AM

It looks like WebStart.php sets ignore_user_abort() for POSTS and the major entry points have wfTransactionalTimeLimit() set for POSTS. In the case of module_deps updates for load.php, that's on GET.

Change 533373 merged by jenkins-bot:
[mediawiki/core@master] rdbms: add more ScopedCallback::newScopedIgnoreUserAbort() calls

https://gerrit.wikimedia.org/r/533373

Will report back after deployment tomorrow Wednesday.

I'm not sure if the merged patch was supposed to fix this, but I am still seeing the error with slightly changed line numbers:

[04-Sep-2019 12:54:55 UTC] PHP Fatal error:  Uncaught Wikimedia\Rdbms\DBUnexpectedError: Wikimedia\Rdbms\Database::close: mass commit/rollback of peer transaction required (DBO_TRX set) in /srv/mediawiki/tags/2019-09-04_12:36:29/includes/libs/rdbms/database/Database.php:888
Stack trace:
#0 /srv/mediawiki/tags/2019-09-04_12:36:29/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1517): Wikimedia\Rdbms\Database->close()
#1 /srv/mediawiki/tags/2019-09-04_12:36:29/includes/libs/rdbms/loadbalancer/LoadBalancer.php(2085): Wikimedia\Rdbms\LoadBalancer->Wikimedia\Rdbms\{closure}(Object(Wikimedia\Rdbms\DatabaseMysqli))
#2 /srv/mediawiki/tags/2019-09-04_12:36:29/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1518): Wikimedia\Rdbms\LoadBalancer->forEachOpenConnection(Object(Closure))
#3 /srv/mediawiki/tags/2019-09-04_12:36:29/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1502): Wikimedia\Rdbms\LoadBalancer->closeAll()
#4 /srv/mediawiki/tags/2019-09-04_12:36:29/includes/libs/rdbms/lbfactory/LBFactory.php(235): Wikimedia\Rdbms\LoadBalancer->disable()
#5 /srv/ in /srv/mediawiki/tags/2019-09-04_12:36:29/includes/libs/rdbms/database/Database.php on line 888

Change 533132 merged by jenkins-bot:
[mediawiki/core@master] rdbms: expand on LoadBalancer ownership concept

https://gerrit.wikimedia.org/r/533132

Jpita added a subscriber: Jpita.Sep 10 2019, 5:23 AM

@Nikerabbit can we move this to done?

Pginer-WMF closed this task as Resolved.Sep 10 2019, 8:50 AM
Pginer-WMF claimed this task.