Page MenuHomePhabricator

Exceptions when database connection is lost and automatically reopened with pending transaction idle callbacks
Closed, ResolvedPublic

Description

While attempting to confirm bug 65251, a user performed rapid file uploads in an attempt to get two versions uploaded within the same second.

In dberror.log, I see numerous entries along the lines of the following that seem to be related:

Tue May 13 15:25:30 UTC 2014 mw1190 zhwiki Connection lost and reconnected after 33.725s, query: BEGIN /* DatabaseBase::query (Revision::fetchFromConds) Liangent-bot */

Which seems to be behind these exceptions that Ori noticed:

2014-05-13 15:51:25 mw1147 zhwiki: [2328423f] /w/api.php?maxlag=0 Exception from line 927 of /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php: Transaction idle callbacks still pending.
#0 /usr/local/apache/common-local/php-1.24wmf3/includes/db/DatabaseMysqlBase.php(66): DatabaseBase->close()
#1 /usr/local/apache/common-local/php-1.24wmf3/includes/db/DatabaseMysqlBase.php(574): DatabaseMysqlBase->open('10.64.16.13', [REDACTED], [REDACTED], 'zhwiki')
#2 /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php(1108): DatabaseMysqlBase->ping()
#3 /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php(1595): DatabaseBase->query('SELECT 1 FROM...', 'LocalFile::lock')
#4 /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php(1306): DatabaseBase->select('image', '1', Array, 'LocalFile::lock', Array)
#5 /usr/local/apache/common-local/php-1.24wmf3/includes/filerepo/file/LocalFile.php(1851): DatabaseBase->selectField('image', '1', Array, 'LocalFile::lock', Array)
#6 /usr/local/apache/common-local/php-1.24wmf3/includes/filerepo/file/LocalFile.php(1168): LocalFile->lock()
#7 /usr/local/apache/common-local/php-1.24wmf3/includes/upload/UploadBase.php(698): LocalFile->upload('/tmp/php5wruKn', 'UTFComment: oZV...', 'UTFComment: oZV...', 1, Array, false, Object(User))
#8 /usr/local/apache/common-local/php-1.24wmf3/includes/api/ApiUpload.php(637): UploadBase->performUpload('UTFComment: oZV...', 'UTFComment: oZV...', false, Object(User))
#9 /usr/local/apache/common-local/php-1.24wmf3/includes/api/ApiUpload.php(144): ApiUpload->performUpload(Array)
#10 /usr/local/apache/common-local/php-1.24wmf3/includes/api/ApiUpload.php(111): ApiUpload->getContextResult()
#11 /usr/local/apache/common-local/php-1.24wmf3/includes/api/ApiMain.php(913): ApiUpload->execute()
#12 /usr/local/apache/common-local/php-1.24wmf3/includes/api/ApiMain.php(363): ApiMain->executeAction()
#13 /usr/local/apache/common-local/php-1.24wmf3/includes/api/ApiMain.php(334): ApiMain->executeActionWithErrorHandling()
#14 /usr/local/apache/common-local/php-1.24wmf3/api.php(85): ApiMain->execute()
#15 /usr/local/apache/common-local/w/api.php(3): require('/usr/local/apac...')
#16 {main}

DatabaseBase->query() appears to have noticed the disconnection, and calls ->ping() to try to reconnect, which tries to reopen the connection, which tries to close the dead connection, which throws an exception due to the pending callbacks. What the correct behavior might be in this situation (if throwing an exception isn't itself right), I have no idea.

I note this seems to be an instance of the problem bug 47375 is generically asking to be fixed.


Version: 1.24rc
Severity: normal

Details

Reference
bz65263

Event Timeline

bzimport raised the priority of this task from to Normal.
bzimport set Reference to bz65263.
Anomie created this task.May 13 2014, 4:15 PM

On further review, I'm not sure whether the rapid file uploads are responsible for the database overload leading to the disconnection (although the reconnect-then-throw-exception behavior is still unpleasant). At the same time I see many exceptions like the following:

2014-05-13 16:16:49 mw1029 enwiki: [955fda5b] /w/index.php?title=Special:Export&action=submit Exception from line 1093 of /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php: DB connection was already closed.
#0 /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php(3492): DatabaseBase->query('COMMIT', 'DatabaseBase::c...')
#1 /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php(3477): DatabaseBase->doCommit('DatabaseBase::c...')
#2 /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php(937): DatabaseBase->commit('DatabaseBase::c...', 'flush')
#3 /usr/local/apache/common-local/php-1.24wmf3/includes/db/LoadBalancer.php(870): DatabaseBase->close()
#4 /usr/local/apache/common-local/php-1.24wmf3/includes/specials/SpecialExport.php(395): LoadBalancer->closeAll()
#5 /usr/local/apache/common-local/php-1.24wmf3/includes/specials/SpecialExport.php(179): SpecialExport->doExport('New York City??...', Array, false, false)
#6 /usr/local/apache/common-local/php-1.24wmf3/includes/specialpage/SpecialPage.php(379): SpecialExport->execute(NULL)
#7 /usr/local/apache/common-local/php-1.24wmf3/includes/specialpage/SpecialPageFactory.php(503): SpecialPage->run(NULL)
#8 /usr/local/apache/common-local/php-1.24wmf3/includes/Wiki.php(285): SpecialPageFactory::executePath(Object(Title), Object(RequestContext))
#9 /usr/local/apache/common-local/php-1.24wmf3/includes/Wiki.php(596): MediaWiki->performRequest()
#10 /usr/local/apache/common-local/php-1.24wmf3/includes/Wiki.php(448): MediaWiki->main()
#11 /usr/local/apache/common-local/php-1.24wmf3/index.php(46): MediaWiki->run()
#12 /usr/local/apache/common-local/w/index.php(3): require('/usr/local/apac...')
#13 {main}

combined with many "too many connections" errors in dberror.log; apparently someone was spamming Special:Export at the same time.

Note that enwiki (where Special:Export took place) lives in s1 while zhwiki (where I uploaded files) lives in s2.

Change 133173 had a related patch set uploaded by Aaron Schulz:
Cleaned up database reconnection logic

https://gerrit.wikimedia.org/r/133173

Change 133173 merged by jenkins-bot:
Cleaned up database reconnection logic

https://gerrit.wikimedia.org/r/133173

gerrit change 133173 merged. I assume it has been fixed.