MySQL "Too many connections" error due to infinite loop in SQLBagOStuff when using the DB as primary cache
Closed, Resolved · Public · Bug Report

Description

Note: even though I'm reporting this bug, I can't reproduce it locally as I use SQLite and a different setup. @cmelo can reproduce this and provide more information. It might also be related to CentralAuth or the wiki farm config.

Steps to replicate the issue (include links if applicable):

  • Make sure you have a local wiki running on MySQL
  • Set $wgMainCacheType to CACHE_ANYTHING
  • Open localhost:8080/w/index.php (or equivalent) in your browser. Note: this bug can only be reproduced by hitting index.php directly.

What happens?:
You get the following error:

Sorry! This site is experiencing technical difficulties.
Try waiting a few minutes and reloading.

(Cannot access the database: Cannot access the database: Too many connections (wikifarm-db))

Backtrace:

#0 /var/www/html/metawiki/includes/libs/rdbms/loadbalancer/LoadBalancer.php(779): Wikimedia\Rdbms\LoadBalancer->reportConnectionError()
#1 /var/www/html/metawiki/includes/libs/rdbms/loadbalancer/LoadBalancer.php(767): Wikimedia\Rdbms\LoadBalancer->getServerConnection()
#2 /var/www/html/metawiki/includes/libs/rdbms/database/DBConnRef.php(103): Wikimedia\Rdbms\LoadBalancer->getConnectionInternal()
#3 /var/www/html/metawiki/includes/libs/rdbms/database/DBConnRef.php(117): Wikimedia\Rdbms\DBConnRef->ensureConnection()
#4 /var/www/html/metawiki/includes/libs/rdbms/database/DBConnRef.php(369): Wikimedia\Rdbms\DBConnRef->__call()
#5 /var/www/html/metawiki/includes/libs/rdbms/querybuilder/SelectQueryBuilder.php(772): Wikimedia\Rdbms\DBConnRef->selectRow()
#6 /var/www/html/metawiki/includes/page/PageStore.php(201): Wikimedia\Rdbms\SelectQueryBuilder->fetchRow()
#7 /var/www/html/metawiki/includes/cache/LinkCache.php(451): MediaWiki\Page\PageStore->MediaWiki\Page\{closure}()
#8 /var/www/html/metawiki/includes/cache/LinkCache.php(485): MediaWiki\Cache\LinkCache->getGoodLinkRowInternal()
#9 /var/www/html/metawiki/includes/page/PageStore.php(190): MediaWiki\Cache\LinkCache->getGoodLinkRow()
#10 /var/www/html/metawiki/includes/page/PageStore.php(156): MediaWiki\Page\PageStore->getPageByNameViaLinkCache()
#11 /var/www/html/metawiki/includes/page/PageStore.php(328): MediaWiki\Page\PageStore->getPageByName()
#12 /var/www/html/metawiki/includes/title/Title.php(3804): MediaWiki\Page\PageStore->getPageByReference()
#13 /var/www/html/metawiki/includes/title/Title.php(1076): MediaWiki\Title\Title->getFieldFromPageStore()
#14 /var/www/html/metawiki/includes/title/Title.php(3659): MediaWiki\Title\Title->getContentModel()
#15 /var/www/html/metawiki/includes/Output/OutputPage.php(2717): MediaWiki\Title\Title->getPageLanguage()
#16 /var/www/html/metawiki/includes/Output/OutputPage.php(2811): MediaWiki\Output\OutputPage->addAcceptLanguage()
#17 /var/www/html/metawiki/includes/Output/OutputPage.php(2920): MediaWiki\Output\OutputPage->sendCacheControl()
#18 /var/www/html/metawiki/includes/actions/ActionEntryPoint.php(162): MediaWiki\Output\OutputPage->output()
#19 /var/www/html/metawiki/includes/MediaWikiEntryPoint.php(199): MediaWiki\Actions\ActionEntryPoint->execute()
#20 /var/www/html/metawiki/index.php(58): MediaWiki\MediaWikiEntryPoint->run()
#21 {main}

What should have happened instead?:
It should load normally (and redirect you to the main page).

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia): master

Other information (browser name/version, screenshots, etc.):
A 2,500-line stack trace is logged in the debug log for the attempted queries. It looks like it's trying to load the User object from cache (User::loadFromCache), which eventually reaches SqlBagOStuff, which tries to obtain a DB connection. It then calls LoadBalancer::isPrimaryRunningReadOnly() to check whether we're read-only. But isPrimaryRunningReadOnly() also uses WANCache to store this value, so it reaches SqlBagOStuff again, which opens a new connection and checks whether it's read-only, which opens yet another connection, and so on until we run out of available connections. Here are the last 100 lines from that stack trace:

[TBD]
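
The cycle described above can be illustrated with a toy model. This is only a sketch of the suspected call pattern, not MediaWiki code: ToyLoadBalancer, ToySqlCache, build() and the connection cap are all hypothetical names invented for illustration, and the model assumes that every cache lookup opens a new connection that is never released.

```python
class TooManyConnections(Exception):
    """Raised when the toy connection limit is reached."""


class ToyLoadBalancer:
    """Hypothetical stand-in for a load balancer with a connection cap."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.open_connections = 0
        self.cache = None  # wired up in build()

    def get_connection(self):
        # Assumption: each call opens a fresh connection and none are released.
        if self.open_connections >= self.max_connections:
            raise TooManyConnections("Too many connections")
        self.open_connections += 1

    def is_primary_read_only(self):
        # The read-only flag is itself looked up through the cache.
        return bool(self.cache.get("primary-read-only"))


class ToySqlCache:
    """Hypothetical DB-backed cache: every lookup needs a connection."""

    def __init__(self, lb):
        self.lb = lb

    def get(self, key):
        self.lb.get_connection()
        # The read-only check re-enters get() before this lookup finishes.
        self.lb.is_primary_read_only()
        return None  # toy model: every lookup is a miss


def build(max_connections):
    """Wire the two toy objects together so they reference each other."""
    lb = ToyLoadBalancer(max_connections)
    cache = ToySqlCache(lb)
    lb.cache = cache
    return lb, cache


if __name__ == "__main__":
    lb, cache = build(max_connections=5)
    try:
        cache.get("user:1")
    except TooManyConnections as err:
        print(f"{err} after {lb.open_connections} connections")
```

In this model a single cache lookup never returns: each nested read-only check opens another connection until the cap is hit, which matches the symptom reported here but says nothing about how the real classes are wired.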

We ran a git bisect, and it pointed to r955771.

Might be somewhat related to T163492, although that's a different problem.

Event Timeline

Restricted Application added a subscriber: Aklapper.
cmelo updated the task description.
DAlangi_WMF subscribed.

Let me look into this and see if I can resolve the issue. I confirm that I can reproduce the issue when I visit the /w/index.php page since I have a MySQL/MariaDB setup locally.

Change #1020183 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] objectcache: Inject various services into OCF

https://gerrit.wikimedia.org/r/1020183

DAlangi_WMF changed the task status from Open to In Progress (Apr 17 2024, 2:27 PM).
DAlangi_WMF triaged this task as High priority.

@Daimona, I was already working on a patch on the road to resolve these kinds of DB related issues and I included the solution to this issue in the patch. Let me know if the issue gets resolved for you locally with this patch, then you can review and do the honors :)

Hi @DAlangi_WMF, thank you, I can confirm with this patch it works for me.

Change #1020183 merged by jenkins-bot:

[mediawiki/core@master] objectcache: Inject DBLoadBalancerFactory into ObjectCacheFactory

https://gerrit.wikimedia.org/r/1020183

Change #1022520 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@REL1_42] objectcache: Inject DBLoadBalancerFactory into ObjectCacheFactory

https://gerrit.wikimedia.org/r/1022520

@Daimona / @cmelo, since this issue exists in REL1_42, I've made a cherry-pick to that branch so as to resolve the issue in that release. Let me know if that makes sense and if we should move forward with it.

DAlangi_WMF lowered the priority of this task from High to Low (Apr 22 2024, 10:41 AM).

I think it would make sense. The bug is maybe not critical (you need CACHE_ANYTHING and to hit index.php directly), but still I'd say it's definitely more than an annoyance.

Change #1022520 merged by jenkins-bot:

[mediawiki/core@REL1_42] objectcache: Inject DBLoadBalancerFactory into ObjectCacheFactory

https://gerrit.wikimedia.org/r/1022520