Clarify Database::isReadOnly() and simplify LoadBalancer logic for caching read-only mode
Status: Closed, Resolved (Public)

Description

The near-duplicate logic of isPrimaryRunningReadOnly() and isPrimaryConnectionReadOnly(), and the interaction between them, is convoluted. They should be merged into one method that uses a single caching scheme.

Ideally, @@GLOBAL.read_only would be queried at most once per LoadBalancer per request, and any refresh query triggered by getServerConnection() would only happen for new connections, not reused ones. This would avoid occasional errors when the connection was lost in the meantime and silent reconnection does not apply (for example, because of session state such as a lock taken via Database::getScopedLock()). An example error is:
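The caching behaviour asked for above can be sketched in Python, used here as neutral pseudocode even though MediaWiki is PHP. This is a minimal sketch under stated assumptions: `LoadBalancerSketch`, `FakeConnection` and their methods are hypothetical stand-ins, not the rdbms API. A single merged check caches the primary's read-only flag once per instance, and only a newly opened primary handle may trigger the query; reused handles never do.

```python
from typing import Callable, Dict, Optional


class FakeConnection:
    """Hypothetical stand-in for a database handle."""

    def __init__(self, server: int, read_only: bool) -> None:
        self.server = server
        self.read_only = read_only  # plays the role of @@GLOBAL.read_only
        self.queries = 0  # how many read-only queries hit this handle

    def server_is_read_only(self) -> bool:
        # Stands in for "SELECT @@GLOBAL.read_only AS Value".
        self.queries += 1
        return self.read_only


class LoadBalancerSketch:
    """Hypothetical load balancer with one merged, cached read-only check."""

    PRIMARY = 0

    def __init__(self, connect: Callable[[int], FakeConnection]) -> None:
        self._connect = connect
        self._conns: Dict[int, FakeConnection] = {}
        self._primary_read_only: Optional[bool] = None  # per-instance cache

    def _check_primary_read_only(self, conn: FakeConnection) -> bool:
        # Single method, single cache: the server is asked at most once.
        if self._primary_read_only is None:
            self._primary_read_only = conn.server_is_read_only()
        return self._primary_read_only

    def get_server_connection(self, server: int) -> FakeConnection:
        conn = self._conns.get(server)
        if conn is not None:
            # Reused handle: no refresh query, so a link that dropped in the
            # meantime is not probed here.
            return conn
        conn = self._connect(server)
        self._conns[server] = conn
        if server == self.PRIMARY:
            # Only a newly opened primary handle may trigger the check.
            self._check_primary_read_only(conn)
        return conn

    def is_read_only(self) -> bool:
        return self._check_primary_read_only(
            self.get_server_connection(self.PRIMARY))


lb = LoadBalancerSketch(lambda i: FakeConnection(i, read_only=False))
for _ in range(5):
    lb.get_server_connection(0)
print(lb.is_read_only(), lb.get_server_connection(0).queries)  # False 1
```

Keeping the cache on the instance (rather than in a shared store) is what bounds the query count to one per LoadBalancer per request in this sketch.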

exception.previous.message
A connection error occurred during a query. 
Query: SELECT @@GLOBAL.read_only AS Value
Function: Wikimedia\Rdbms\DatabaseMysqlBase::serverIsReadOnly
Error: 2006 MySQL server has gone away
	
exception.previous.trace
from /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/Database.php(1524)
#0 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/Database.php(1145): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/Database.php(991): Wikimedia\Rdbms\Database->attemptQuery(array, array, string, string, boolean, boolean)
#2 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/Database.php(845): Wikimedia\Rdbms\Database->executeQuery(string, string, integer, string)
#3 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/DatabaseMysqlBase.php(419): Wikimedia\Rdbms\Database->query(string, string, integer)
#4 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1906): Wikimedia\Rdbms\DatabaseMysqlBase->serverIsReadOnly()
#5 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/objectcache/BagOStuff.php(216): Wikimedia\Rdbms\LoadBalancer::Wikimedia\Rdbms\{closure}(integer)
#6 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1912): BagOStuff->getWithSetCallback(string, integer, Closure)
#7 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/loadbalancer/LoadBalancer.php(906): Wikimedia\Rdbms\LoadBalancer->isPrimaryConnectionReadOnly(Wikimedia\Rdbms\DatabaseMysqli, integer)
#8 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/loadbalancer/LoadBalancer.php(861): Wikimedia\Rdbms\LoadBalancer->getServerConnection(integer, string, integer)
#9 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/DBConnRef.php(103): Wikimedia\Rdbms\LoadBalancer->getConnectionInternal(integer, array, string, integer)
#10 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/DBConnRef.php(117): Wikimedia\Rdbms\DBConnRef->ensureConnection()
#11 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/DBConnRef.php(331): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#12 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/querybuilder/SelectQueryBuilder.php(669): Wikimedia\Rdbms\DBConnRef->selectField(array, string, array, string, array, array)
#13 /srv/mediawiki/php-1.41.0-wmf.4/extensions/DiscussionTools/includes/ThreadItemStore.php(376): Wikimedia\Rdbms\SelectQueryBuilder->fetchField()
#14 /srv/mediawiki/php-1.41.0-wmf.4/extensions/DiscussionTools/includes/Hooks/DataUpdatesHooks.php(48): MediaWiki\Extension\DiscussionTools\ThreadItemStore->insertThreadItems(MediaWiki\Revision\RevisionStoreRecord, MediaWiki\Extension\DiscussionTools\ContentThreadItemSet)
#15 /srv/mediawiki/php-1.41.0-wmf.4/includes/deferred/MWCallableUpdate.php(38): MediaWiki\Extension\DiscussionTools\Hooks\DataUpdatesHooks->MediaWiki\Extension\DiscussionTools\Hooks\{closure}()
#16 /srv/mediawiki/php-1.41.0-wmf.4/includes/deferred/DeferredUpdates.php(473): MWCallableUpdate->doUpdate()
#17 /srv/mediawiki/php-1.41.0-wmf.4/includes/deferred/RefreshSecondaryDataUpdate.php(103): DeferredUpdates::attemptUpdate(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti)
#18 /srv/mediawiki/php-1.41.0-wmf.4/includes/deferred/DeferredUpdates.php(473): RefreshSecondaryDataUpdate->doUpdate()
#19 /srv/mediawiki/php-1.41.0-wmf.4/includes/Storage/DerivedPageDataUpdater.php(1836): DeferredUpdates::attemptUpdate(RefreshSecondaryDataUpdate, Wikimedia\Rdbms\LBFactoryMulti)
#20 /srv/mediawiki/php-1.41.0-wmf.4/includes/page/WikiPage.php(2145): MediaWiki\Storage\DerivedPageDataUpdater->doSecondaryDataUpdates(array)
#21 /srv/mediawiki/php-1.41.0-wmf.4/includes/jobqueue/jobs/RefreshLinksJob.php(244): WikiPage->doSecondaryDataUpdates(array)
#22 /srv/mediawiki/php-1.41.0-wmf.4/includes/jobqueue/jobs/RefreshLinksJob.php(162): RefreshLinksJob->runForTitle(MediaWiki\Title\Title)
#23 /srv/mediawiki/php-1.41.0-wmf.4/extensions/EventBus/includes/JobExecutor.php(79): RefreshLinksJob->run()
#24 /srv/mediawiki/rpc/RunSingleJob.php(77): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#25 {main}
	
exception.trace
from /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/TransactionManager.php(225)
#0 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/Database.php(1293): Wikimedia\Rdbms\TransactionManager->assertSessionStatus(Wikimedia\Rdbms\DatabaseMysqli, string)
#1 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/Database.php(841): Wikimedia\Rdbms\Database->assertQueryIsCurrentlyAllowed(string, string)
#2 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/Database.php(1631): Wikimedia\Rdbms\Database->query(string, string, integer)
#3 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/Database.php(1640): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#4 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/DBConnRef.php(119): Wikimedia\Rdbms\Database->selectRow(array, array, array, string, array, array)
#5 /srv/mediawiki/php-1.41.0-wmf.4/includes/libs/rdbms/database/DBConnRef.php(362): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#6 /srv/mediawiki/php-1.41.0-wmf.4/includes/Revision/RevisionStore.php(2403): Wikimedia\Rdbms\DBConnRef->selectRow(array, array, array, string, array, array)
#7 /srv/mediawiki/php-1.41.0-wmf.4/includes/Revision/RevisionStore.php(2347): MediaWiki\Revision\RevisionStore->fetchRevisionRowFromConds(Wikimedia\Rdbms\DBConnRef, array, integer, array)
#8 /srv/mediawiki/php-1.41.0-wmf.4/includes/Revision/RevisionStore.php(1291): MediaWiki\Revision\RevisionStore->loadRevisionFromConds(Wikimedia\Rdbms\DBConnRef, array, integer, WikiPage)
#9 /srv/mediawiki/php-1.41.0-wmf.4/includes/actions/InfoAction.php(193): MediaWiki\Revision\RevisionStore->getRevisionByTitle(WikiPage, integer, integer)
#10 /srv/mediawiki/php-1.41.0-wmf.4/includes/jobqueue/jobs/RefreshLinksJob.php(245): InfoAction::invalidateCache(WikiPage)
#11 /srv/mediawiki/php-1.41.0-wmf.4/includes/jobqueue/jobs/RefreshLinksJob.php(162): RefreshLinksJob->runForTitle(MediaWiki\Title\Title)
#12 /srv/mediawiki/php-1.41.0-wmf.4/extensions/EventBus/includes/JobExecutor.php(79): RefreshLinksJob->run()
#13 /srv/mediawiki/rpc/RunSingleJob.php(77): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#14 {main}

Event Timeline

Change 909720 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] rdbms: refactor the passing of database read-only mode

https://gerrit.wikimedia.org/r/909720

aaron triaged this task as Medium priority. Apr 24 2023, 6:16 PM
aaron moved this task from Inbox, needs triage to Doing: Prio Interrupt on the Performance-Team board.

Change 909720 merged by jenkins-bot:

[mediawiki/core@master] rdbms: refactor the passing of database read-only mode

https://gerrit.wikimedia.org/r/909720

With the above change (6c21a8ef1147064365bffddcf4a698a44cf68fa7), I'm getting hundreds of connections opened, followed by a failure to connect ("Cannot access the database") because there are too many connections:

[2023-05-02T05:15:18.743300+00:00] rdbms.DEBUG: Wikimedia\Rdbms\DatabaseMysqlBase::open [0s] localhost: SET group_concat_max_len = 262144, `sql_mode` = '' {"db_server":"localhost","db_name":"mediawiki_wiki1_41","db_user":"dev","method":"Wikimedia\\Rdbms\\DatabaseMysqlBase::open","sql":"SET group_concat_max_len = 262144, `sql_mode` = ''","domain":"mediawiki_wiki1_41","runtime":0.0,"db_log_category":"query"} {"host":"memex2","wiki":"mediawiki_wiki1_41","mwversion":"1.41.0-alpha","reqId":"2eced206de6b4a03c26c33da"}
[2023-05-02T05:15:18.743347+00:00] rdbms.DEBUG: Wikimedia\Rdbms\LoadBalancer::reallyOpenConnection: opened new connection for 0/mediawiki_wiki1_41 [] {"host":"memex2","wiki":"mediawiki_wiki1_41","mwversion":"1.41.0-alpha","reqId":"2eced206de6b4a03c26c33da"}
[2023-05-02T05:15:18.744411+00:00] rdbms.ERROR: Error connecting to localhost as user dev: :real_connect(): (08004/1040): Too many connections {"db_server":"localhost","db_name":"mediawiki_wiki1_41","db_user":"dev","error":":real_connect(): (08004/1040): Too many connections","exception":"[object] (RuntimeException(code: 0):  at mediawiki/includes/libs/rdbms/database/Database.php:1534)","db_log_category":"connection"} {"host":"memex2","wiki":"mediawiki_wiki1_41","mwversion":"1.41.0-alpha","reqId":"2eced206de6b4a03c26c33da"}
[2023-05-02T05:15:18.744987+00:00] rdbms.WARNING: Wikimedia\Rdbms\LoadBalancer::reallyOpenConnection: connection error for 0/mediawiki_wiki1_41 {"db_domain":"mediawiki_wiki1_41"} {"host":"memex2","wiki":"mediawiki_wiki1_41","mwversion":"1.41.0-alpha","reqId":"2eced206de6b4a03c26c33da"}
[2023-05-02T05:15:18.745116+00:00] rdbms.WARNING: Wikimedia\Rdbms\LoadBalancer::reuseOrOpenConnectionForNewRef: connection error for 0/mediawiki_wiki1_41 [] {"host":"memex2","wiki":"mediawiki_wiki1_41","mwversion":"1.41.0-alpha","reqId":"2eced206de6b4a03c26c33da"}

Everything's working correctly with the previous commit, 0351dc85aaebffc7ac3b1a79ca5182c72182bd48.

Hundreds per web request or per second or something else?

The immediate thing that stands out is that the patch assumes a non-empty $wgMainCacheType. It removes the php-apcu layer, which in retrospect I believe was a mistake.
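To illustrate why a process-local tier matters when the main cache stores nothing, here is a hypothetical two-tier lookup in Python. It assumes the removed php-apcu layer acted as a local tier in front of the main cache; `EmptyCache` (standing in for CACHE_NONE), `LocalCache` and `get_with_set_callback` are illustrative names loosely modelled on the getWithSetCallback() pattern visible in the trace, not MediaWiki's BagOStuff API.

```python
import time
from typing import Callable, Dict, List, Optional, Protocol, Tuple


class Cache(Protocol):
    def get(self, key: str) -> Optional[object]: ...
    def set(self, key: str, value: object, ttl: float) -> None: ...


class EmptyCache:
    """Hypothetical stand-in for CACHE_NONE: always misses, stores nothing."""

    def get(self, key: str) -> Optional[object]:
        return None

    def set(self, key: str, value: object, ttl: float) -> None:
        pass


class LocalCache:
    """Hypothetical process-local tier, in the role the APCu layer played."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[object, float]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None or entry[1] < time.monotonic():
            return None  # missing or expired
        return entry[0]

    def set(self, key: str, value: object, ttl: float) -> None:
        self._data[key] = (value, time.monotonic() + ttl)


def get_with_set_callback(tiers: List[Cache], key: str, ttl: float,
                          callback: Callable[[], object]) -> object:
    """Try tiers in order; on a full miss, compute once. None counts as a miss."""
    missed: List[Cache] = []
    for tier in tiers:
        value = tier.get(key)
        if value is not None:
            break
        missed.append(tier)
    else:
        value = callback()
    for tier in missed:  # backfill the tiers that missed
        tier.set(key, value, ttl)
    return value


queries: List[int] = []


def query_read_only() -> bool:
    queries.append(1)  # stands in for SELECT @@GLOBAL.read_only
    return False


for _ in range(3):
    get_with_set_callback([EmptyCache()], "primary-ro", 5.0, query_read_only)
print(len(queries))  # 3: every call reaches the server
local = LocalCache()
for _ in range(3):
    get_with_set_callback([local, EmptyCache()], "primary-ro", 5.0, query_read_only)
print(len(queries))  # 4: only the first call reaches the server
```

With only an empty main tier, every lookup falls through to the server; with a local tier in front, repeated lookups within the TTL are served in-process.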

No, a single web request (resulting in "Sorry! This site is experiencing technical difficulties.") is making 151 database connections (which is the value of max_connections) and then failing.

It works with $wgReadOnly = 'anything', and also with $wgMainCacheType set to CACHE_ACCEL or CACHE_NONE, but not with CACHE_ANYTHING (which resolves to CACHE_DB here).

This may be some kind of recursion caused by using CACHE_DB. If so, it was previously avoided only by accident: reuseOrOpenConnectionForNewRef() returned, and the tracked ->conns array was updated, before getReadOnlyReason() was called.
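The suspected recursion can be reproduced with a toy model in Python. Everything here is hypothetical: `DbBackedCache` stands in for CACHE_DB (reading the cache needs a primary connection), `ToyLoadBalancer` is not the real LoadBalancer, and `register_first` toggles whether a new handle is tracked before or after the read-only check. It only shows the ordering dependence described above; it is not the merged fix.

```python
from typing import Dict, Optional


class TooManyConnections(RuntimeError):
    """Raised when the toy connection cap (max_connections) is reached."""


class DbBackedCache:
    """Hypothetical stand-in for CACHE_DB: a lookup needs a primary handle."""

    def __init__(self) -> None:
        self.lb: Optional["ToyLoadBalancer"] = None

    def get_read_only_reason(self) -> Optional[str]:
        assert self.lb is not None, "wire the load balancer in first"
        self.lb.get_connection(0)  # the cache read itself queries the primary
        return None  # pretend the cached answer is "not read-only"


class ToyLoadBalancer:
    def __init__(self, cache: DbBackedCache, register_first: bool,
                 max_connections: int = 151) -> None:
        self.cache = cache
        self.register_first = register_first
        self.max_connections = max_connections
        self.conns: Dict[int, object] = {}
        self.opened = 0

    def get_connection(self, server: int) -> object:
        if server in self.conns:
            return self.conns[server]  # reuse a tracked handle
        if self.opened >= self.max_connections:
            raise TooManyConnections(f"opened {self.opened} connections")
        self.opened += 1
        conn = object()  # pretend this is a freshly opened handle
        if self.register_first:
            # Track first, so the cache's nested lookup reuses this handle.
            self.conns[server] = conn
            self.cache.get_read_only_reason()
        else:
            # Check first: the nested lookup finds nothing tracked and opens
            # another handle, which checks again, and so on.
            self.cache.get_read_only_reason()
            self.conns[server] = conn
        return conn


cache = DbBackedCache()
lb = ToyLoadBalancer(cache, register_first=False)
cache.lb = lb
try:
    lb.get_connection(0)
except TooManyConnections as err:
    print(err)  # opened 151 connections
```

With `register_first=False`, nested lookups keep opening handles until the cap of 151 (the max_connections value from the report) is hit; with `register_first=True`, the nested lookup reuses the tracked handle and only one connection is opened.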

Change 914425 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] rdbms: avoid infinite recursion with CACHE_DB in LoadBalancer

https://gerrit.wikimedia.org/r/914425

Change 914455 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Fix infinite recursion in DBLoadBalancerFactoryConfigBuilder service

https://gerrit.wikimedia.org/r/914455

Change 914455 merged by jenkins-bot:

[mediawiki/core@master] Fix infinite recursion in DBLoadBalancerFactoryConfigBuilder service

https://gerrit.wikimedia.org/r/914455

Change 914887 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] rdbms: remove unused ILoadBalancer::CONN_REFRESH_READ_ONLY constant

https://gerrit.wikimedia.org/r/914887

Change 914425 abandoned by Aaron Schulz:

[mediawiki/core@master] rdbms: avoid infinite recursion with CACHE_DB in LoadBalancer

Reason:

https://gerrit.wikimedia.org/r/914425

Change 914887 merged by jenkins-bot:

[mediawiki/core@master] rdbms: remove unused ILoadBalancer::CONN_REFRESH_READ_ONLY constant

https://gerrit.wikimedia.org/r/914887