Page MenuHomePhabricator

Unexpected master connection on GET request (from Block->newLoad)
Closed, ResolvedPublic

Description

Error

Request ID: XEuGSwpAAC4AABoDHBcAAAAF
Request URL: https://www.mediawiki.org/wiki/NotFound:Example

Expectation (masterConns <= 0) by MediaWiki::main not met (actual: 1):
[connect to 10.64.32.136 (mediawikiwiki)]
#0 /srv/mediawiki/php-1.33.0-wmf.14/includes/libs/rdbms/TransactionProfiler.php(186): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated()
#1 /srv/mediawiki/php-1.33.0-wmf.14/includes/libs/rdbms/loadbalancer/LoadBalancer.php(761): Wikimedia\Rdbms\TransactionProfiler->recordConnection()
#2 /srv/mediawiki/php-1.33.0-wmf.14/includes/GlobalFunctions.php(2654): Wikimedia\Rdbms\LoadBalancer->getConnection()
#3 /srv/mediawiki/php-1.33.0-wmf.14/includes/Block.php(325): wfGetDB()
#4 /srv/mediawiki/php-1.33.0-wmf.14/includes/Block.php(1292): Block->newLoad()
#5 /srv/mediawiki/php-1.33.0-wmf.14/includes/user/User.php(1863): Block::newFromTarget()
#6 /srv/mediawiki/php-1.33.0-wmf.14/includes/user/User.php(2290): User->getBlockedStatus()
#7 /srv/mediawiki/php-1.33.0-wmf.14/includes/Title.php(2742): User->getBlock()
#8 /srv/mediawiki/php-1.33.0-wmf.14/includes/Title.php(2938): Title->checkUserBlock()
#9 /srv/mediawiki/php-1.33.0-wmf.14/includes/Title.php(2189): Title->getUserPermissionsErrorsInternal()
#10 /srv/mediawiki/php-1.33.0-wmf.14/includes/skins/Skin.php(722): Title->userCan()
#11 /srv/mediawiki/php-1.33.0-wmf.14/includes/skins/SkinTemplate.php(298): Skin->getUndeleteLink()
#12 /srv/mediawiki/php-1.33.0-wmf.14/includes/skins/SkinTemplate.php(225): SkinTemplate->prepareQuickTemplate()
#13 /srv/mediawiki/php-1.33.0-wmf.14/includes/OutputPage.php(2714): SkinTemplate->outputPage()
#14 /srv/mediawiki/php-1.33.0-wmf.14/includes/MediaWiki.php(869): OutputPage->output()
#15 /srv/mediawiki/php-1.33.0-wmf.14/includes/MediaWiki.php(881): Closure$MediaWiki::main()
#16 /srv/mediawiki/php-1.33.0-wmf.14/includes/MediaWiki.php(517): MediaWiki->main()
#17 /srv/mediawiki/php-1.33.0-wmf.14/index.php(42): MediaWiki->run()
#18 /srv/mediawiki/w/index.php(3): include()
#19 {main}

Event Timeline

Krinkle created this task.Jan 25 2019, 9:58 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 25 2019, 9:58 PM
Restricted Application added a subscriber: MGChecker. · View Herald TranscriptJan 25 2019, 9:58 PM

The call to User::getBlock from Title::checkUserBlock was introduced in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/478056/

It replaced a call to User::isBlockedFrom (which calls User::getBlock), but that was only performed if the action was 'edit' or 'create'.

The use of the master database is nothing new, but the unconditional call to User::getBlock regardless of the action is. Maybe we should check if the action requires unblocking before we get the block.

Krinkle added a comment.EditedJan 29 2019, 7:09 PM

[..] It replaced a call to User::isBlockedFrom (which calls User::getBlock), but that was only performed if the action was 'edit' or 'create'.
The use of the master database is nothing new, but the unconditional call to User::getBlock regardless of the action is. [..]

This code was quite messy, but.... the 'edit' and 'create' conditions were (I believe) serving as a proxy for a different condition that more directly relates to the DBPerformance warning.

The DBPerformance instrumentation is configured differently depending on the type of request. In POST requests for form submissions we allow establishing master connections, as well as for a small subset of GET requests (e.g. viewing the edit page), which we know to be rare (compared to all page views) and/or functionally requires it for correctness. In this case, the 'edit' and 'create' actions were effectively a proxy for those two conditions.

Establishing connections the master DB from a GET request is generally problematic as it can significantly increase the load or congest network for this one server (compared to replicas, of which we have many).

We're also getting closer to having a working multi-DC setup. Once deployed, we would route POST requests and this subset of GET urls to the primary data center. However, the bulk of the traffic (GET page views) gets routed to the user's nearest secondary data center. There, a master connection would also experience a higher latency as the master DB resides in the primary DC.

Krinkle added a comment.EditedJan 29 2019, 7:17 PM

Looking more closely, I wrongly assumed that userCan('edit') would only be called when on the edit page or submitting an edit. It is also used for the much more common check when deciding to show the Edit button on every page (instead of a "View source" button, which you'd see if the wiki has restricted editing to a user group one isn't a member of).

The thing is though, for this very common purpose of rendering things like the "Undelete" link and "Edit" link, we generally use the rigor=false flag instead of rigor='secure', which is meant to skip these kinds of checks and/or use the replica DB instead.

It appears code for this exists in Title::checkUserBlock:

		$useReplica = ( $rigor !== 'secure' );
		$block = $user->getBlock( $useReplica );

That suggests to me that either this method is wrongly called with rigor=secure on a regular page view, or perhaps $useReplica is not (entirely) honoured.

It is indeed being called with the default $rigor='secure' from Skin::getUndeleteLink: https://gerrit.wikimedia.org/g/mediawiki/core/+/refs/changes/80/484780/1/includes/skins/Skin.php#722 (That patch made this problem happen less often, by changing the order in which conditions are checked.)

If we generally use $rigor=false for UI, we should be able to call Title::quickUserCan here instead.

Change 487294 had a related patch set uploaded (by Tchanders; owner: Tchanders):
[mediawiki/core@master] Avoid making master connection from Skin::getUndeleteLink

https://gerrit.wikimedia.org/r/487294

Change 487294 merged by jenkins-bot:
[mediawiki/core@master] Avoid making master connection from Skin::getUndeleteLink

https://gerrit.wikimedia.org/r/487294

dbarratt closed this task as Resolved.Jan 31 2019, 2:05 AM
dbarratt moved this task from Review to Done on the Anti-Harassment (Bet — ב) board.
mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:07 PM