Page MenuHomePhabricator

Unexpected LoadBalancer::doWait() in load.php (ChronologyProtector not disabled)
Closed, ResolvedPublicPRODUCTION ERROR

Description

This is super rare (happened once) and literally the only log message written to Logstash, with any channel or severity, for load.php requests in 12 hours, but here goes:

timestamp=2018-04-19T16:56:32
type=mediawiki channel=DBReplication level=WARNING
shard=s1 server=en.wikipedia.org url=/w/load.php?...
message=

Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1066 pos 0-171970637-5476909411,180359172-180359172-49702203,171970637-171970637-1477964445

trace=

#0 /srv/mediawiki/php-1.31.0-wmf.29/includes/libs/rdbms/loadbalancer/LoadBalancer.php(384): Wikimedia\Rdbms\LoadBalancer->doWait()
#1 /srv/mediawiki/php-1.31.0-wmf.29/includes/libs/rdbms/loadbalancer/LoadBalancer.php(720): Wikimedia\Rdbms\LoadBalancer->getReaderIndex()
#2 /srv/mediawiki/php-1.31.0-wmf.29/includes/GlobalFunctions.php(2865): Wikimedia\Rdbms\LoadBalancer->getConnection()
#3 /srv/mediawiki/php-1.31.0-wmf.29/includes/resourceloader/ResourceLoader.php(123): wfGetDB()
#4 /srv/mediawiki/php-1.31.0-wmf.29/includes/resourceloader/ResourceLoader.php(751): ResourceLoader->preloadModuleInfo()
#5 /srv/mediawiki/php-1.31.0-wmf.29/load.php(53): ResourceLoader->respond()
#6 /srv/mediawiki/w/load.php(3): include()
#7 {main}

It's probably harmless but would like to make sure to understand if this has any impact, and whether it could potentially be avoided, and if not, whether RL should do something in return.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

From #wikimedia-perf (times in BST)

00:35 <•AaronSchulz> I assume the user made some change to something and then made a load.php request
[..]
00:45 <•Krinkle> That's a problem.
00:45 <•Krinkle> Can we disable CP for load.php?

Turns out, we already try to disabled it in load.php (src), but getChronologyProtector() is initialised early (since 4ccb228bde9294d96d) on by a call to baseLoadBalancerParams() and passed down, with no going back.

So the disabling of CP isn't working.

Confirmed by patching load.php as follows

mediawiki/load.php
MediaWikiServices::getInstance()->getDBLoadBalancerFactory()->setRequestInfo( [
	'ChronologyProtection' => 'false',
] );

// add:
var_dump(Wikimedia\TestingAccessWrapper::newFromObject(
	Wikimedia\TestingAccessWrapper::newFromObject(
		MediaWikiServices::getInstance()->getDBLoadBalancerFactory()
	)->getChronologyProtector()
)->enabled);exit;
output
/var/www/mediawiki/load.php:49:boolean true

So it's still enabled.

Change 427834 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Disable ChronologyProtector on load.php (again)

https://gerrit.wikimedia.org/r/427834

Change 427834 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Disable ChronologyProtector on load.php (again)

https://gerrit.wikimedia.org/r/427834

Change 428021 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] rdbms: make sure cpPosIndex cookie is applied to LBFactory in time

https://gerrit.wikimedia.org/r/428021

Krinkle renamed this task from Understand impact of LoadBalancer::doWait() warning in load.php code to Unexpected LoadBalancer::doWait() in load.php (ChronologyProtector not disabled) .Apr 23 2018, 3:46 PM
Krinkle raised the priority of this task from Low to High.
Krinkle moved this task from Untriaged to Rdbms library on the Wikimedia-Rdbms board.
Krinkle moved this task from Mar 2021 to Dec2019/1.35.wmf.10+ on the Wikimedia-production-error board.

Change 428021 merged by jenkins-bot:
[mediawiki/core@master] rdbms: make sure cpPosIndex cookie is applied to LBFactory in time

https://gerrit.wikimedia.org/r/428021

Change 428364 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@REL1_31] rdbms: make sure cpPosIndex cookie is applied to LBFactory in time

https://gerrit.wikimedia.org/r/428364

Change 428364 merged by jenkins-bot:
[mediawiki/core@REL1_31] rdbms: make sure cpPosIndex cookie is applied to LBFactory in time

https://gerrit.wikimedia.org/r/428364

Change 428555 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@REL1_31] resourceloader: Disable ChronologyProtector on load.php (again)

https://gerrit.wikimedia.org/r/428555

Change 428555 merged by jenkins-bot:
[mediawiki/core@REL1_31] resourceloader: Disable ChronologyProtector on load.php (again)

https://gerrit.wikimedia.org/r/428555

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:09 PM