Page MenuHomePhabricator

Unexpected LoadBalancer::doWait() in load.php (ChronologyProtector not disabled)
Closed, ResolvedPublicPRODUCTION ERROR

Description

This is super rare (happened once) and literally the only log message written to Logstash, with any channel or severity, for load.php requests in 12 hours, but here goes:

timestamp=2018-04-19T16:56:32
type=mediawiki channel=DBReplication level=WARNING
shard=s1 server=en.wikipedia.org url=/w/load.php?...
message=

Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1066 pos 0-171970637-5476909411,180359172-180359172-49702203,171970637-171970637-1477964445

trace=

#0 /srv/mediawiki/php-1.31.0-wmf.29/includes/libs/rdbms/loadbalancer/LoadBalancer.php(384): Wikimedia\Rdbms\LoadBalancer->doWait()
#1 /srv/mediawiki/php-1.31.0-wmf.29/includes/libs/rdbms/loadbalancer/LoadBalancer.php(720): Wikimedia\Rdbms\LoadBalancer->getReaderIndex()
#2 /srv/mediawiki/php-1.31.0-wmf.29/includes/GlobalFunctions.php(2865): Wikimedia\Rdbms\LoadBalancer->getConnection()
#3 /srv/mediawiki/php-1.31.0-wmf.29/includes/resourceloader/ResourceLoader.php(123): wfGetDB()
#4 /srv/mediawiki/php-1.31.0-wmf.29/includes/resourceloader/ResourceLoader.php(751): ResourceLoader->preloadModuleInfo()
#5 /srv/mediawiki/php-1.31.0-wmf.29/load.php(53): ResourceLoader->respond()
#6 /srv/mediawiki/w/load.php(3): include()
#7 {main}

It's probably harmless but would like to make sure to understand if this has any impact, and whether it could potentially be avoided, and if not, whether RL should do something in return.

Event Timeline

Krinkle created this task.Apr 19 2018, 10:55 PM
Restricted Application added a project: Performance-Team. · View Herald TranscriptApr 19 2018, 10:55 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Krinkle triaged this task as Low priority.Apr 19 2018, 10:57 PM
Krinkle updated the task description. (Show Details)
Krinkle moved this task from Untriaged to Debug-only on the Wikimedia-production-error board.
Krinkle updated the task description. (Show Details)
Krinkle updated the task description. (Show Details)Apr 19 2018, 11:00 PM
Krinkle added a comment.EditedApr 20 2018, 12:08 AM

From #wikimedia-perf (times in BST)

00:35 <•AaronSchulz> I assume the user made some change to something and then made a load.php request
[..]
00:45 <•Krinkle> That's a problem.
00:45 <•Krinkle> Can we disable CP for load.php?

Turns out, we already try to disabled it in load.php (src), but getChronologyProtector() is initialised early (since 4ccb228bde9294d96d) on by a call to baseLoadBalancerParams() and passed down, with no going back.

So the disabling of CP isn't working.

Confirmed by patching load.php as follows

mediawiki/load.php
MediaWikiServices::getInstance()->getDBLoadBalancerFactory()->setRequestInfo( [
	'ChronologyProtection' => 'false',
] );

// add:
var_dump(Wikimedia\TestingAccessWrapper::newFromObject(
	Wikimedia\TestingAccessWrapper::newFromObject(
		MediaWikiServices::getInstance()->getDBLoadBalancerFactory()
	)->getChronologyProtector()
)->enabled);exit;
output
/var/www/mediawiki/load.php:49:boolean true

So it's still enabled.

Change 427834 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Disable ChronologyProtector on load.php (again)

https://gerrit.wikimedia.org/r/427834

Change 427834 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Disable ChronologyProtector on load.php (again)

https://gerrit.wikimedia.org/r/427834

Change 428021 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] rdbms: make sure cpPosIndex cookie is applied to LBFactory in time

https://gerrit.wikimedia.org/r/428021

Krinkle renamed this task from Understand impact of LoadBalancer::doWait() warning in load.php code to Unexpected LoadBalancer::doWait() in load.php (ChronologyProtector not disabled) .Apr 23 2018, 3:46 PM
Krinkle raised the priority of this task from Low to High.
Krinkle moved this task from Untriaged to Rdbms library on the Wikimedia-Rdbms board.

Change 428021 merged by jenkins-bot:
[mediawiki/core@master] rdbms: make sure cpPosIndex cookie is applied to LBFactory in time

https://gerrit.wikimedia.org/r/428021

Change 428364 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@REL1_31] rdbms: make sure cpPosIndex cookie is applied to LBFactory in time

https://gerrit.wikimedia.org/r/428364

Change 428364 merged by jenkins-bot:
[mediawiki/core@REL1_31] rdbms: make sure cpPosIndex cookie is applied to LBFactory in time

https://gerrit.wikimedia.org/r/428364

Imarlier moved this task from Inbox to Doing on the Performance-Team board.Apr 23 2018, 8:18 PM
Krinkle assigned this task to aaron.Apr 23 2018, 8:19 PM

Change 428555 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@REL1_31] resourceloader: Disable ChronologyProtector on load.php (again)

https://gerrit.wikimedia.org/r/428555

Change 428555 merged by jenkins-bot:
[mediawiki/core@REL1_31] resourceloader: Disable ChronologyProtector on load.php (again)

https://gerrit.wikimedia.org/r/428555

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:09 PM