
updateSpecialPages.php will try to re-connect to a DB indefinitely
Closed, Resolved, Public

Description

As seen here: https://gerrit.wikimedia.org/g/mediawiki/core/+/4bc5734399d240534af95139cefced7df760c5e2/maintenance/updateSpecialPages.php#138

	private function reopenAndWaitForReplicas() {
		$lbFactory = $this->getServiceContainer()->getDBLoadBalancerFactory();
		$lb = $lbFactory->getMainLB();
		if ( !$lb->pingAll() ) {
			$this->output( "\n" );
			do {
				$this->error( "Connection failed, reconnecting in 10 seconds..." );
				sleep( 10 );
			} while ( !$lb->pingAll() );
			$this->output( "Reconnected\n\n" );
		}
		// Wait for the replica DB to catch up
		$this->waitForReplication();
	}

This loop never exits if a DB host the script was previously using is taken down (for maintenance or a similar reason), because pingAll() keeps failing and there is no retry limit.
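To illustrate the difference between an unbounded and a bounded reconnect loop, here is a minimal sketch, assuming the caller would rather give up after a few attempts than hang. `pingWithRetries()` is a hypothetical helper, not part of MediaWiki, and it is not what the merged change below does. The `$ping` callable stands in for `$lb->pingAll()`, and `$sleep` is injectable so the loop can be exercised without real waiting.

```php
<?php
/**
 * Hypothetical helper (not a MediaWiki API): retry a connectivity check a
 * limited number of times instead of looping forever.
 *
 * @param callable $ping       Returns true when all connections are healthy
 *                             (in the script this would wrap $lb->pingAll()).
 * @param int $maxAttempts     Number of checks to make before giving up.
 * @param int $delaySeconds    Pause between failed checks.
 * @param callable|null $sleep Receives the delay; defaults to PHP's sleep().
 * @return bool true if a check succeeded, false if every attempt failed.
 */
function pingWithRetries(
	callable $ping,
	int $maxAttempts = 6,
	int $delaySeconds = 10,
	?callable $sleep = null
): bool {
	$sleep ??= static function ( int $s ): void {
		sleep( $s );
	};
	for ( $attempt = 1; $attempt <= $maxAttempts; $attempt++ ) {
		if ( $ping() ) {
			return true;
		}
		// Only wait if another attempt will follow.
		if ( $attempt < $maxAttempts ) {
			fwrite( STDERR, "Connection failed, reconnecting in $delaySeconds seconds...\n" );
			$sleep( $delaySeconds );
		}
	}
	// Bounded: the caller decides what to do (e.g. abort) instead of hanging.
	return false;
}
```

With the defaults, a host that never comes back costs roughly a minute of retries before the function returns false, rather than blocking the script until someone depools the host.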

Event Timeline

Change 964455 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] updateSpecialPages: Call ::waitForReplication() if a host isn't reachable

https://gerrit.wikimedia.org/r/964455

I would just delete the whole thing: it breaks many assumptions (the caller shouldn't need to know about replicas and their health) and it pings every replica (why?), but removing it isn't easy. For now I've made a small change to make sure this doesn't happen again.

Change 964455 merged by jenkins-bot:

[mediawiki/core@master] updateSpecialPages: Call ::waitForReplication() if a host isn't reachable

https://gerrit.wikimedia.org/r/964455

So currently, if a replica is dead, the script will keep retrying the connection until someone depools the host. Is that a good enough solution for this ticket? If so, please close it.

taavi assigned this task to Ladsgroup.

That sounds fine.