Page MenuHomePhabricator

MediaWiki\MediaWikiServices::resetChildProcessServices doesn't reset database connection state
Open, LowPublic

Description

The following used to work without warnings:

nice php extensions/Translate/scripts/ttmserver-export.php --threads=8
25593   0.20  18.0M  Forked thread 25594 to handle bootstrapping
25594   0.31  20.0M  Cleaning up old entries...
25594   4.77  20.0M  Waiting for the index to go green...
25594   4.93  20.0M     Green!
25593   5.90  44.0M  Forked thread 25595 to handle wiki-betawiki
25593   5.91  44.0M  Forked thread 25596 to handle page-Translating:Process
25593   5.91  44.0M  Forked thread 25597 to handle page-Translating:Statistics
25593   5.91  44.0M  Forked thread 25598 to handle page-Translating:How to start
25593   5.92  44.0M  Forked thread 25599 to handle page-User:FuzzyBot
25593   5.92  44.0M  Forked thread 25600 to handle page-Project:General disclaimer
25593   5.92  44.0M  Forked thread 25601 to handle page-Project:Privacy policy
25593   5.92  44.0M  Forked thread 25602 to handle page-Translating:MediaWiki/Statistics in time
PHP Warning:  Packets out of order. Expected 1 received 190. Packet size=14744671 in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
PHP Warning:  mysqli::query(): MySQL server has gone away in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
PHP Warning:  mysqli::query(): Error reading result set's header in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
PHP Warning:  Packets out of order. Expected 1 received 255. Packet size=3124963 in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
PHP Warning:  mysqli::query(): MySQL server has gone away in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
PHP Warning:  mysqli::query(): Error reading result set's header in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
25593  11.71  44.0M  Forked thread 25603 to handle page-Translating:Offline
PHP Warning:  Packets out of order. Expected 1 received 190. Packet size=14744671 in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
PHP Warning:  mysqli::query(): MySQL server has gone away in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
PHP Warning:  mysqli::query(): Error reading result set's header in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
25593  12.31  44.0M  Forked thread 25604 to handle page-Project list
PHP Warning:  Packets out of order. Expected 1 received 255. Packet size=3124963 in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
PHP Warning:  mysqli::query(): MySQL server has gone away in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
PHP Warning:  mysqli::query(): Error reading result set's header in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
25593  15.16  44.0M  Forked thread 25605 to handle page-Translating:Intro
PHP Warning:  Error while sending QUERY packet. PID=25605 in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.php on line 46
25593  15.96  44.0M  Forked thread 25606 to handle page-Translating:New project
PHP Warning:  Error while sending QUERY packet. PID=25606 in /www/dev.translatewiki.net/docroot/w/includes/libs/rdbms/database/DatabaseMysqli.

Tested on latest master and noticed on master ~1 week old.

Event Timeline

If this also happens on WMF production, this will prevent reindexing of translation memories – and cause a translation memory outage if someone starts an reindex without knowing about this bug.

This prevents testing translation memory code as testing instances cannot be bootstrapped.

I guess one workaround is to edit the code (when possible!) to avoid forking and run on a single thread – making it much slower and potentially run out of memory.

LBFactory does not implement DestructibleService, though it has a destroy() method. This is due to it being in /libs. It relies on reference counting, where the old service container instance falls out of scope in resetGlobalInstance() with $oldInstance dying, then LBFactory following suite and triggering __destruct()=>destroy(), and so on. If something has a ServiceContainer instance (with LBFactory loaded) pre-fork and tries to use it later it will get ContainerDisabledException.

If some code is holding onto an LBFactory instance pre-fork and tries to use it later then that code will run into these problems. If LBFactory::destory() was called then it would error out because of LoadBalancer::disable calls instead, but destroy() is not called.

I don't see anything in that script holding onto LBFactory like that however.

Why doesn't that script use ForkController btw?

Correct usage of ForkController (which has logic that ttmserver-export is mostly doing) works fine.

require_once __DIR__ . '/Maintenance.php';

class TestForkController extends Maintenance {
	public function __construct() {
		parent::__construct();
		$this->addDescription( 'Test ForkController' );
		$this->addOption( 'procs', 'Number of processes to use', true, true );
	}

	public function execute() {
		$procs = intval( $this->getOption( 'procs' ) );

		$dbr = wfGetDB( DB_REPLICA );
		$lbfactory = \MediaWiki\MediaWikiServices::getInstance()->getDBLoadBalancerFactory();
		$fc = new ForkController( $procs );
		if ( $fc->start() != 'child' ) {
			exit( 0 );
		}
		var_dump( wfGetDB( DB_REPLICA )->getServerUptime() );
		# Error:
		#var_dump( $dbr->getServerUptime() );
		#var_dump( $lbfactory->getMainLB()->getConnection( DB_REPLICA )->getServerUptime() );
	}
}

$maintClass = TestForkController::class;
require_once RUN_MAINTENANCE_IF_MAIN;

The commented-out code is the problematic case I mentioned above.

Why doesn't that script use ForkController btw?

I suppose I was not aware of that in 2012 when the script was created.

LBFactory does not implement DestructibleService, though it has a destroy() method.

LBFactory used to implement DestructibleService, see https://gerrit.wikimedia.org/r/c/286314/. Was that changed in order to move LBFactory to /libs? Supporting fork was indeed a driving factor behind introducing service resets, and resetting LBFactory was the primary need.

LBFactory could again implement DestructibleService, if we moved includes/services to libs as well. I don't see anything there that depends on MediaWiki. All it would take would be to change the namespace from MediaWiki\Services to Wikimedia\Services (or perhaps more appropriately, Wikimedia\ServiceContainer).

...in addition to moving ServiceContainer and friends to libs, we could look into making it PSR-11 compatible, see T193722: Investigate making MediaWiki\ServiceContainer compatible with PSR-11

The PSR-11 spec is extremely minimal (literally just 1 interface with 2 methods: has and get). In particular, it doesn't specify a common interface for services, nor any concept of destructible services. So from lib-rdbms' perspective, there's nothing to do regarding PSR-11.

Unless there's an existing minimal interface package for "destructible services", we'll just need to roll our own. Google search for "destructible services php interface" actually brings up doc.wikimedia.org/mediawiki-core/master/php/DestructibleService as first result.

Anyway, back to the issue at hand, it's unclear to me how this task relates to destructible services. The issue is code querying an IDatabase object that has apparently closed its connection. Is that right?

If so, then the fact that lib-rdms doesn't implement DestructibleService should be working in our favour, not against it. Right? I mean, if we somehow have an old IDatabase object, then DestructibleService would proactively destroy it . Whereas right now, it's allowed to stay alive as long as something has a reference to it (given lib-rdbms self-destroys from __destruct)

LBFactory does not implement DestructibleService, though it has a destroy() method.

LBFactory used to implement DestructibleService, see https://gerrit.wikimedia.org/r/c/286314/. Was that changed in order to move LBFactory to /libs? Supporting fork was indeed a driving factor behind introducing service resets, and resetting LBFactory was the primary need.

LBFactory could again implement DestructibleService, if we moved includes/services to libs as well. I don't see anything there that depends on MediaWiki. All it would take would be to change the namespace from MediaWiki\Services to Wikimedia\Services (or perhaps more appropriately, Wikimedia\ServiceContainer).

Something like that sounds reasonable.

Let me paraprase WP:IAR: if stuff being in libs prevents you from writing awesome code, move it out.

I am not sure what has changed, but the script seems to work again. I'll file a follow-up for T193008#4175769 anyway.

Does anyone still want to keep this task open for things discussed above, or should I close this?

Vvjjkkii renamed this task from MediaWiki\MediaWikiServices::resetChildProcessServices doesn't reset database connection state to z9daaaaaaa.Jul 1 2018, 1:14 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
AfroThundr3007730 renamed this task from z9daaaaaaa to MediaWiki\MediaWikiServices::resetChildProcessServices doesn't reset database connection state.Jul 1 2018, 6:20 AM
AfroThundr3007730 raised the priority of this task from High to Needs Triage.
AfroThundr3007730 updated the task description. (Show Details)
AfroThundr3007730 added a subscriber: Aklapper.