
Make MultiHttpClient use CURLMOPT_MAX_HOST_CONNECTIONS and reuse connections
Stalled, Normal, Public

Description

As proposed in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/534642/ , concurrency could be improved and connection overhead largely reduced. In testing via shell.php, hitting 1K enwiki URLs went from 17.5s for each of two back-to-back runs down to 1.5–3.5s each. Also, netstat showed only 50 TCP connections in ESTABLISHED/TIME_WAIT rather than 2000.

A test.php file, included from shell.php, exposes $benchmark():

$reqs = array_map(
	function ( $word ) {
		return [
			'method' => 'HEAD',
			'url' => "https://en.wikipedia.org/wiki/" . ucfirst( trim( $word ) )
		];
	},
	file( "$IP/words.list" ) // 1000 word list
);

$http = new MultiHttpClient( [] );
$benchmark = function () use ( $http, $reqs ) {
	// Run the same batch twice; the second pass shows the benefit of
	// connection reuse within the process.
	for ( $i = 1; $i <= 2; ++$i ) {
		$start = microtime( true );
		$reqs = $http->runMulti( $reqs );
		$real = microtime( true ) - $start;

		// Tally HTTP status codes as a sanity check.
		$codes = [];
		foreach ( $reqs as $req ) {
			$codes[] = $req['response']['code'];
		}

		echo "$real sec\n";
		var_dump( array_count_values( $codes ) );
	}
};
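For reference, the underlying technique — sketched here in plain curl terms, not the actual MultiHttpClient patch — is to keep one curl_multi handle alive across batches and cap per-host connections so libcurl reuses established sockets. The connection limit of 50 and the empty URL list below are illustrative placeholders.

```php
<?php
// Sketch of the technique (not the actual patch): reuse a single
// curl_multi handle across batches and cap per-host connections so
// libcurl keeps sockets alive between requests.

$multi = curl_multi_init();
// CURLMOPT_MAX_HOST_CONNECTIONS requires PHP 7.0.7+ / libcurl 7.30.0+
curl_multi_setopt( $multi, CURLMOPT_MAX_HOST_CONNECTIONS, 50 );

$urls = []; // e.g. the 1000 enwiki URLs from the benchmark above
$handles = [];
foreach ( $urls as $url ) {
	$ch = curl_init( $url );
	curl_setopt( $ch, CURLOPT_NOBODY, true ); // HEAD request
	curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
	curl_multi_add_handle( $multi, $ch );
	$handles[] = $ch;
}

// Drive all transfers to completion.
do {
	$status = curl_multi_exec( $multi, $active );
	if ( $active ) {
		curl_multi_select( $multi );
	}
} while ( $active && $status === CURLM_OK );

foreach ( $handles as $ch ) {
	echo curl_getinfo( $ch, CURLINFO_RESPONSE_CODE ), "\n";
	curl_multi_remove_handle( $multi, $ch );
}
// Not calling curl_multi_close() here lets the next batch
// reuse the established connections.
```

The key design point is that the multi handle, not the individual easy handles, owns the connection cache, so reuse only happens if the multi handle outlives a single batch.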


Event Timeline

aaron created this task. · Thu, Sep 5, 5:41 PM
Restricted Application added a subscriber: Aklapper. · Thu, Sep 5, 5:41 PM

Change 534642 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Improve MultiHttpClient connection concurrency and reuse

https://gerrit.wikimedia.org/r/534642

mobrovac renamed this task from Max MultiHttpClient use CURLMOPT_MAX_HOST_CONNECTIONS and reuse connections to Make MultiHttpClient use CURLMOPT_MAX_HOST_CONNECTIONS and reuse connections. · Tue, Sep 10, 10:59 AM
mobrovac triaged this task as Normal priority.

Change 534642 merged by jenkins-bot:
[mediawiki/core@master] Improve MultiHttpClient connection concurrency and reuse

https://gerrit.wikimedia.org/r/534642

hashar added a subscriber: hashar. · Tue, Sep 10, 3:08 PM

https://gerrit.wikimedia.org/r/534642 broke the train due to one of the new curl options not being recognized (I guess by HHVM).

mobrovac changed the task status from Open to Stalled. · Tue, Sep 10, 3:10 PM
mobrovac added a subscriber: mobrovac.

The patch had to be reverted because CURLMOPT_MAX_HOST_CONNECTIONS was only introduced in PHP 7, so HHVM does not support it. We'll need to revisit this idea once we are PHP 7+ only, as I don't see an obvious way of circumventing the problem while still having the patch improve things in the meantime.

Change 535753 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Use rolling handles in MultiHttpClient and allow connection reuse

https://gerrit.wikimedia.org/r/535753

Joe added a subscriber: Joe. · Wed, Sep 11, 6:31 AM

While I support this patch, the problem you're seeing should be greatly mitigated once we start using a middleware to manage service-to-service RPC. For now that effort is still in its infancy, but we already use this approach for cirrussearch, where requests are proxied via a local nginx on each appserver.