Make MultiHttpClient use CURLMOPT_MAX_HOST_CONNECTIONS and reuse connections
Closed, Resolved · Public

Description

As proposed in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/534642/ , concurrency could be improved and connection overhead largely reduced. In ad hoc testing via shell.php, hitting 1K enwiki URLs drops from 17.5s for each of two batches in a row to 1.5 - 3.5s for each. Also, netstat shows only 50 TCP connections in ESTABLISHED/TIME_WAIT rather than 2000.

Test.php, a file included from shell.php to expose $benchmark():

$reqs = array_map(
	function ( $word ) {
		return [
			'method' => 'HEAD',
			'url' => "https://en.wikipedia.org/wiki/" . ucfirst( trim( $word ) )
		];
	},
	file( "$IP/words.list" ) // 1000 word list
);

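$http = new MultiHttpClient( [] );
// Run the same batch twice so the second pass can show the effect of connection reuse
$benchmark = function () use ( $http, $reqs ) {
	for ( $i = 1; $i <= 2; ++$i ) {
		$start = microtime( true );
		// Keep results in a separate variable so each pass starts from the original requests
		$done = $http->runMulti( $reqs );
		$real = microtime( true ) - $start;

		// Tally the HTTP status codes for this pass
		$codes = [];
		foreach ( $done as $req ) {
			$codes[] = $req['response']['code'];
		}

		echo "$real sec\n";
		var_dump( array_count_values( $codes ) );
	}
};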
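
As background on why reuse cuts connection overhead: libcurl keeps its connection pool on the multi handle, so a multi handle that outlives one batch lets later batches pick up connections that are already open. The following is a minimal sketch of that idea under stated assumptions, not the MultiHttpClient code. The class name PersistentMultiHandle and its methods getHandle() and headAll() are hypothetical.

```php
<?php
// Sketch only: PersistentMultiHandle is a hypothetical name, not part of MediaWiki.
final class PersistentMultiHandle {
	/** @var resource|\CurlMultiHandle|null Multi handle kept for the object's lifetime */
	private $mh = null;

	/** Lazily create the multi handle once and return the same one afterwards. */
	public function getHandle() {
		if ( $this->mh === null ) {
			$this->mh = curl_multi_init();
		}
		return $this->mh;
	}

	/**
	 * Send HEAD requests for all URLs and return their HTTP status codes.
	 *
	 * @param string[] $urls
	 * @return int[] Status codes keyed like $urls (0 if no response was received)
	 */
	public function headAll( array $urls ): array {
		$mh = $this->getHandle();
		$handles = [];
		foreach ( $urls as $key => $url ) {
			$ch = curl_init( $url );
			curl_setopt( $ch, CURLOPT_NOBODY, true ); // issue HEAD instead of GET
			curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
			curl_multi_add_handle( $mh, $ch );
			$handles[$key] = $ch;
		}

		// Drive all transfers until none are still running.
		$active = 0;
		do {
			$status = curl_multi_exec( $mh, $active );
			if ( $active > 0 ) {
				curl_multi_select( $mh, 1.0 );
			}
		} while (
			$active > 0 &&
			( $status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM )
		);

		$codes = [];
		foreach ( $handles as $key => $ch ) {
			$codes[$key] = curl_getinfo( $ch, CURLINFO_HTTP_CODE );
			// Detach the easy handle but keep the multi handle (and its pool) alive.
			curl_multi_remove_handle( $mh, $ch );
		}
		return $codes;
	}
}
```

Calling headAll() twice on the same object corresponds to the two passes of the benchmark: the second call goes through the same multi handle, so it is in a position to reuse connections opened by the first.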

Event Timeline

aaron created this task. Sep 5 2019, 5:41 PM
Restricted Application added a subscriber: Aklapper. Sep 5 2019, 5:41 PM

Change 534642 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Improve MultiHttpClient connection concurrency and reuse

https://gerrit.wikimedia.org/r/534642

mobrovac renamed this task from "Max MultiHttpClient use CURLMOPT_MAX_HOST_CONNECTIONS and reuse connections" to "Make MultiHttpClient use CURLMOPT_MAX_HOST_CONNECTIONS and reuse connections". Sep 10 2019, 10:59 AM
mobrovac triaged this task as Normal priority.

Change 534642 merged by jenkins-bot:
[mediawiki/core@master] Improve MultiHttpClient connection concurrency and reuse

https://gerrit.wikimedia.org/r/534642

hashar added a subscriber: hashar. Sep 10 2019, 3:08 PM

https://gerrit.wikimedia.org/r/534642 broke the train due to one of the new curl options not being recognized (I guess by HHVM).

mobrovac changed the task status from Open to Stalled. Sep 10 2019, 3:10 PM
mobrovac added a subscriber: mobrovac.

The patch had to be reverted because CURLMOPT_MAX_HOST_CONNECTIONS was introduced only in PHP 7, so there is no support for it in HHVM. We'll need to revisit this idea once we are PHP7+ only, as I don't see an obvious way of circumventing this problem and making the patch still improve things for the time being.
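
For illustration, a generic way to keep an optional curl multi option from breaking a runtime that lacks it is to check for the constant before using it. This is a sketch under that assumption only; the thread does not establish that this approach was suitable for MultiHttpClient at the time. The function name applyMaxHostConnections() and the limit of 50 in the usage note are hypothetical.

```php
<?php
/**
 * Apply a per-host connection cap to a curl multi handle when the runtime knows the option.
 * Hypothetical helper, not part of MediaWiki.
 *
 * @param resource|\CurlMultiHandle $mh Handle from curl_multi_init()
 * @param int $limit Maximum simultaneous connections per host
 * @return bool True if the option was applied, false if unsupported or rejected
 */
function applyMaxHostConnections( $mh, int $limit ): bool {
	// Runtimes that predate the option do not define the constant at all.
	if ( !defined( 'CURLMOPT_MAX_HOST_CONNECTIONS' ) ) {
		return false;
	}
	return curl_multi_setopt( $mh, CURLMOPT_MAX_HOST_CONNECTIONS, $limit );
}
```

A caller would run something like applyMaxHostConnections( curl_multi_init(), 50 ) and carry on without the cap when it returns false, which keeps the request path working but gives up the concurrency limit on such runtimes.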

Change 535753 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Use rolling handles in MultiHttpClient and allow connection reuse

https://gerrit.wikimedia.org/r/535753

Joe added a subscriber: Joe. Sep 11 2019, 6:31 AM

While I support the use of this patch, the problem you're seeing should be greatly mitigated when we start using a middleware to manage service-to-service RPC. For now that's still in its infancy, but we already use that approach for cirrussearch, where requests are proxied via a local nginx on each appserver.

MaxSem changed the task status from Stalled to Open. Oct 7 2019, 8:05 PM
MaxSem added a subscriber: MaxSem.

Unblocked.

Change 535753 merged by jenkins-bot:
[mediawiki/core@master] Improve MultiHttpClient connection concurrency/reuse (given PHP >= 7.0.7)

https://gerrit.wikimedia.org/r/535753

Jdforrester-WMF closed this task as Resolved. Oct 8 2019, 9:52 PM
Jdforrester-WMF assigned this task to aaron.