Add non-parallel MultiHttpClient fallback for environments that don't have curl available
Open, NormalPublic

Pchelolo moved this task from Backlog to watching on the Services board.Oct 12 2016, 6:45 PM
Pchelolo edited projects, added Services (watching); removed Services.

This is also an area we should consider moving to a separate component or replacing with an existing library. It's pretty straight forward.

WordPress stable (4.x) does the same thing right now. Their WP_Http class tests one of two different transports: curl, or streams.

In WordPress master they recently (March 2016) landed support for making multiple requests in parallel. The fallback for streams is to make requests in a loop.

Krinkle added a comment.EditedNov 2 2016, 5:57 PM

The aforementioned Requests library from WordPress is now available as a standalone composer package that we could use (dependency free and supports PHP 5.2+).

Tgr added a comment.Nov 2 2016, 11:25 PM

Ugh, if we use an external library, can we use something with non-static calls? Guzzle for example is widely used, supports PSR-7 (which is terrible but still the only standard in PHP-land), can do parallel and async requests, and can be properly injected.

Ugh, if we use an external library, can we use something with non-static calls? Guzzle for example is widely used, supports PSR-7 (which is terrible but still the only standard in PHP-land), can do parallel and async requests, and can be properly injected.

The existing MediaWiki Http entry point class is entirely static (Http::request, Http::get, etc.). The underlying transport classes (MWHttpRequest, PhpHttpRequest, CurlHttpRequest) are, however, instantiated. Typically we have one instance for each request.

The Requests library is quite similar. It has Requests::request( $url, ... $options ) as static entry point, but underlying Transport classes are instantiated. WordPress even wraps around this with another WP_Http class which is, again, instantiated and passes the current configuration to the static entry point. I don't see any particular design problem with this, nor does it make it harder to test or use in any way. It reflects how one would call curl or fsockopen underneath (naturally, static, with all relevant options explicitly specified). The instantiated wrapper can remember and pass through options based on site configuration and current instance state in whatever way we see fit.

The main place where it maintains state, where I wish it didn't, is its Requests::request_multiple and underlying get_transport methods, which statically re-uses the same multi-request session. Remembering that state within the library is unclean. However, it's merely a convenience default. It does allow for proper injection of the Transport or Multi transport instance through the options, which bypasses any such static state - which is how we'd use it.

Guzzle actually seems to have some amount of static state too. From a quick check it also allows at least default options to be statically overridden, which would make it hard to re-use between unrelated code paths. It seems this can't be bypassed. It's also curl-only, and the main quest here was to transparently support installations without curl.

demon removed a subscriber: demon.Jan 18 2017, 10:33 PM
Krinkle edited projects, added MediaWiki-Platform-Team; removed Contributors-Team.EditedApr 2 2018, 1:07 PM

(Still outstanding per RFC T137926, approved in 2016.)

tstarling assigned this task to BPirkle.Tue, Jul 10, 2:45 AM

I'm reviewing task history and implementation options.

From the latest at http://docs.guzzlephp.org/en/stable/overview.html#requirements

"Guzzle no longer requires cURL in order to send HTTP requests. Guzzle will use the PHP stream wrapper to send HTTP requests if cURL is not installed."

@BPirkle This task's history is a bit confusing (sorry about that). I still believe it's worth swapping out our HTTP library for an upstream one (and helping maintain that, instead), but I'm not sure we should block this task on that.

MediaWiki's own HTTP class already supports cURL and non-cURL, and has for many years. This particular issue is only with the newer MultiHttpClient abstraction, which was written directly atop cURL, without fallback.

The simplest solution would be to have it fallback to serially performing requests with our existing HTTP class (reduces the risk factor significantly by re-using battle-tested code).

We may want to move the the discussion about swapping the library to T110022.

Thanks, @Krinkle , that's good perspective.

I agree with using a proven short-term solution for the immediate need and considering a more comprehensive approach (under a separate task) for the longer term goals.

Change 445544 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Added non-parallel fallback to MultiHttpClient when curl is unavailable

https://gerrit.wikimedia.org/r/445544

Prospective change pushed.

The options accepted by and the return values provided by the Http class are somewhat different (and in the case of return values substantially less detailed) than the ones used by MultiHttpClient's existing curl interface. I mapped these in what is hopefully a reasonable way for use as a fallback case, but I would appreciate feedback.

Also, as this is my first submission, I welcome feedback on coding style, my use of the Gerritt/Phabricator process, or anything else relevant.