|Open||BPirkle||T137926 Support running MediaWiki without 'curl' PHP extension|
|Open||None||T110022 Create a library with HTTP related functions/code|
|Resolved||BPirkle||T139169 Add non-parallel MultiHttpClient fallback for environments that don't have curl available|
- Mentioned In
- T197535: extension.json should be able to express dependency on PHP version / PHP extension version
T202352: Convert MultiHttpClient to use Guzzle
T202143: Security review for Guzzle 6.3.3
T202110: Create GuzzleHttpRequest class as new default for HttpRequestFactory
T153953: Better error message for VisualEditor when php7.0-curl package is missing
T110022: Create a library with HTTP related functions/code
- Mentioned Here
- T110022: Create a library with HTTP related functions/code
T137926: Support running MediaWiki without 'curl' PHP extension
The aforementioned Requests library from WordPress is now available as a standalone composer package that we could use (dependency free and supports PHP 5.2+).
Ugh, if we use an external library, can we use something with non-static calls? Guzzle for example is widely used, supports PSR-7 (which is terrible but still the only standard in PHP-land), can do parallel and async requests, and can be properly injected.
The existing MediaWiki Http entry point class is entirely static (Http::request, Http::get, etc.). The underlying transport classes (MWHttpRequest, PhpHttpRequest, CurlHttpRequest) are, however, instantiated. Typically we have one instance for each request.
The Requests library is quite similar. It has Requests::request( $url, ... $options ) as static entry point, but underlying Transport classes are instantiated. WordPress even wraps around this with another WP_Http class which is, again, instantiated and passes the current configuration to the static entry point. I don't see any particular design problem with this, nor does it make it harder to test or use in any way. It reflects how one would call curl or fsockopen underneath (naturally, static, with all relevant options explicitly specified). The instantiated wrapper can remember and pass through options based on site configuration and current instance state in whatever way we see fit.
The main place where it maintains state, where I wish it didn't, is its Requests::request_multiple and underlying get_transport methods, which statically re-uses the same multi-request session. Remembering that state within the library is unclean. However, it's merely a convenience default. It does allow for proper injection of the Transport or Multi transport instance through the options, which bypasses any such static state - which is how we'd use it.
Guzzle actually seems to have some amount of static state too. From a quick check it also allows at least default options to be statically overridden, which would make it hard to re-use between unrelated code paths. It seems this can't be bypassed. It's also curl-only, and the main quest here was to transparently support installations without curl.
I'm reviewing task history and implementation options.
From the latest at http://docs.guzzlephp.org/en/stable/overview.html#requirements
"Guzzle no longer requires cURL in order to send HTTP requests. Guzzle will use the PHP stream wrapper to send HTTP requests if cURL is not installed."
@BPirkle This task's history is a bit confusing (sorry about that). I still believe it's worth swapping out our HTTP library for an upstream one (and helping maintain that, instead), but I'm not sure we should block this task on that.
MediaWiki's own HTTP class already supports cURL and non-cURL, and has for many years. This particular issue is only with the newer MultiHttpClient abstraction, which was written directly atop cURL, without fallback.
The simplest solution would be to have it fallback to serially performing requests with our existing HTTP class (reduces the risk factor significantly by re-using battle-tested code).
We may want to move the the discussion about swapping the library to T110022.
Prospective change pushed.
The options accepted by and the return values provided by the Http class are somewhat different (and in the case of return values substantially less detailed) than the ones used by MultiHttpClient's existing curl interface. I mapped these in what is hopefully a reasonable way for use as a fallback case, but I would appreciate feedback.
Also, as this is my first submission, I welcome feedback on coding style, my use of the Gerritt/Phabricator process, or anything else relevant.
I don't think it does, unless you mean messing with RedirectMiddleware::$defaultSettings (which you are pretty clearly not meant to).
It's also curl-only, and the main quest here was to transparently support installations without curl.
"Guzzle can be used with cURL, PHP's stream wrapper, sockets, and non-blocking libraries like React" http://docs.guzzlephp.org/en/latest/faq.html#does-guzzle-require-curl