| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| Added non-parallel fallback to MultiHttpClient when curl is unavailable | mediawiki/core | master | +376 -62 |
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | BPirkle | T137926 Support running MediaWiki without 'curl' PHP extension |
| Open | | None | T110022 Move HTTP-related code from MW to its own library |
| Resolved | | BPirkle | T139169 Add non-parallel MultiHttpClient fallback for environments that don't have curl available |
Event Timeline
The aforementioned Requests library from WordPress is now available as a standalone composer package that we could use (dependency free and supports PHP 5.2+).
Ugh, if we use an external library, can we use something with non-static calls? Guzzle for example is widely used, supports PSR-7 (which is terrible but still the only standard in PHP-land), can do parallel and async requests, and can be properly injected.
The existing MediaWiki Http entry point class is entirely static (Http::request, Http::get, etc.). The underlying transport classes (MWHttpRequest, PhpHttpRequest, CurlHttpRequest) are, however, instantiated. Typically we have one instance for each request.
The Requests library is quite similar. It has Requests::request( $url, ... $options ) as static entry point, but underlying Transport classes are instantiated. WordPress even wraps around this with another WP_Http class which is, again, instantiated and passes the current configuration to the static entry point. I don't see any particular design problem with this, nor does it make it harder to test or use in any way. It reflects how one would call curl or fsockopen underneath (naturally, static, with all relevant options explicitly specified). The instantiated wrapper can remember and pass through options based on site configuration and current instance state in whatever way we see fit.
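The pattern described above could be sketched roughly like this (an illustration only; the class and method names here are invented and do not reproduce the actual Requests or WordPress source):

```php
// Illustrative sketch: a static entry point that delegates to an
// instantiated, injectable transport (all names here are hypothetical).
class ExampleRequests {
	public static function request( $url, $options = [] ) {
		// A transport instance can be injected via $options, which
		// bypasses any static default state inside the library.
		$transport = isset( $options['transport'] )
			? $options['transport']
			: new ExampleCurlTransport();
		return $transport->send( $url, $options );
	}
}

class ExampleCurlTransport {
	public function send( $url, $options ) {
		// ...perform the request via curl (or fsockopen) here...
	}
}
```

The instantiated wrapper class (WP_Http in WordPress's case) would then hold site configuration and pass it along to the static entry point on each call.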
The main place where it maintains state, where I wish it didn't, is in its Requests::request_multiple and underlying get_transport methods, which statically re-use the same multi-request session. Remembering that state within the library is unclean. However, it's merely a convenience default. The library does allow proper injection of the Transport or Multi transport instance through the options, which bypasses any such static state; that is how we'd use it.
Guzzle actually seems to have some amount of static state too. From a quick check it also allows at least default options to be statically overridden, which would make it hard to re-use between unrelated code paths. It seems this can't be bypassed. It's also curl-only, and the main quest here was to transparently support installations without curl.
I'm reviewing task history and implementation options.
From the latest at http://docs.guzzlephp.org/en/stable/overview.html#requirements
"Guzzle no longer requires cURL in order to send HTTP requests. Guzzle will use the PHP stream wrapper to send HTTP requests if cURL is not installed."
@BPirkle This task's history is a bit confusing (sorry about that). I still believe it's worth swapping out our HTTP library for an upstream one (and helping maintain that, instead), but I'm not sure we should block this task on that.
MediaWiki's own HTTP class already supports cURL and non-cURL, and has for many years. This particular issue is only with the newer MultiHttpClient abstraction, which was written directly atop cURL, without fallback.
The simplest solution would be to have it fall back to serially performing requests with our existing HTTP class, which significantly reduces risk by re-using battle-tested code.
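In code, that serial fallback could look something like this (a minimal sketch under stated assumptions, not the merged change; apart from PHP's extension_loaded(), the method names are assumed):

```php
// Hypothetical sketch of the fallback dispatch inside MultiHttpClient:
// use the parallel curl path when the extension is loaded, otherwise
// perform each request one at a time via the existing HTTP machinery.
public function runMulti( array $reqs, array $opts = [] ) {
	if ( extension_loaded( 'curl' ) ) {
		return $this->runMultiCurl( $reqs, $opts );
	}
	// Serial fallback: slower, but works without the curl extension.
	foreach ( $reqs as $index => &$req ) {
		$req['response'] = $this->runSingleHttp( $req, $opts );
	}
	unset( $req );
	return $reqs;
}
```

Callers would see the same interface either way; only the degree of parallelism differs.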
We may want to move the discussion about swapping the library to T110022.
Thanks, @Krinkle, that's good perspective.
I agree with using a proven short-term solution for the immediate need and considering a more comprehensive approach (under a separate task) for the longer term goals.
Change 445544 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Added non-parallel fallback to MultiHttpClient when curl is unavailable
Prospective change pushed.
The options accepted by the Http class, and the return values it provides, are somewhat different from those used by MultiHttpClient's existing curl interface (in the case of return values, substantially less detailed). I mapped these in what is hopefully a reasonable way for use as a fallback, but I would appreciate feedback.
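For illustration, that mapping might look roughly like this (a hedged sketch; the exact response field names and error handling in the real patch may differ):

```php
// Hypothetical mapping sketch: translate an MWHttpRequest result into
// the response shape that MultiHttpClient's curl path produces.
$httpRequest = MWHttpRequest::factory( $url, $mwOpts, __METHOD__ );
$sv = $httpRequest->execute();
$req['response'] = [
	'code' => $httpRequest->getStatus(),
	'reason' => '',
	'headers' => $httpRequest->getResponseHeaders(),
	'body' => $httpRequest->getContent(),
	// The Http class exposes far less error detail than curl does,
	// so the error field here is necessarily coarser.
	'error' => $sv->isOK() ? '' : 'HTTP request failed',
];
```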
Also, as this is my first submission, I welcome feedback on coding style, my use of the Gerrit/Phabricator process, or anything else relevant.
Change 446705 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/mediawiki/core into T139169
Change 446705 abandoned by BPirkle:
Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/mediawiki/core into T139169
Reason:
This change was accidental and has no value.
Change 445544 merged by jenkins-bot:
[mediawiki/core@master] Added non-parallel fallback to MultiHttpClient when curl is unavailable
I don't think it does, unless you mean messing with RedirectMiddleware::$defaultSettings (which you are pretty clearly not meant to).
> It's also curl-only, and the main quest here was to transparently support installations without curl.
"Guzzle can be used with cURL, PHP's stream wrapper, sockets, and non-blocking libraries like React" http://docs.guzzlephp.org/en/latest/faq.html#does-guzzle-require-curl