Page MenuHomePhabricator

Move HTTP-related code from MW to its own library
Open, MediumPublic

Description

To be considered:

  • (includes/http) Http, MWHttpRequest, CurlHttpRequest
  • (libs) HttpAcceptParser, HttpAcceptNegotiator
  • (libs) MultiHttpClient
  • (libs) HttpStatus

This would mean that said library could be used by https://github.com/addwiki/mediawiki-api-base which is required by https://github.com/addwiki/wikibase-api.

https://github.com/addwiki/wikibase-api could then easily be used by the WikibaseClient potentially helping to resolve T48556

Not only that, but this code is also perfect for a library....

Event Timeline

Addshore raised the priority of this task from to Medium.
Addshore updated the task description. (Show Details)
Addshore added projects: Librarization, Wikidata.
Addshore subscribed.
Addshore set Security to None.
Addshore rescinded a token.
Addshore awarded a token.

Can you please connect this to a story or epic?

Legoktm renamed this task from Factor contents of HttpFunctions.php into separate library to Create a library with HTTP related functions/code.Sep 21 2016, 6:07 AM
Legoktm updated the task description. (Show Details)

I made the subject a little more generic as we probably want to library-ize MultiHttpClient too. However, that is blocked on T139169: Add non-parallel MultiHttpClient fallback for environments that don't have curl available.

I poked at librarizing includes/http recently - the dependencies are CookieJar, Status[Value] and at-ease. So StatusValue would have to be librarized first. Also the current interface is rather ugly.

Per https://gerrit.wikimedia.org/r/#/c/356121/, a good starting point would be to move the (new) HttpAcceptNegotiator and HttpAcceptParser classes to a Packagist-published PHP library.

We can later consider moving other classes to it as well (such as Http::request, MWHttpRequest, and MultiHttpClient).

Removed from the description.
The tasks were moved to phab, but I cant remember which one #5 was...

This task has been around for awhile, and related discussion and action has since occurred elsewhere. To summarize from this task and from T139169, T137926, and T139170:

  • we would like to make curl optional (with a suitable warning)
  • we made MultiHttpClient work without curl in T139169, but at the cost of introducing an undesirable dependency on MWHttpRequest, which is currently not within includes/libs. So in solving one issue, we created another.
  • we should either use an external library, or create our own. Options presented were:

— Requests (https://packagist.org/packages/rmccue/requests), used by WordPress
— Guzzle (http://docs.guzzlephp.org), originally required curl but now has a fallback to PHP streams
— librarize our existing code (per this tasks description) . Requires librarizing some dependencies (StatusValue, etc.)

Here’s an article about when Wordpress switched to Requests: https://make.wordpress.org/core/2016/07/27/http-api-in-4-6/ Scroll down to the comments for some (admittedly dated) discussion of Requests vs Guzzle from their perspective.

I support pursuing using an external library, as there seem to be (at least) two suitable existing options. MultiHttpClient might be a good place to start, as it is currently breaking our rules. I'm happy to create a task for that and take it on, but before I get started I'm curious if the summary I just wrote above still accurately reflects our larger goals, and if there are additional thoughts I should consider.

Usually the main limitation of external libraries is that they don't support localization (of error messages, mainly). Not sure if we actually care about that in the case of not particularly user-facing libraries, or just took advantage of it because it was free...

If we do care about localization, there's also the possibility of wrapping an external library in custom code that does support localization. Current http-* entries in en.json are fairly generic and could continue to be so. The external library would handle the details of the actual request, including the fallback from curl to php streams as necessary. In that model, MWHttpRequest and MultiHttpClient continue to exist, but much simplified compared to their current implementations. From my brief read, Wordpress used that approach.

A wrapper would be straight-forward indeed, but another idea to consider could be to handle localisation at a higher layer that catches the exception, and responsible for printing it in some way. That approach might be easier to scale to other libraries we use, including those used indirectly by other libraries that we don't control.

We'd basically map the exception class to a localisation message somewhere, quite similar to what we already do for various exception types in various places. Perhaps we could centralise that a bit within MediaWiki :)

We'd basically map the exception class to a localisation message somewhere, quite similar to what we already do for various exception types in various places. Perhaps we could centralise that a bit within MediaWiki :)

The fundamental concept in MediaWiki's localization system (and in localization systems in general, I suppose) is the separation of message types and parameters. That's hard to do in a wrapper. To take a random example, Guzzle does this:

if (!file_exists($options['verify'])) {
    throw new \InvalidArgumentException(
        "SSL CA bundle not found: {$options['verify']}"
    );
}

Mapping by class is not really useful. Mapping by message would involve regex parsing messages - possible but a bit icky (also fragile).

I wish we could map by exception class, but it seems challenging to do that in a useful way. Looking at the two libraries that have been mentioned:

Guzzle tends to throw generic exception classes. The most popular appear to be:

  • InvalidArgumentException (49 occurrences)
  • LogicException (6 occurrences)
  • RuntimeException (33 occurrences)

Requests, on the other hand, pretty much only throws Requests_Exception (29 occurrences)

In neither case would a message based on the exception class be helpful for diagnosing an issue. The exception message is the interesting part, and also the hardest part to localize.

Who would our intended audience be for localized exception messages? They're probably incomprehensible to end users whether translated or not, and I'm assuming that developers speak sufficient English to read the untranslated messages (given that the code and its comments are in English). So are localized exception messages intended for people who administer/configure mediawiki sites? I'm wondering if they need to be translated at all.

The other aspect is that what constitutes an "exception" to a library may be a different thing than what constitutes an exception to the calling code. A socket timeout may constitute an exception to Requests, but I'm guessing that calling code would want to catch that and give a meaningful message in the context of whatever operation was being performed.

I very much like the idea of having centralized, or at least consistent, ways of handling things. But in this situation, I wonder if on a practical level we'll find that the fact that something is a "library" implies that its calling code must interpret exceptions on a case-by-case basis.

Localization questions aside, it seemed that in 2016 (on T139169), @Krinkle preferred Requests, while @Tgr preferred Guzzle. Do each of you still feel the same, and are there any additional possibilities that you've become aware of since then?

FWIW, I'd maybe offer a soft poke in the direction of Guzzle, mostly because I find that having REST-y functionality integrated can be helpful. But it's not something that I feel strongly about.

Localization questions aside, it seemed that in 2016 (on T139169), @Krinkle preferred Requests, while @Tgr preferred Guzzle. Do each of you still feel the same

Oops, I forgot to follow up there, did it now (god bless Phabricator drafts). I'm not familiar with Requests but at a glance Guzzle seems like a much more mature library.

and are there any additional possibilities that you've become aware of since then?

The ones that are prominent enough to have a HTTPlug connector are Zend, Buzz and the Cake client (and React and Artax but looping clients are not really appropriate for us).

I’m also a soft lean to Guzzle. It seems more active recently than most of the alternatives, suitable to our needs, and generally popular. I hate to overweigh “popularity” as a metric, but in the case of open source, momentum doesn’t hurt. It is also much easier to google for, as searching for "requests" gives a flood of false positives.

It seems like Requests is less active - the author says as much here:

https://github.com/rmccue/Requests/issues/320

However, that’s not necessarily negative, it could simply mean the library is stable. Its continued use WordPress should help maintain its critical mass, and gives us a reference implementation (for whatever that is or isn't worth).

My goals with this conversation are

  1. avoid burning unnecessary time by exploring obvious dead ends, or missing an obvious winner
  2. gather sufficient consensus and detail to create a properly focused subtask for the actual implementation

I don't mind at all doing exploratory implementations with both Guzzle and Requests as part of that process. (In fact, I already have them both installed in my local copy of mediawiki and have confirmed that basic GET operations work happily with both.)

At this point, I think I'll explore Guzzle for more sophisticated usages, and await comments from @Krinkle to decide if I should do the same for Requests.

We'd basically map the exception class to a localisation message somewhere [..]

[..]
Mapping by class is not really useful. Mapping by message would involve regex parsing messages - possible but a bit icky (also fragile).

I wish we could map by exception class, but it seems challenging to do that in a useful way. [..]

I was suggesting only a simple map, or a catch-all even – not any form of message extraction. Looking at MediaWiki's current Http, MWHttpRequest and MultiHttpClient class, we also don't provide localisation for low-level details about the error, such as the path to a CA bundle. We have those in exception messages only, not localised. The only user-facing messages we have are:

  1. http-invalid-url ($1: url) – Based on a validation check before the request is made.
  2. http-bad-status ($1: statusCode, $2: statusText) – After a successful response, read from a public interface.
  3. http-internal-error (no params) – Everything else.

This is feasible with either of the two library options (Guzzle and Requests), and likely other libraries as well.

I agree that non-static state and active maintenance are desirable traits. I haven't recently checked either library for those aspects, but I trust @BPirkle and @Tgr that Guzzle is better in that regard.

I'd like to instead provide feedback on another aspect: Stability.

Stability

Being part of WordPress has a very strong benefit with regards to stability. In particular:

  • WordPress is very serious about backwards compatibility.
  • WordPress is very serious about PHP compatibility.

This means we can be reasonably certain that we can always use the latest stable version of the library, with upstream support for any issues that may arise in either its public interface or compatibility with any of the PHP versions MediaWiki supports.

It also means there likely isn't going to be a major release any time soon that would drop support for a PHP version we support, or that would make a breaking change in its public interface. This reduces the chances of getting stuck with an older or unsupported version.

This means we can be reasonably certain that we can always use the latest stable version of the library, with upstream support for any issues that may arise in either its public interface or compatibility with any of the PHP versions MediaWiki supports.

It also means there likely isn't going to be a major release any time soon that would drop support for a PHP version we support, or that would make a breaking change in its public interface. This reduces the chances of getting stuck with an older or unsupported version.

I think this can also work backwards, like if upstream doesn't add support for newer PHP versions, then we're stuck.

Made this suggestion in T201409 but figured it should be repeated here: whatever the implementation ends up being, it should be easy (or possibly required?) to set the User Agent string, with the goal that any caller will use that to specify what job/service/etc is making a call. It makes it much, much easier to debug cross-service calls when you have that info.

I think this can also work backwards, like if upstream doesn't add support for newer PHP versions, then we're stuck.

I agree that's a general concern to be aware of when comparing vendors. WordPress specifically, though, has no precedent for being slow to support new PHP versions. They've been as fast or faster than us. (PHP7.0 both in late 2015, PHP 7.2 also on par, or arguably we're behind if we also consider warnings).

Guzzle has been added to the codebase and made available through the new GuzzleHttpRequest class, per T202110.

Krinkle renamed this task from Create a library with HTTP related functions/code to Move HTTP-related functions/classes from MW to its own library.Oct 7 2019, 6:14 PM
Krinkle renamed this task from Move HTTP-related functions/classes from MW to its own library to Move HTTP-related code from MW to its own library.

So, for this task it would still be great to have GuzzleHttpRequest or MWHttpRequest in its own library...

Or have them implement ClientInterface.

Or I guess I would just have to write some adapter.

The whole point of this ticket is that the interface that is used in core is usable outside core in other libraries.