
Support running MediaWiki without 'curl' PHP extension
Open, Low, Public

Description

Approved proposal

  • Don't require curl at install time;
  • Keep existing non-curl path;
  • Add non-parallel MultiHttpClient fallback;
  • Provide warning about missing curl on Special:Version.

Original proposal

Right now we have some features that require curl (it's probably the #2 issue third parties have with installing VisualEditor, for instance, after mis-matched versions; it's also needed for Math and several other extensions).

Historically, we bent over backwards to provide a fallback using file streams. Although a few older paths still use this, it's not practical for the newer MultiHTTPClient which is used for the VirtualRESTServiceClient, amongst other important features.

Failure to have cURL installed is a big cause of third-party user confusion, and the installer has for some time suggested installing it. Instead of keeping such a confusing situation, where you don't need cURL unless you do, we should just require it outright, simplifying things a great deal.

We already require five PHP extensions – xml, ctype, json, iconv, and (as discussed in March in T129435) mbstring.

Event Timeline


Same as kaldari and Bawolff. Another workaround would probably be to add checks for the existence of a PHP extension in the setup process of the extension, or in the update routine (even if there are no database changes, which would, of course, be an abuse of hooks and functionality). But requiring PHP extensions because MediaWiki extensions _might_ require them doesn't seem to be the right approach.

In the description, @Jdforrester-WMF wrote:

Historically, we bent over backwards to provide a fallback using file streams. Although a few older paths still use this, it's not practical for the newer MultiHTTPClient which is used for the VirtualRESTServiceClient, amongst other important features.

@Jdforrester-WMF, I'd like to understand this part better. includes/libs/MultiHttpClient.php is part of core and seems to rely on curl pretty heavily. includes/libs/virtualrest/ is also in core, though I admittedly haven't really looked at this area of the code. Presumably, it's still possible to build a lot of functionality in MediaWiki that doesn't rely on VirtualRESTServiceClient. Is the implication here that y'all would like to build more core functionality that requires VirtualRESTServiceClient, and don't want to continue to make VirtualRESTServiceClient support a curl-free environment?

Tgr added a comment.Jun 16 2016, 3:16 PM

Just curious, is there a technical reason why MultiHTTPClient does not support stream wrappers, or is it just the cost of maintaining two separate implementations? stream_select seems to do roughly the same thing as curl_multi.
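To make the comparison concrete, a minimal sketch (not MediaWiki code) of what a stream_select-based counterpart to curl_multi could look like — plain HTTP only, minimal error handling, meant only to show the shape of the approach:

```php
<?php
// Hypothetical sketch: parallel HTTP GETs driven by stream_select()
// instead of curl_multi. Not production code.
function parallelGet( array $urls ): array {
    $streams = [];
    $responses = [];
    foreach ( $urls as $key => $url ) {
        $parts = parse_url( $url );
        $fp = @stream_socket_client( "tcp://{$parts['host']}:80", $errno, $errstr, 5 );
        if ( $fp === false ) {
            continue; // skip hosts we cannot reach
        }
        stream_set_blocking( $fp, false );
        fwrite( $fp, "GET {$parts['path']} HTTP/1.0\r\nHost: {$parts['host']}\r\n\r\n" );
        $streams[$key] = $fp;
        $responses[$key] = '';
    }
    while ( $streams ) {
        $read = array_values( $streams );
        $write = $except = null;
        // Wait until at least one stream is readable, like curl_multi_select()
        if ( stream_select( $read, $write, $except, 5 ) === false ) {
            break;
        }
        foreach ( $read as $fp ) {
            $key = array_search( $fp, $streams, true );
            $chunk = fread( $fp, 8192 );
            if ( $chunk === '' || $chunk === false ) {
                fclose( $fp );
                unset( $streams[$key] ); // remote side finished
            } else {
                $responses[$key] .= $chunk;
            }
        }
    }
    return $responses;
}
```

The real cost, as the comment suggests, is less in the select loop itself than in reimplementing everything curl provides for free (TLS, redirects, proxies, HTTP/1.1+ features) on top of it.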

The solution to the user confusion problems of VisualEditor is probably some kind of PHP module requirements support in extension.json. It already has a requires field, and it uses Composer libraries, and Composer does handle PHP module requirements, so I imagine that's not too difficult.
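A hypothetical sketch of what such a declaration could look like in extension.json — the `platform` / `ext-curl` keys shown here mirror Composer's syntax for PHP extension requirements and are illustrative, not a statement of what extension registration supported at the time:

```json
{
    "name": "ExampleExtension",
    "requires": {
        "MediaWiki": ">= 1.33.0",
        "platform": {
            "ext-curl": "*"
        }
    }
}
```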

FWIW, besides the five extensions mentioned in the task description, we sort of require openssl or mcrypt as well for core (since 0b8b539).

brion added a comment.Jun 22 2016, 8:36 PM

(From quick checkin w/ archcom): there are probably three possible outcomes here:

  1. no change
  2. require curl, recommend wider use of MultiHttpClient
  3. create alternate non-curl implementation, recommend wider use of MultiHttpClient

So it'd be great to hear from people who might want to use MultiHttpClient or other currently curl-requiring bits in core. I know we could benefit from MultiHttpClient in places like ForeignAPIRepo (InstantCommons), there may be other places.

Are there any other bits that use curl directly as well as MultiHttpClient & the Swift stuff?

brion added a comment.Jun 22 2016, 8:37 PM

Oops forgot one last bit:

difference between 2 and 3 depends on how much work to create the alternate, and whether it seems worth it.

Tgr added a comment.Jun 22 2016, 9:06 PM

I know we could benefit from MultiHttpClient in places like ForeignAPIRepo (InstantCommons)

I don't think there is too much benefit there. The action API can do most of its operations in batches, which is probably faster than parallel requests. Parallelization would only make a big difference for downloading images, and any site that cares about performance will configure its ForeignApiRepo to either hotlink or use a 404 handler (in which case requests are already parallelized).
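As a concrete illustration of the batching point: a single action API request can ask about many titles at once, which is often cheaper than many parallel single-title requests. A sketch (ordinary action API parameters; the title list is made up for illustration):

```php
<?php
// One batched action API request covering several files at once.
$titles = [ 'File:A.jpg', 'File:B.jpg', 'File:C.jpg' ];
$url = 'https://commons.wikimedia.org/w/api.php?' . http_build_query( [
    'action' => 'query',
    'prop'   => 'imageinfo',
    'iiprop' => 'url|size',
    // Batching: one request asks about every title in the list.
    'titles' => implode( '|', $titles ),
    'format' => 'json',
] );
```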

I know we could benefit from MultiHttpClient in places like ForeignAPIRepo (InstantCommons)

I don't think there is too much benefit there. The action API can do most of its operations in batches, which is probably faster than parallel requests. Parallelization would only make a big difference for downloading images, and any site that cares about performance will configure its ForeignApiRepo to either hotlink or use a 404 handler (in which case requests are already parallelized).

ForeignAPIRepo would benefit a lot from batching, but that would require refactoring the parser. We can't easily parallelize or batch the current code (AFAIR).

So it'd be great to hear from people who might want to use MultiHttpClient or other currently curl-requiring bits in core. I know we could benefit from MultiHttpClient in places like ForeignAPIRepo (InstantCommons), there may be other places.
Are there any other bits that use curl directly as well as MultiHttpClient & the Swift stuff?

The "read HTML via Parsoid" work is presumably going to require this, initially in MobileFrontend and later in core itself, right? In general all our use of services is via this pattern now.

@Jdforrester-WMF - would you be available to discuss this RFC this coming Wednesday (at E224)? If so, excellent, this may be our choice for that time slot! If not, would you mind if we discussed it anyway? How about everyone else? (See comments in E224 for other options we're considering)

(From quick checkin w/ archcom): there are probably three possible outcomes here:

  1. no change
  2. require curl, recommend wider use of MultiHttpClient
  3. create alternate non-curl implementation, recommend wider use of MultiHttpClient

For option 3, there's actually 2 possible sub outcomes:

  • Full feature implementation using stream_select
  • Barest backwards compatibility, which makes no attempt to do things in parallel but falls back to doing the HTTP requests in sequence.

The latter would probably be very simple to implement (Just a loop calling Http::request()). I think that might be a nice compromise, if we don't want to maintain two large implementations.
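The loop Bawolff describes could be as small as the following sketch. The function and parameter names are illustrative, not the actual MediaWiki implementation; `$doRequest` stands in for the existing single-request path such as `Http::request()`:

```php
<?php
// Hypothetical non-parallel MultiHttpClient fallback: run each request
// sequentially through whatever single-request path already exists.
function runMulti( array $reqs, callable $doRequest ): array {
    foreach ( $reqs as $i => $req ) {
        // No curl_multi parallelism: one plain request after another.
        $reqs[$i]['response'] = $doRequest( $req['method'], $req['url'] );
    }
    return $reqs;
}
```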

demon added a subscriber: demon.Jun 28 2016, 3:47 PM

I would prefer requiring curl just so we can stop having 2 parallel implementations of HTTP fetching (even for the simple non-multi requests). Extra tech debt, extra abstraction, extra place to keep an eye on for security.

Krinkle added a subscriber: Krinkle.EditedJun 28 2016, 4:32 PM

The latter would probably be very simple to implement (Just a loop calling Http::request()). I think that might be a nice compromise, if we don't want to maintain two large implementations.

Agreed. I don't think there is much added value in requiring php-curl to be installed for MediaWiki. A fallback using MwHttpRequest is simple enough.

This is also an area we should consider moving to a separate component or replacing with an existing library. It's pretty straightforward.

WordPress stable (4.x) does the same thing right now. Their WP_Http class tests one of two different transports: curl, or streams.

In WordPress master they recently (March 2016) landed support for making multiple requests in parallel. The fallback for streams is to make requests in a loop.

Restating what @brion summarized as our choices in T137926#2400355:

  1. no change (advocated most recently by @Florian in T137926#2384514)
  2. require curl, recommend wider use of MultiHttpClient (advocated most recently by @demon in T137926#2412126)
  3. create alternate non-curl implementation, recommend wider use of MultiHttpClient (advocated most recently by @Krinkle in T137926#2412298)

Is that an accurate assessment?

demon added a comment.Jun 28 2016, 5:17 PM
  2. require curl, recommend wider use of MultiHttpClient (advocated most recently by @demon in T137926#2412126)

MultiHttpClient is a benefit to having curl (less userland logic for us), but my biggest point would be that having 2 methods for doing the same thing is a code maintenance burden.

brion added a comment.Jun 28 2016, 6:26 PM
  2. require curl, recommend wider use of MultiHttpClient (advocated most recently by @demon in T137926#2412126)

MultiHttpClient is a benefit to having curl (less userland logic for us), but my biggest point would be that having 2 methods for doing the same thing is a code maintenance burden.

There are also two subproposals for '3. create alternate non-curl implementation':

  • 3a full-featured parallel implementation by whatever means necessary (moving parts which may require future maintenance)
  • 3b provide the API for asking for multiple resources, with non-parallel performance degradation on non-curl systems

3b has very little direct maintenance burden (the implementation being roughly 'run a loop over these URLs and call into the single-request path'), and still works for those without curl.

Downside of 3b: third-party users without curl installed who ignore the install-time warning and *do* later encounter it as a performance hotspot may have trouble figuring out why some things are unexpectedly slow... whereas if we'd required them to 'apt-get install php5-curl' at the start they would never encounter the problem.

Downside of optimizing for the downside: if we up the requirement, then other people wanting to just keep up a small installation on a weird host they like or must use may suddenly encounter a firm requirement they cannot get past for a performance issue they may never encounter.

My inclination is to go with 3b. We get the benefits of wider adoption of a paralleliz*able* interface that can help with things we need in Wikimedia production sites, without requiring any specific implementation backend for people without our same use cases, and we don't have to write and maintain a bunch of other code. Third-party installs using the same backend get the same benefit, and those that don't are no worse off than before.

Downside of optimizing for the downside: if we up the requirement, then other people wanting to just keep up a small installation on a weird host they like or must use may suddenly encounter a firm requirement they cannot get past for a performance issue they may never encounter.

I don't think it's just weird hosts that don't have php5-curl installed. I think it might be a surprisingly common config for shared hosts. (I have no numbers to back this up, just based on what I've seen, particularly from InstantCommons-related support requests when we transitioned to HTTPS only.)

demon added a comment.Jun 28 2016, 6:51 PM

Ugh shared hosting 😞

brion added a comment.Jun 28 2016, 7:03 PM

I count shared hosts among 'weird hosts'... They're out there and they get used, whether we like it or not. What kind of experience the people who use those services get is partly up to the host, but partly up to us.

This is the part we get to control: either we tell them they can keep using MediaWiki, or we tell them to stay on old versions and not install future security updates.

demon added a comment.Jun 28 2016, 7:12 PM

I'm curious, do any core functions of MediaWiki require external requests? I can't think of any...just a bunch of things you can turn on (InstantCommons, caching, object stores, extensions).

Perhaps the middle ground here? Don't require it, but strongly suggest it in the installer with the caveat that XYZ features just don't work unless you do? That would allow us to remove the fallbacks (security & maintenance burdens --) and just improve the failure scenarios when you try to turn on unsupported features.

I'm curious, do any core functions of MediaWiki require external requests? I can't think of any...just a bunch of things you can turn on (InstantCommons, caching, object stores, extensions).
Perhaps the middle ground here? Don't require it, but strongly suggest it in the installer with the caveat that XYZ features just don't work unless you do? That would allow us to remove the fallbacks (security & maintenance burdens --) and just improve the failure scenarios when you try to turn on unsupported features.

I think instantcommons is the type of feature that's very likely to be used by people who would be affected by requiring curl.

Perhaps the middle ground here? Don't require it, but strongly suggest it in the installer with the caveat that XYZ features just don't work unless you do?

This is a really interesting line of thinking. What if MediaWiki had a "limited" mode that was sorta like $wgMiserMode, but slightly more user-visible (by default)? That way, even amateur admins will start searching for "how do I turn this off", but they still have a working wiki if they don't have the time to futz with it.

We could lump all of the "strongly suggested" stuff in "limited mode", and then have the installer give people the list of stuff they need to install to get out of "limited mode" on every upgrade.

demon added a comment.Jun 28 2016, 8:04 PM

Perhaps the middle ground here? Don't require it, but strongly suggest it in the installer with the caveat that XYZ features just don't work unless you do?

This is a really interesting line of thinking. What if MediaWiki had a "limited" mode that was sorta like $wgMiserMode, but slightly more user-visible (by default)? That way, even amateur admins will start searching for "how do I turn this off", but they still have a working wiki if they don't have the time to futz with it.
We could lump all of the "strongly suggested" stuff in "limited mode", and then have the installer give people the list of stuff they need to install to get out of "limited mode" on every upgrade.

It's kind of like the idea of progressive enhancement in the browser land. You start with a barebones experience for users who can't support the extra candy. Then as they upgrade (or our case, install dependencies), the experience gets improved for them.

I think it allows us to bridge the best of both worlds. Plus it acts as gentle encouragement towards better supported platforms instead of failing to work at all.

brion added a comment.Jun 28 2016, 8:14 PM

tl;dr jump to end paragraph

I'm curious, do any core functions of MediaWiki require external requests? I can't think of any...just a bunch of things you can turn on (InstantCommons, caching, object stores, extensions).

Offhand, I think that InstantCommons would be the primary user-visible core feature on small installations that requires an HTTP client and does not currently require curl.

It may be that extension features are even more popular and I've no idea what they need; since Wikimedia Foundation doesn't make any attempt to survey feature usage of MediaWiki installs, we're kinda shooting in the dark here. :)

Perhaps the middle ground here? Don't require it, but strongly suggest it in the installer with the caveat that XYZ features just don't work unless you do? That would allow us to remove the fallbacks (security & maintenance burdens --) and just improve the failure scenarios when you try to turn on unsupported features.

For InstantCommons my understanding is that it presently works without curl, unless we actively choose to either remove the non-curl support in MWHttpRequest's PhpHttpRequest subclass or replace ForeignAPIRepo's use of MWHttpRequest with MultiHttpClient without adding a non-curl replacement.

On two fronts:

  1. As I understand we've already done the work to get PhpHttpRequest to initialize openssl with the right certs, so the costs to making sure InstantCommons works with our HTTPS-only world whether using curl or not have already been spent. But there are potential future work needs in terms of security, correctness, and expansion to support HTTP/2 etc, which we could neatly sidestep by deferring to libcurl's conveniently existing, well-supported implementation.
  2. ForeignAPIRepo could benefit from MultiHttpClient for some specific cases like fetching thumbnail URLs (where the web APIs don't allow for requesting multiple files at different sizes), but otherwise mostly will get parallelism from asking for information about multiple files in a single request (already provided for on the repo end, needs more work on the parser end).

So there's no strong need to make Instant Commons depend on curl for functionality; switching it to require curl would be our choice based on our own schedule for maintaining or removing the PhpHttpRequest class.
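The availability question brion raises could be surfaced with a check along these lines — a hypothetical sketch loosely modelled on the choice MWHttpRequest makes, not the actual implementation:

```php
<?php
// Hypothetical backend probe: prefer curl, fall back to PHP stream
// wrappers (the PhpHttpRequest-style path), and report when neither
// is usable so the admin gets a clear error instead of a silent failure.
function httpBackend(): ?string {
    if ( extension_loaded( 'curl' ) ) {
        return 'curl'; // php-curl is available
    }
    if ( ini_get( 'allow_url_fopen' ) ) {
        return 'php'; // stream-wrapper fallback
    }
    return null; // no usable backend: surface an explicit error
}
```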

I hesitate to say "deprecate" because it's easy to confuse usages between "warning people that an interface is going to be removed in the near future so they should not use it" and "recommending that another, better, interface be used instead, but saying nothing specific about whether this one will continue to exist". Further, in this case there's no *interface* to deprecate, we're talking about removing an invisible backend, which actually changes the internal API contract of MWHttpRequest from 'always available' to 'might be unavailable'.

So, we could kill the non-curl backend, change MWHttpRequest to explicitly say when it's unavailable, and devise both install/upgrade-time and runtime warnings that point to a documentation page clear enough to explain, to a harried part-time sysadmin who has inherited a wiki probably set up by a former co-worker or teammate or forum denizen, how to either fix or work around the error.

I'm not against this, as long as it's thought through and well documented. :)

demon added a comment.Jun 28 2016, 8:32 PM

Calling out this one point in particular so it doesn't get lost:

It may be that extension features are even more popular and I've no idea what they need; since Wikimedia Foundation doesn't make any attempt to survey feature usage of MediaWiki installs, we're kinda shooting in the dark here. :)

See also: T56425: Provide an opt-in ability to register the user's MediaWiki installation. We've wanted to this for some time, nobody's had the time to implement it though. There's lots of useful stats that we could gather about feature usage and support. That could help guide these sorts of decisions with real data.

bd808 added a subscriber: bd808.Jun 28 2016, 8:40 PM

Interesting data and analysis at http://mtdowling.com/blog/2013/05/02/requiring-curl-in-your-php-library/. Includes some analysis of presence of php-curl bindings by default at a number of shared hosting providers.

brion added a comment.Jun 28 2016, 8:49 PM

Yeah, be nice to have firmer data, and I think we should def do another push on T56425. Just knowing curl is available at various big hosts is a help for now though!

brion added a comment.Jun 28 2016, 9:10 PM

Ok, so updated outcome choices replacing the hard dependency with a soft dependency:

  1. no change (advocated most recently by @Florian in T137926#2384514)
  2. drop non-curl path for MWHttpRequest, add clear error messaging & install-time doc for features that use MWHttpRequest or MultiHttpClient on non-curl hosts (advocated most recently by @demon in T137926#2412673)
  3. keep non-curl path in MWHttpRequest, create alternate non-curl implementation of MultiHttpClient (advocated most recently by @Krinkle in T137926#2412298)

Any objection to going into tomorrow's discussion with those choices ready on the table?

Tgr added a comment.Jun 28 2016, 10:48 PM

Offhand, I think that InstantCommons would be the primary user-visible core feature on small installations that requires an HTTP client and does not currently require curl.

Also image sharing between wikis of the same farm, and SpamBlacklist (bundled with the tarball) fetching the blacklist from meta.

Ok, so updated outcome choices replacing the hard dependency with a soft dependency:

  1. no change (advocated most recently by @Florian in T137926#2384514)

I think a soft dependency would be a good compromise, too, as long as the standard functions of MediaWiki still work (OK, now we would need to define "standard functions") :)

@Jdforrester-WMF - would you be available to discuss this RFC this coming Wednesday (at E224)? If so, excellent, this may be our choice for that time slot!

Sure.

Also image sharing between wikis of the same farm, and SpamBlacklist (bundled with the tarball) fetching the blacklist from meta.

It's technically optional for image sharing on the same farm (if you disable sharing image descriptions). SpamBlacklist is another area that I think would be highly likely to affect unsophisticated users, who would be most troubled by dropping curl support.

@Jdforrester-WMF - would you be available to discuss this RFC this coming Wednesday (at E224)? If so, excellent, this may be our choice for that time slot!

Sure.

Excellent! Let's use this time to see if we can figure out a good answer to Brion's multiple choice question posed at T137926#2412996

Change 294259 abandoned by Jforrester:
[DO NOT MERGE] PHPVersionCheck: Require curl to be installed

Reason:
Per RfC discussion, we're not going to require this yet, just shout from the rooftops that you really /really/ should install it.

https://gerrit.wikimedia.org/r/294259

Last-call proposal:

  • Not require curl;
  • Keep existing non-curl path;
  • Add non-parallel MultiHttpClient fallback,
  • Provide environmental warnings about missing curl (and other things?) on Special:Version.
RobLa-WMF triaged this task as High priority.Jun 30 2016, 10:25 PM
Tgr added a comment.Jul 1 2016, 1:31 PM
  • Provide environmental warnings about missing curl (and other things?) on Special:Version.

That page already suffers from information overload. Maybe a good time to start a new special page which could eventually become some sort of site admin status panel? Missing packages, disabled optimizations, uninstalled security upgrades etc., like Drupal's status report panel.

  • Provide environmental warnings about missing curl (and other things?) on Special:Version.

That page already suffers from information overload. Maybe a good time to start a new special page which could eventually become some sort of site admin status panel? Missing packages, disabled optimizations, uninstalled security upgrades etc., like Drupal's status report panel.

Possibly. For this to serve the intended purpose, it would need to be viewable and easily findable by:

  1. Non-admin users of the site, so that they can pester the site admins in charge of the setup
  2. By people providing MediaWiki support (e.g. people on Freenode #mediawiki) who are trying to help a site admin figure out what is wrong with the site

Special:Version is traditionally where this type of information has gone, and it seems like it should at least be an easily findable hyperlink away from Special:Version if it's not included inline.

Bawolff added a comment.EditedJul 3 2016, 4:08 PM

I think it's reasonable to put this in the Installed Software section. We already include the version of libicu the intl extension is compiled against, so we could just add curl and its version to that list.
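Gathering that information is cheap: PHP's curl_version() reports the underlying libcurl version when the extension is loaded. A hypothetical sketch of building such an entry (the variable names and output format are illustrative, not Special:Version's actual code):

```php
<?php
// Hypothetical Installed Software entry for curl, analogous to how the
// libicu version is shown alongside the intl extension.
if ( extension_loaded( 'curl' ) ) {
    $info = curl_version(); // associative array; 'version' is the libcurl version
    $software = 'curl ' . $info['version'];
} else {
    $software = 'curl (not installed)';
}
```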

Krinkle renamed this task from Require 'curl' PHP extension for MediaWiki to Support running MediaWiki without 'curl' PHP extension.Jul 6 2016, 8:15 PM
Krinkle raised the priority of this task from High to Needs Triage.
Krinkle triaged this task as High priority.
Krinkle moved this task from Last Call to TechCom-Approved on the TechCom-RFC board.
Krinkle edited projects, added TechCom-RFC (TechCom-Approved); removed TechCom-RFC.
Krinkle updated the task description. (Show Details)
Jdforrester-WMF lowered the priority of this task from High to Low.Apr 19 2017, 6:59 PM
Krinkle updated the task description. (Show Details)Aug 1 2018, 12:55 AM
Krinkle moved this task from Backlog to In progress on the TechCom-RFC (TechCom-Approved) board.
Krinkle updated the task description. (Show Details)
Krinkle removed a project: Proposal.

Change 494630 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] Remove various references to cURL in code comments

https://gerrit.wikimedia.org/r/494630

Change 494630 merged by jenkins-bot:
[mediawiki/core@master] Remove various references to cURL in code comments

https://gerrit.wikimedia.org/r/494630

Krinkle updated the task description. (Show Details)Mar 9 2019, 9:59 PM

Now that things work without cURL, including MultiHttpClient, does it still make sense to output a warning? There is no degraded feature set.

There is certainly potential in terms of performance, but we don't output warnings for other such potentials either.

That's traditionally been exposed at install/upgrade time instead. We currently do this for diff3, apcu, and intl, for example.

CCicalese_WMF added a subscriber: CCicalese_WMF.

Low priority, not ready to be worked on at this point

not ready to be worked on at this point

@CCicalese_WMF: Does that mean this task is technically blocked? (On the subtask here?) If yes, its status should probably be stalled?

That should probably have stated, "we (Core Platform Team) don't have the resources to work on it at this point, but will get to it soon". It is no longer blocked, so we put it in our Next queue.

@CCicalese_WMF: Thanks for clarifying! That means that a potential contributor could work on this, I guess. (That was the underlying question I had in mind.)

We (Core Platform Team) are reworking our workboards. This task is still tracked by the Core Platform Team, but we are no longer using the Next tag. Removing the tag does not imply any change to the status of this task.