Page MenuHomePhabricator
Paste P3320

ArchCom-RFC-2016W26-irc-E224.txt
ActivePublic

Authored by RobLa-WMF on Jun 29 2016, 9:58 PM.
21:02:57 <robla> #startmeeting ArchCom RFC Meeting: Require 'curl' PHP extension for MediaWiki?
21:02:57 <wm-labs-meetbot`> Meeting started Wed Jun 29 21:02:57 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:57 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:57 <wm-labs-meetbot`> The meeting name has been set to 'archcom_rfc_meeting__require__curl__php_extension_for_mediawiki_'
21:03:10 <brion> yummy curl!
21:03:12 <robla> #topic Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:03:36 <James_F> Heya.
21:03:37 <robla> #link https://phabricator.wikimedia.org/E224
21:03:42 <robla> James_F: hi!
21:04:00 <robla> #link https://phabricator.wikimedia.org/T137926
21:04:02 <brion> ok, so we have a rough proposal to require the PHP curl extension, at least for HTTP-using stuff
21:04:06 <brion> couple of possible options
21:04:45 <brion> no change (meaning some of out internal interfaces like MultiHttpClient which is curl-only will be unavailable when not using curl, but things using MWHttpClient will still work)
21:05:07 <robla> #info list of 3 options listed at https://phabricator.wikimedia.org/T137926#2412996
21:05:25 <brion> or we could require it for MwHttpRequest / Http::get() etc as well, removing the old non-curl paths which are harder to maintain
21:05:45 <brion> another possibility is to provide a non-curl implementation of MultiHttpClient as well, even if it's a dead-simple one
21:06:13 <brion> in the case where we start requiring curl for more bits, we'd probably want better error reporting / failure cases
21:06:29 <brion> eg, either "don't break the wiki when using InstantCommons but curl is not present" or "break the wiki in a very clear, easy to fix way"
21:07:07 <brion> we don't have a *real* clear picture of how commonly available the php curl extension is, but anectdotally it's on a number of major shared posting providers
21:07:28 <brion> and if you run your own server it's trivial to make sure it's installed (which also means we can easily make sure our debian packages etc depend on it)
21:07:29 <SMalyshev> brion: do we have features in MultiClient that are impossible to do without curl?
21:07:34 <SMalyshev> or very hard
21:08:02 <brion> SMalyshev: IIRC it's a fairly simple API, you pass a number of URLs in and it runs them simultaneously, then returns an array mapping of responses
21:08:16 <brion> which means you can make a crappy back-compat very easily by running serial instead of parallel
21:08:19 <SMalyshev> brion: so what happens if it runs them sequentially instead?
21:08:22 <TimStarling> there is maxConnsPerHost
21:08:24 <SMalyshev> right
21:08:25 <brion> or a fancy back-compat using stream_select, but only on newer PHP
21:08:39 <brion> SMalyshev: slower, but satisfies the API contract
21:08:39 <James_F> brion: Newer than 5.5 which we already require?
21:08:40 <Scott_WUaS> Hello All :)
21:08:45 <TimStarling> so a b/c version that runs them serially would be equivalent to configuring maxConnsPerHost=1
21:08:46 <brion> James_F: right iirc
21:08:56 <brion> *nod*
21:09:07 <SMalyshev> not even stream_select, beggars can't be choosers, just bare bones functionality - I think we can do it rather easily? if so, I'd lean towards doing it
21:09:16 <brion> James_F: secondarily, creating a 'proper' back-compat path just adds to our complexity and maintenance/security surface.
21:09:22 <James_F> Yup.
21:09:37 <bawolff> Yeah, i suspect the bare bones version would be like 5 lines
21:09:53 <James_F> Well. We already have some of this complexity. The RfC was about admitting that it doesn't work well and isn't tested or used in production, so we were going to remove it.
21:10:09 <TimStarling> are there bug reports about it not working well?
21:10:10 <James_F> If we're keeping it and making it "cool", fine, but… someone has to keep looking after it.
21:10:23 <brion> TimStarling: iirc we saw problems with SSL certs and InstantCommons
21:10:23 <SMalyshev> right... so if we can make super-simple super-dumb implementation that still works, I think we should. If it's complex or missing features - then we should fail early and let people install curl
21:10:32 <James_F> TimStarling: More "not well" as in "90% of uses don't have the back-compat".
21:10:34 <brion> work was done to make sure openssl got initialized properly on the back-compat path
21:10:50 <brion> but that's only one thing, might be more work next time
21:10:53 <bawolff> But since that's been fixed, it works well now
21:10:55 <James_F> I can't speak to the security side of this stuff, but bawolff I guess is OK with it?
21:11:08 * bawolff has not looked at the code recently
21:11:16 <bawolff> But I fixed a bunch of stupid things a while back
21:11:16 <SMalyshev> with curl, I had issues with servers using SSL and certificate auths that are new or exotic... figuring out how to add cert auth is non-trivial IIRC
21:11:30 <SMalyshev> but that is probably true for any other way of doing it too
21:11:52 <brion> in general i'm leery of maintaining an HTTP client in MWHttpRequest's PhpHttpRequest subclass forever, libcurl does that job for us
21:12:01 <TimStarling> there is T102566
21:12:02 <stashbot> T102566: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566
21:12:11 <brion> eventually we'll have to update it for http/2 etc, or else throw it out then :)
21:12:25 <bawolff> But T102566 was broken both with curl and non-curl
21:12:25 <stashbot> T102566: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566
21:12:41 <James_F> bawolff: In different ways, so it needed two fixes, or the same issue?
21:12:42 <SMalyshev> can't http/2 fallback to regular http?
21:13:00 <James_F> SMalyshev: Some of our client UAs already report HTTP2-only accept headers.
21:13:06 <bawolff> The main issue was it was configured to not follow redirects
21:13:19 <bawolff> so redirects didn't work on both implementations (as it should)
21:13:22 <robla> #info question discussed for the first 10-15 minutes: is having a fallback worthwhile at all?
21:13:25 <brion> SMalyshev: we don't know for sure what the future will bring; at the least i expect someone will evnetually want persistent connections for something
21:13:38 <SMalyshev> James_F: that's... aggressive. But yeah, I don't want to maintain php http/2 library :)
21:13:42 <brion> bawolff: oh fun :D
21:14:17 <James_F> SMalyshev: Agreed.
21:14:21 <bawolff> There was some additional problems with https in the fallback config. I think in some configs it thought all certs were invalid or something. It was a while ago. Those issues are fixed
21:14:42 <brion> so another possible outcome of this discussion is "let's actually survey availability of php-curl before changing"
21:14:52 <SMalyshev> brion: right... maybe have some way of asking http client if it supports certain features? OTOH, if we do have such features I'd just take the plunge and drop the fallback
21:14:55 <brion> which i didn't think to capture on the task comments earlier
21:15:37 <brion> but surveying is hard when we don't know we can reliably reach affected users who don't upgrade frequently, are on obscure hosts, etc
21:16:28 <James_F> <troll> If they don't upgrade they won't notice the change…
21:16:43 <brion> :D
21:16:55 <James_F> This is a common issue in support for third parties; very few voices, very little data.
21:17:03 <brion> danger is if they do, they'll tableflip in anger at how nothing works anymore
21:17:11 <brion> but we probably broke their site in 10 other ways if that's the case :(
21:17:19 <brion> so that isn't the worst additional breakage :D
21:17:27 <James_F> They'll also do that when the tarball-bundled editor requires curl, which is ~ a year away.
21:17:29 <James_F> So…
21:17:30 <SMalyshev> but if we just tell them "install curl" isn't not that bad?
21:17:42 <SMalyshev> curl is pretty widely supported I think
21:17:49 <brion> it is pretty widely supported yes
21:18:04 <brion> but not everyone has root on their system, and not everybody is a professional or hobbyist sysadmin
21:18:13 <brion> what's trivial to us is not trivial to everyone
21:18:25 <SMalyshev> right, but most shared hosts I'd expect to have curl
21:18:32 <brion> yep
21:18:44 <SMalyshev> same for university etc environment - it's part of almost every prepackaged setup
21:18:48 <brion> so it's a balance between large pain to small group, small pains to large group, etc
21:18:53 <SMalyshev> unless it's insanely stripped down
21:19:10 <brion> i suspect that 'tearing the band-aid off' is going to be net win
21:19:25 <brion> since down the road we'll want to just have curl handy
21:19:36 <brion> i just wish i had more data!
21:19:58 <SMalyshev> we could keep the fallback but still require curl e.g. in composer. This way if you don't have curl you could remove the require and hope for the best
21:20:16 <SMalyshev> at least for now until we add http/2 and http/3 support :)
21:20:20 <brion> hehe
21:20:28 <bawolff> I'm not sure I see what the down-the-road usecases are. What are we actually planning to add that requires this?
21:21:17 <James_F> bawolff: Parsoid HTML reading. VE as the tarball editor. Non-wikitext wikis.
21:21:22 <brion> bawolff: 'requires' is a tricky thing. technically, nothing. but it's nice to not maintain an http client, and it's nice not to have to create a second implementation of the multiplexing parallel client wrapper.
21:21:56 <James_F> bawolff: Also, 'requires' MultiHttpClient rather than curl specifically.
21:22:09 <TimStarling> it looks to me like it's not that hard to support non-curl installations, it's just that nobody could be bothered to do that small amount of work
21:22:41 <robla> #info James_F clarifies this RFC is about requiring MultiHttpClient, rather than curl specifically
21:22:45 <TimStarling> T102566 and [[InstantCommons#HTTPS]] say "just try installing curl and see if it works"
21:22:46 <stashbot> T102566: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566
21:22:49 <SMalyshev> I think it's not hard to support now, but we don't know long-term - maybe it stays not hard, and maybe it becomes hard (e.g. if everybody moves to http/2)
21:23:01 <TimStarling> the bug with non-curl is not isolated, nobody could be bothered
21:23:17 <brion> oh yes, it's trivial to add a non-curl api-compatible single-threaded implementation
21:23:18 <James_F> robla: Well, no. The impending requirements are for MultiHttpClient which we're standardising on. The curl requirement for that requirement is what we're discussing, I think.
21:23:24 <brion> and moderately more work to make a proper implementation.
21:23:31 <brion> so the question is, do we do that *and* continue to maintain them?
21:23:31 <TimStarling> and like bawolff said previously, you could do MultiHttpClient with non-curl with 5 lines of code
21:23:32 <bawolff> TimStarling: That bug referenced on InstantCommons is fixed afaik, the help page was just never updated
21:23:43 <robla> #info <James_F> robla: Well, no. The impending requirements are for MultiHttpClient which we're standardising on. The curl requirement for that requirement is what we're discussing, I think.
21:23:53 <SMalyshev> TimStarling: I think that's what we should be doing first - i.e. if somebody has http problems, we should tell them to install curl. If they can't, then it becomes interesting, but we should start with it by default
21:25:00 <TimStarling> SMalyshev: that doesn't sound right to me, have a solution which is not tested or maintained, just a trap for new users
21:25:11 <tgr> TimStarling: the non-curl bug is about installations which don't have the right certs, there is no bug in the MediaWiki stream wrapper implementation per se
21:25:31 <bawolff> I'd note, if you look at the git log, overall there's been very little maintenance needed for the php stream fallback implementation all and all
21:25:34 <TimStarling> presumably PHP doesn't bundle its own authority list
21:25:37 <SMalyshev> TimStarling: well, it is tested now I think. But eventually if the main MultiHttpClient changes it may become stale
21:25:42 <robla> we've talked about "strongly recommending" curl; do we have a clear idea of what "strong recommendation" means?
21:25:54 <TimStarling> so it would have just been a matter of configuration
21:25:54 <brion> so if we don't object to continuing to maintain the non-curl MWHttpRequest path, it's fairly trivial to make a single-threaded wrapper for api compat and not lock the non-curl folks out of future features
21:26:07 <TimStarling> or maybe installing the right cert list package
21:26:19 <SMalyshev> TimStarling: the question is - what's better, having some code that may work or have no code at all for non-curl? I'm honestly not sure here
21:26:24 <bawolff> Currently we tell php to use the system cert directory
21:26:28 <brion> and indeed it may not be a huge amount of work. maybe we never will care about http/2 client on it, or by the time we do it'll be definitely no big deal to go curl-only
21:26:34 <bawolff> which will work for non-windows hosts
21:26:38 <bawolff> mostly anyways
21:26:45 <bd808> robla: if the argument for "strong recommendation" is less tested code then I guess it would be similar to mysql vs any other db backend
21:26:52 <SMalyshev> TimStarling: no, php relies on OS AFAIR. Distros may do it though
21:27:14 <bawolff> I think its rather unlikely we'll need to support http/2 any time soon
21:27:18 <brion> bd808: similar, but with a very different bug surface i think
21:27:29 <bawolff> I have a very hard time imaging any sites will be http/2 only any time soon
21:27:33 <brion> DBs are much more fragile with weird APIs that are tweaked a bajillion ways
21:27:40 <TimStarling> in an ideal world, we would have automated testing that runs with both curl and non-curl
21:27:43 <brion> HTTP 1.1 is complex but much less so
21:27:49 <TimStarling> that is the usual solution to developer laziness
21:27:52 <brion> and our usage of it is mostly very simple
21:27:59 <brion> :D
21:29:07 <brion> any strong feelings on the maintenance question?
21:29:18 <James_F> TimStarling: Example of expected heavy MultiHttpClient use in core is https://phabricator.wikimedia.org/T111588 (finally found the damn link)
21:29:25 <brion> if we're not in fact that concerned about it then we can concentrate on the narrow question of MultiHttpClient
21:29:30 <ostriches> I'm kinda indifferent on the whole thing now.
21:30:12 <brion> James_F: that's a good one! heavy api-fetch usage is very possible there and parallel's a huge win
21:30:20 <robla> #info James_F finds "Example of expected heavy MultiHttpClient use in core": T111588
21:30:20 <stashbot> T111588: [RFC] API-driven web front-end - https://phabricator.wikimedia.org/T111588
21:30:49 <bawolff> James_F: That seems like a rather far-in-the-future proposal
21:31:05 <subbu> this is a slightly tangential troll-y question .. but what is our threshold for how far we are willing to go to support shared host wikis? i.e. is there a development / maintenance threshold at which point we say features X, Y, Z may not be available on those platforms?
21:31:30 <ostriches> What subbu said ^
21:31:37 <brion> subbu: excellent question which has never been answered adequetly :D
21:31:41 <James_F> It keeps coming up, though.
21:31:43 * subbu hasn't read the scrollback yet .. so jumping in the middle.
21:32:01 <James_F> And Parsing and Services both have to consider it a lot in some of their work.
21:32:19 <brion> at this point though it sounds like we're not so concerned with dropping our non-curl path that it's going to be an issue on the narrow question
21:32:55 <brion> eg, it sounds like nobody wants to kill non-curl path and we'll be deciding between 'make a single-threaded MultiHttpClient fallback path' and 'make a fancier non-curl parallel MultiHttpClient fallback path for versions of PHP that support it suitably'
21:33:20 <brion> so, it does go into 'are we willing to do that extra work?'
21:33:33 <TimStarling> subbu: I'm fine with answering that on a case-by-case basis
21:33:37 <brion> it mighta ctually be we don't require newer php, now that i look at it. stream_select on the raw tcp streams should already be available
21:33:49 <subbu> TimStarling, that is reasonable.
21:33:55 <TimStarling> I've found in the past that a lot of imagined conflicts between WMF and shared hosting don't really eventuate in practice
21:34:07 <robla> perhaps we can officially deprecate non-curl intentionally breaking it now?
21:34:31 <robla> er..."without intentionally breaking it now"
21:35:12 <SMalyshev> brion: the question is do we really want to deal with http over stream_select?
21:35:21 <TimStarling> sure, there are stream functions, and there are HTTP libraries that work on top of them that we could import
21:35:23 <SMalyshev> we may spend a lot of time for very small reward
21:35:47 <SMalyshev> unless we have a ready-made library of comparable power
21:35:53 <brion> exactly, it's more work to do the proper implementation (or to find and import another library)
21:36:01 <brion> but it's very easy to do the simple back-compat
21:36:16 <TimStarling> I remember there was one in PEAR, years ago, no doubt it still exists
21:36:26 <brion> and as robla suggests we _could_ deprecate it without breaking it, as an implicit warning that folks should use curl when available
21:36:32 <SMalyshev> PEAR would mean it's probably old...
21:36:57 <brion> yeah, i can't recall if we thought about migrating to it but hated the code, or migrated from it to our own ;)
21:37:31 <brion> i think there's no need to _actively_ break non-curl unless UI becomes dependent on parallel fetches for reasonable performance
21:38:06 <brion> and it's easy even then to say 'well it might be slowing on your config, but enable curl if you can'
21:38:23 <robla> if we deprecate, perhaps we could also come up with a way for a novice user to know when they are running in a deprecated environment
21:38:33 <brion> that's a good point
21:38:37 <TimStarling> there's the environment check page in the installer
21:38:42 <brion> *nod*
21:38:52 <brion> the installer's one place, but i'd love to have something visible on maybe special:version
21:39:14 <brion> someplace you can check on your live installation, that you'd already be directed to check for other environmental things in a help-debug session
21:39:30 <SMalyshev> special:version sounds good
21:39:35 <robla> Special:Version has the added perk that users of the site can know, and use the info to prod the admin
21:39:41 <brion> yeah :D
21:39:57 <brion> ok i think we've got something coming together
21:40:43 <brion> so recommend: keep existing non-curl path; add non-parallel MultiHttpClient fallback, and provide environmental warnings about missing curl on Special:Version ?
21:41:04 <James_F> WFM.
21:41:05 <brion> any objections or alternate proposals?
21:41:10 <brion> woohoo
21:41:11 <robla> #info <brion> so recommend: keep existing non-curl path; add non-parallel MultiHttpClient fallback, and provide environmental warnings about missing curl on Special:Version ?
21:41:41 <brion> this leaves open the possibility of adding a parallel non-curl path but does not commit us to doing so
21:42:10 <brion> and also leaves open possibility of fully deprecating non-curl but does not force it
21:42:18 <bawolff> Can we put the env warnings on both Special:Version and installer
21:42:20 <brion> (eg later if we find we really need to)
21:42:22 <brion> on yeah def
21:42:38 <brion> #info <bawolff> Can we put the env warnings on both Special:Version and installer
21:43:01 <ostriches> Yeah, that makes sense ^
21:43:09 <TimStarling> maybe it is time to revisit the idea of a pingback from the installer for statistical purposes
21:43:12 <ostriches> We don't actually do environmental warnings on Special:Version yet, but I think we should
21:43:22 <ostriches> TimStarling: +1, brion and I talked about that yesterday
21:43:45 <TimStarling> every time this sort of thing comes up, we have no idea of what our users' installations look like
21:43:46 <robla> #info <TimStarling> maybe it is time to revisit the idea of a pingback from the installer for statistical purposes
21:43:59 <ostriches> Time for new-new-installer :D
21:44:05 <robla> lol
21:44:25 <brion> yes def
21:44:26 <Reedy> does that count as volunteering?
21:44:26 <James_F> There's already an environmental warning about curl in the installer, BTW.
21:44:34 <brion> a pingback that probably will use MWHttpRequest... ;)
21:44:40 <SMalyshev> isn't that what wikiapiary is doing?
21:44:40 <James_F> brion: Tsk. :-)
21:44:49 <TimStarling> I proposed it once on wikitech-l but I was shouted down by the libertarians
21:44:52 <ostriches> And might know have curl to tell us :p
21:44:55 <TimStarling> as if they're not all using facebook by now
21:45:02 <brion> lol
21:45:03 <ostriches> lmao
21:45:17 <James_F> TimStarling: For some things, just implement and ignore the wails. :-)
21:45:23 <ostriches> TimStarling: I think last time we seriously talked about it like 2-3 years ago the consensus had been "as long as it's opt-in"
21:45:23 <SMalyshev> privacy issues are... touchy. Maybe do the opt-in?
21:45:34 <brion> SMalyshev: do they have an auto-report widget you can add to your site or do they just crawl & make stats from that?
21:45:36 <James_F> TimStarling: For people with real security needs, they'd be installing it inside an air-gapped machine anyway.
21:45:38 <brion> (wikiapiary)
21:45:49 <ostriches> With, of course, a glaring option to turn it on during install time.
21:45:58 <SMalyshev> brion: I have no idea, I haven't looked into how they work.
21:45:59 <ostriches> (we want to encourage it to be on :))
21:46:05 <robla> is "keep existing non-curl path; add non-parallel MultiHttpClient fallback, and provide environmental warnings about missing curl on Special:Version " what we should put in last call?
21:46:10 <TimStarling> I think a prominent checkbox enabled by default would be good enough
21:46:18 <brion> robla: I think we're good to go on that yeah
21:46:20 <James_F> robla: Yes.
21:46:27 <brion> put it in last call and let any final objections come in on list
21:46:43 <robla> #info "keep existing non-curl path; add non-parallel MultiHttpClient fallback, and provide environmental warnings about missing curl on Special:Version " goes to last call
21:46:47 <brion> feels good, we made a decision \o/
21:46:51 <SMalyshev> I think if we just have checkbox or button saying "send anonymous statistics about your install to MediaWiki devs" it's fine
21:47:35 <brion> TimStarling: did you want to revive the rfc for the installer beacon? or shall i put it on my todo list?
21:47:37 <Reedy> Would it be useful to do something like that from update.php?
21:47:54 <ostriches> James_F: Thanks for filing the RfC. Wasn't the outcome we thought of when we discussed it weeks back, but I think the outcome is good :)
21:48:03 <brion> or can we foist it on someone else :D
21:48:29 <brion> i don't want it to get lost though if we get caught up in our other big things
21:48:33 <ostriches> Reedy: I think we should trigger a pingback on any install/upgrade, yeah
21:48:53 <ostriches> Ideally more often than that, but we don't really support a "has config changed" hook (yet)
21:48:55 <Reedy> I guess we set a global based on install time preference....
21:49:22 <TimStarling> SMalyshev: let's continue this discussion on facebook
21:49:26 <Reedy> But question of what we do about "current" installs... Ask them nicely to add the global to their localsettings?
21:49:28 <robla> we also had T136866 penciled in here if we had time, but I think we can wrap up on brion's question
21:49:28 <stashbot> T136866: Improve the per-programming-language listings for our tools - https://phabricator.wikimedia.org/T136866
21:49:57 <ostriches> Reedy: We could prompt for the new setting in update.php
21:50:10 <ostriches> Ideally, we should actually do that anyway (somehow) to call out important config changes.
21:50:16 <ostriches> "This setting is new and you might wanna change it"
21:50:20 <Reedy> mmm
21:50:24 <ostriches> Or "This setting doesn't do what you think anymore..."
21:50:31 <brion> heh
21:50:42 <Reedy> I wonder if we could reuse the environment type notifications on Special:Version too
21:50:42 <robla> #info brion wants to make sure we have a plan for an installer beacon
21:50:50 <Reedy> "reuse" in a loose sense
21:51:08 <ostriches> Reedy: We could abstract them a bit further to make them portable I'm sure.
21:51:20 <brion> Reedy: yeah don't dupe the code & messages if they're common checks ideally
21:51:22 <ostriches> They're already pretty well isolated, just in a "weird" place for Special:Version to want them
21:51:29 <James_F> ostriches: Sure!
21:52:00 <brion> ok if nobody grabs the beacon rfc today i'll just add it to our agenda to discuss at the archcom prep meeting next week :D
21:52:05 <Reedy> well, if it's installer messages... they'll be duped :)
21:52:26 <robla> thanks brion! that sounds like a good plan
21:52:35 <brion> excellent :D
21:53:12 <brion> ok that pretty much locks us up for this week i think
21:53:15 <brion> thanks everybody!
21:53:40 <robla> thanks all!
21:53:46 <robla> ending in a few seconds
21:54:02 <robla> #action brion, robla discuss beacon at E225
21:54:15 <robla> #endmeeting