21:01:32 <wm-labs-meetbot> Meeting started Wed Oct 26 21:01:32 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:05:44 <robla> matmarex suggested "We have a HTML validation library (the Sanitizer class) and it could probably be hooked up to validating HTML"
21:05:52 <robla> does that seem like a good idea?
21:06:26 <TimStarling> MatmaRex is not in this channel and is marked /away
21:06:35 <TimStarling> but maybe we have bawolff?
21:06:42 <bawolff> Yep
21:07:16 <DanielK_WMDE_> i think it's important to differentiate between a) should mediawiki support this at all and b) should this be allowed/used on wikimedia sites
21:07:45 <DanielK_WMDE_> also, what html is typically encountered in that context, and how well is that supported by different renderers?
21:08:13 <bawolff> as i wrote on the bug, seems in theory workable but im concerned the sanitizer wants to rewrite during sanitization which means youd have to be very pedantic in writing your svgs
21:08:41 <bawolff> unless we change the upload pipeline to allow files to be modified during upload
21:08:47 <TimStarling> as for (a), if there was a patch in gerrit to allow HTML behind a feature flag, it would presumably be accepted, we don't have a massively high bar for that sort of thing, right?
21:09:54 <robla> TimStarling: I think the bar is "is this secure?"
21:09:58 <TimStarling> yeah, you would have trouble using sanitizer directly
21:10:25 <DanielK_WMDE_> TimStarling: or more nicely, some dort of pluggable validation thingy, yea.
21:10:29 <TimStarling> I'm not really a big fan of Sanitizer in terms of the way the code is currently structured
21:10:33 <SMalyshev> I'd guess the security depends on the sanitizer?
21:10:38 <DanielK_WMDE_> then this could be an extension rather than a feature flag
21:11:12 <DanielK_WMDE_> SMalyshev: also on the consumer of the html. currently, it's us rendering these, not the browser.
21:11:32 <SMalyshev> you mean SVGs?
21:11:45 <robla> what originally started this conversation was a discussion about a draw.io plugin where the feature flag wanted was "turn off sanitization"
21:11:51 <DanielK_WMDE_> yes. well, the html inside the svgs
21:11:56 <SMalyshev> ah yes if we have SVG renderer we also need to see what's going on there
21:12:26 <TimStarling> according to bawolff, our SVG renderer does not support HTML, you'll just get an empty box
21:12:26 <bawolff> They were using svg foreign objecty things
21:12:56 <bawolff> so there may be fallback for renders not supporting it
21:12:56 <TimStarling> which is not surprising, it's a very complicated spec to pull in
21:29:14 <bawolff> and i think we would be better using someone elses sanitizer than our own
21:31:09 <bawolff> Since more eyes on their sanitizer than ours
21:31:11 <robla> bawolff: you're suggesting that the Wikimedia instances would rely on server-side Javascript (i.e. Node.js). do you think it makes sense to require Node.js for secure SVG?
21:31:31 <robla> (for third parties?
21:31:37 <yurik_> hi, sorry i'm late to SVG party. DanielK_WMDE_ I second your desire for interactive SVGs :) Graphs at this point can produce SVG output, but graphs does not (yet) support an ability to create URL links
21:31:54 <bawolff> i wouldnt say im happy about it...
21:32:04 <TimStarling> that library does not look like an improvement
21:32:38 <Scott_WUaS> (hi All)
21:33:04 <yurik> so if graphoid was allowed to produce SVG output, it would already be a quality improvement, but not necessarily a functional improvement right away
21:33:39 <yurik> but at least it gives a path forward, whereas without SVG, we have no easy way of producing clickable graphs
21:33:59 <robla> legoktm pointed out T86874 on wikitech-l
21:33:59 <stashbot> T86874: Make SVG sanitization into a library - https://phabricator.wikimedia.org/T86874
21:34:35 <bawolff> TimStarling: can you be more specific? I havent looked at that library in detail im more coming from a i dont like our current approach
21:35:15 <robla> #info question discussed: is DOMPurify an improvement over the status quo
21:35:53 <TimStarling> for a start, it's only 860 physical lines of code, so if you wanted it you could just port it to PHP
21:36:51 <TimStarling> it's apparently just an XSS filter, not a cross-site request filter
21:38:12 <bawolff> Yes, there would have to be a layer on top for parts outside their security model but inside ours
21:38:23 <TimStarling> I don't know how well it has been reviewed
21:39:18 <gwicke> IIRC csteipp viewed DOMPurify favorably, but we didn't manage to finish the review before he left
21:39:25 <TimStarling> it's a lot more permissive than Sanitizer
21:40:48 <gwicke> agreed on the need to layer our specific restrictions on top
21:41:23 <TimStarling> var IS_ALLOWED_URI = /^(?:(?:(?:f|ht)tps?|mailto|tel):|[^a-z]|[a-z+.\-]+(?:[^a-z+.\-:]|$))/i;
21:41:36 <gwicke> parsoid's sanitizer works at the token / sax level, so it shouldn't be all that hard to wrap it into a SAX visitor
21:41:45 <TimStarling> I've got to go
21:42:41 <robla> TimStarling: d'oh! ok....thanks for the discussion!
21:42:58 <gwicke> I definitely think that we should leverage other people's work in this space as much as possible
21:43:36 <robla> #info 14:42:58 <gwicke> I definitely think that we should leverage other people's work in this space as much as possible
21:45:06 <robla> gwicke: do you think we should stop working on our PHP version of SVG validation?
21:45:09 <yurik> robla, do you keep a list of potential usecases? Graphoid SVG output would be the first in line
21:45:36 <yurik> (i meant i will be the first person to get into that line)
21:46:14 <robla> yurik: I don't. This topic isn't *yet* an RFC, but if you want to turn this into an RFC, that'd be a great place to put your case on the first line
21:46:14 <bawolff> Isnt graphoid svgs entirely machine generated (ie not attacker controlled)? Does it need general sanitation?
21:47:00 <gwicke> robla: I think we should look at features / maturity of different options, and then evaluate a) which requires the least initial investment, and b) which will need the bigger long term maintenance effort on our end
21:47:03 <yurik> bawolff, that was my understanding too, but for some reason csteip (i think) wanted to sanitize everything regardless of the source
21:47:18 <yurik> "just in case" (tm)
21:47:42 <robla> gwicke: I can't commit to writing an RFC for this, but that all seems reasonable. would you be up for writing it?
21:47:46 <bawolff> Gwicke: do you know other options?
21:48:05 <gwicke> I haven't checked the PHP code in a while, but my recollection from the last time I talked about this with csteipp was that it wasn't very complete for things like html-in-SVG
21:48:24 <bawolff> DomPurify was the only one i could find that was even remotely good
21:48:36 <yurik> the problem with that logic is that Vega is already enabled on the client, so if there is a security vulnerability, people are already exposed to it (i hope we caught all of them). So just adding SVG output does not seem to increase the attack surface
21:48:41 <gwicke> bawolff: that's the only one we seriously considered back then
21:49:23 <gwicke> but who knows, the web library world is changing every couple of months these days..
21:49:56 <bawolff> Im intrigued by Tim's comment about porting to php was interesting. That honestly never even occured to me
21:50:23 <bawolff> but im not sure about the library support in php for DOM
21:50:42 <bawolff> which dompurify seems to use extensively