HomePhabricator

ArchCom RFC Meeting W43: Allow HTML in SVG? (2016-10-26, #wikimedia-office)
ActivePublic

Hosted by daniel on Oct 26 2016, 9:00 PM - 10:00 PM.

Description

Agenda

Meeting summary

  • T138783 - SVG in HTML (robla, 21:01:52)
  • T138783 - HTML in SVG (robla, 21:02:06)
    • question discussed: should we use the Sanitizer class as HTML validation library for SVG? (robla, 21:15:44)
    • question discussed: should MediaWiki be able to alter SVG uploads on save? (robla, 21:19:47)
    • bawolff says code might need rearchitecting. AaronSchulz doesn't think it would be too difficult (robla, 21:24:51)
    • TimStarling would like there to be an image link option which does client-side SVG rendering (TimStarling, 21:26:46)
    • 14:27:41Â <bawolff>Â If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify (robla, 21:28:09)
    • LINK: https://github.com/cure53/DOMPurify (robla, 21:28:57)
    • question discussed: is DOMPurify an improvement over the status quo (robla, 21:35:15)
    • 14:42:58Â <gwicke>Â I definitely think that we should leverage other people's work in this space as much as possible (robla, 21:43:36)
    • LINK: https://github.com/OWASP/java-html-sanitizer looks potentially interesting (gwicke, 21:51:04)
    • LINK: https://github.com/darylldoyle/svg-sanitizer (subbu, 21:53:57)

Meeting ended at 22:00:07 UTC.

Log

121:01:32 <robla> #startmeeting ArchCom IRC meeting
221:01:32 <wm-labs-meetbot> Meeting started Wed Oct 26 21:01:32 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
321:01:32 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
421:01:32 <wm-labs-meetbot> The meeting name has been set to 'archcom_irc_meeting'
521:01:52 <robla> #topic T138783 - SVG in HTML
621:01:52 <stashbot> T138783: SVG Upload should (optionally) allow the xhtml namespace - https://phabricator.wikimedia.org/T138783
721:02:06 <robla> #topic T138783 - HTML in SVG
821:02:09 <robla> :-)
921:02:14 <robla> hi folks!
1021:03:06 <robla> so....today's topic is about whether we should allow HTML in our SVG
1121:03:52 * robla goes to find more links to describe this better....
1221:04:45 <robla> <https://lists.wikimedia.org/pipermail/wikitech-l/2016-October/086861.html>
1321:05:44 <robla> matmarex suggested "We have a HTML validation library (the Sanitizer class) and it could probably be hooked up to validating HTML"
1421:05:52 <robla> does that seem like a good idea?
1521:06:26 <TimStarling> MatmaRex is not in this channel and is marked /away
1621:06:35 <TimStarling> but maybe we have bawolff?
1721:06:42 <bawolff> Yep
1821:07:16 <DanielK_WMDE_> i think it's important to differentiate between a) should mediawiki support this at all and b) should this be allowed/used on wikimedia sites
1921:07:45 <DanielK_WMDE_> also, what html is typically encountered in that context, and how well is that supported by different renderers?
2021:08:13 <bawolff> as i wrote on the bug, seems in theory workable but im concerned the sanitizer wants to rewrite during sanitization which means youd have to be very pedantic in writing your svgs
2121:08:41 <bawolff> unless we change the upload pipeline to allow files to be modified during upload
2221:08:47 <TimStarling> as for (a), if there was a patch in gerrit to allow HTML behind a feature flag, it would presumably be accepted, we don't have a massively high bar for that sort of thing, right?
2321:09:54 <robla> TimStarling: I think the bar is "is this secure?"
2421:09:58 <TimStarling> yeah, you would have trouble using sanitizer directly
2521:10:25 <DanielK_WMDE_> TimStarling: or more nicely, some dort of pluggable validation thingy, yea.
2621:10:29 <TimStarling> I'm not really a big fan of Sanitizer in terms of the way the code is currently structured
2721:10:33 <SMalyshev> I'd guess the security depends on the sanitizer?
2821:10:38 <DanielK_WMDE_> then this could be an extension rather than a feature flag
2921:11:12 <DanielK_WMDE_> SMalyshev: also on the consumer of the html. currently, it's us rendering these, not the browser.
3021:11:32 <SMalyshev> you mean SVGs?
3121:11:45 <robla> what originally started this conversation was a discussion about a draw.io plugin where the feature flag wanted was "turn off sanitization"
3221:11:51 <DanielK_WMDE_> yes. well, the html inside the svgs
3321:11:56 <SMalyshev> ah yes if we have SVG renderer we also need to see what's going on there
3421:12:26 <TimStarling> according to bawolff, our SVG renderer does not support HTML, you'll just get an empty box
3521:12:26 <bawolff> They were using svg foreign objecty things
3621:12:56 <bawolff> so there may be fallback for renders not supporting it
3721:12:56 <TimStarling> which is not surprising, it's a very complicated spec to pull in
3821:13:56 * DanielK_WMDE_ wants interactive/scripted svg
3921:14:03 <TimStarling> an idea I had last hour: maybe we could have an option in the image link which enables client-side SVG rendering
4021:14:24 <bawolff> whether or not rsvg properly handles the fallback i have no idea as i havent tested that
4121:14:43 <TimStarling> you know there are performance reasons to do SVG in the server at the moment
4221:14:47 <DanielK_WMDE_> TimStarling: sounds ok for 3rd parties. for wikimedia, we'd need a decent fallback renderer
4321:14:55 <bawolff> DanielK_WMDE: well thats a whole other bag of worms :)
4421:15:08 <SMalyshev> if sanitizer follows https://meta.wikimedia.org/wiki/Help:HTML_in_wikitext is seems ok
4521:15:44 <robla> #info question discussed: should we use the Sanitizer class as HTML validation library for SVG?
4621:15:49 <bawolff> Sanitizer is what the parser itself uses so it should
4721:15:59 <DanielK_WMDE_> SMalyshev: i think that page is basically documenting what the sanitizer does
4821:16:57 <TimStarling> Sanitizer::removeHTMLtags() is currently only used by the parser, so it probably does some things which are wikitext-specific
4921:17:23 <TimStarling> and like bawolff previously said, it works by rewriting dubious HTML to be less dubious
5021:17:32 <bawolff> The translated entity names for example are wikitext specific
5121:17:41 <TimStarling> so you either have to allow it to alter uploads or have a strict rejection mode
5221:18:15 <robla> (side note: please feel free to add "#info" tag notes to get captured in the summary of this discussion)
5321:19:01 <bawolff> Allowing mediawiki to alter uploads would make a bunch of things unrelated to this much easier
5421:19:03 <TimStarling> for example, unquoted attribute values are rewritten to be double-quoted, which avoids relying on client parsing details
5521:19:46 <TimStarling> there's nothing really stopping us from doing it, right?
5621:19:47 <robla> #info question discussed: should MediaWiki be able to alter SVG uploads on save?
5721:20:29 <bawolff> I think some code has to be rearchitected, but not a major amount
5821:20:53 <TimStarling> AaronSchulz: does it sound difficult to you?
5921:21:03 <bawolff> The upload code is something a lot of people find scary and dont like touching
6021:21:31 <bawolff> but i dont thimk it would be very hard
6121:22:20 <AaronSchulz> bawolff: yeah, probably
6221:23:00 <TimStarling> UploadBase::performUpload() just uploads $this->mTempPath
6321:23:19 <TimStarling> it's not even trying to move the file from the stash, it just reuploads
6421:23:39 <TimStarling> so if you modify that file, you're all done, right?
6521:24:17 <bawolff> I imagine you would want to also give the user a warning
6621:24:31 <TimStarling> sure
6721:24:51 <robla> #info bawolff says code might need rearchitecting. AaronSchulz doesn't think it would be too difficult
6821:26:46 <TimStarling> #info TimStarling would like there to be an image link option which does client-side SVG rendering
6921:27:41 <bawolff> If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify
7021:28:09 <robla> #info 14:27:41 <bawolff> If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify
7121:28:22 <robla> thoughts on that?
7221:28:36 * robla goes to find the DomPurify link
7321:28:38 <bawolff> slightly complicated by that project being in js, but thats probably not an issue for wmf at least
7421:28:57 <robla> #link https://github.com/cure53/DOMPurify
7521:29:14 <bawolff> and i think we would be better using someone elses sanitizer than our own
7621:31:09 <bawolff> Since more eyes on their sanitizer than ours
7721:31:11 <robla> bawolff: you're suggesting that the Wikimedia instances would rely on server-side Javascript (i.e. Node.js). do you think it makes sense to require Node.js for secure SVG?
7821:31:31 <robla> (for third parties?
7921:31:37 <yurik_> hi, sorry i'm late to SVG party. DanielK_WMDE_ I second your desire for interactive SVGs :) Graphs at this point can produce SVG output, but graphs does not (yet) support an ability to create URL links
8021:31:54 <bawolff> i wouldnt say im happy about it...
8121:32:04 <TimStarling> that library does not look like an improvement
8221:32:38 <Scott_WUaS> (hi All)
8321:33:04 <yurik> so if graphoid was allowed to produce SVG output, it would already be a quality improvement, but not necessarily a functional improvement right away
8421:33:39 <yurik> but at least it gives a path forward, whereas without SVG, we have no easy way of producing clickable graphs
8521:33:59 <robla> legoktm pointed out T86874 on wikitech-l
8621:33:59 <stashbot> T86874: Make SVG sanitization into a library - https://phabricator.wikimedia.org/T86874
8721:34:35 <bawolff> TimStarling: can you be more specific? I havent looked at that library in detail im more coming from a i dont like our current approach
8821:35:15 <robla> #info question discussed: is DOMPurify an improvement over the status quo
8921:35:53 <TimStarling> for a start, it's only 860 physical lines of code, so if you wanted it you could just port it to PHP
9021:36:51 <TimStarling> it's apparently just an XSS filter, not a cross-site request filter
9121:38:12 <bawolff> Yes, there would have to be a layer on top for parts outside their security model but inside ours
9221:38:23 <TimStarling> I don't know how well it has been reviewed
9321:39:18 <gwicke> IIRC csteipp viewed DOMPurify favorably, but we didn't manage to finish the review before he left
9421:39:25 <TimStarling> it's a lot more permissive than Sanitizer
9521:40:48 <gwicke> agreed on the need to layer our specific restrictions on top
9621:41:23 <TimStarling> var IS_ALLOWED_URI = /^(?:(?:(?:f|ht)tps?|mailto|tel):|[^a-z]|[a-z+.\-]+(?:[^a-z+.\-:]|$))/i;
9721:41:36 <gwicke> parsoid's sanitizer works at the token / sax level, so it shouldn't be all that hard to wrap it into a SAX visitor
9821:41:45 <TimStarling> I've got to go
9921:42:41 <robla> TimStarling: d'oh! ok....thanks for the discussion!
10021:42:58 <gwicke> I definitely think that we should leverage other people's work in this space as much as possible
10121:43:36 <robla> #info 14:42:58 <gwicke> I definitely think that we should leverage other people's work in this space as much as possible
10221:45:06 <robla> gwicke: do you think we should stop working on our PHP version of SVG validation?
10321:45:09 <yurik> robla, do you keep a list of potential usecases? Graphoid SVG output would be the first in line
10421:45:36 <yurik> (i meant i will be the first person to get into that line)
10521:46:14 <robla> yurik: I don't. This topic isn't *yet* an RFC, but if you want to turn this into an RFC, that'd be a great place to put your case on the first line
10621:46:14 <bawolff> Isnt graphoid svgs entirely machine generated (ie not attacker controlled)? Does it need general sanitation?
10721:47:00 <gwicke> robla: I think we should look at features / maturity of different options, and then evaluate a) which requires the least initial investment, and b) which will need the bigger long term maintenance effort on our end
10821:47:03 <yurik> bawolff, that was my understanding too, but for some reason csteip (i think) wanted to sanitize everything regardless of the source
10921:47:18 <yurik> "just in case" (tm)
11021:47:42 <robla> gwicke: I can't commit to writing an RFC for this, but that all seems reasonable. would you be up for writing it?
11121:47:46 <bawolff> Gwicke: do you know other options?
11221:48:05 <gwicke> I haven't checked the PHP code in a while, but my recollection from the last time I talked about this with csteipp was that it wasn't very complete for things like html-in-SVG
11321:48:24 <bawolff> DomPurify was the only one i could find that was even remotely good
11421:48:36 <yurik> the problem with that logic is that Vega is already enabled on the client, so if there is a security vulnerability, people are already exposed to it (i hope we caught all of them). So just adding SVG output does not seem to increase the attack surface
11521:48:41 <gwicke> bawolff: that's the only one we seriously considered back then
11621:49:23 <gwicke> but who knows, the web library world is changing every couple of months these days..
11721:49:56 <bawolff> Im intrigued by Tim's comment about porting to php was interesting. That honestly never even occured to me
11821:50:23 <bawolff> but im not sure about the library support in php for DOM
11921:50:42 <bawolff> which dompurify seems to use extensively
12021:51:04 <gwicke> https://github.com/OWASP/java-html-sanitizer looks potentially interesting
12121:51:25 <bawolff> perhaps its there. Im honestly not very familar with that area of the php ecosystem
12221:51:38 * robla creates an RFC page https://www.mediawiki.org/wiki/Requests_for_comment/SVG_Upload_should_(optionally)_allow_the_xhtml_namespace
12321:51:41 <gwicke> I don't think it supports SVG docs, however
12421:53:06 <robla> ok...we're coming to the end of the hour. This has been a really informative discussion
12521:53:32 <robla> ...and we have a place we can collaborate on prose ^^^
12621:53:57 <subbu> https://github.com/darylldoyle/svg-sanitizer
12721:54:14 <bawolff> Did we come to a conclusion about the original issue?
12821:54:42 * subbu only read the last few lines of this rfc since he got here late
12921:55:01 <robla> bawolff: I think the original issue was about should we allow XHTML at all, and I haven't heard any objections to it
13021:55:26 <gwicke> "just" need to solve the sanitization challenge..
13121:55:27 <robla> the conversation seems moved to "which validation library should we use?"
13221:55:43 <bawolff> Subbu: thats exactly what im looking for
13321:55:49 <robla> that question seems it will be solved after vi vs emacs ;-)
13421:56:24 <robla> seriously though, we have a bunch of validation options we should probably consider in the RFC writeup
13521:56:27 <gwicke> the second issue is that we'll need to upgrade our rasterization tools to support HTML
13621:56:51 <gwicke> we do have at least one option for that, but it's not rsvg
13721:56:52 <robla> gwicke: yeah, I don't think we'll have time to touch on that topic
13821:57:11 <robla> let's take the discussion back to phab. I'll take the action to summarize there
13921:57:12 <Scott_WUaS> Thanks for the discussion, All!
14021:57:33 <robla> any last thoughts before I hit #endmeeting?
14121:57:56 <robla> (I'll end this arbitrarily very close to the top of the hour)
14221:58:31 <robla> we should continue this conversation over on #wikimedia-multimedia maybe :-)
14321:58:44 <bawolff> Gwicke: i dont know if thats actually neccesary for the original use case
14421:58:47 <Scott_WUaS> gwicke: further thoughts about the sanitization issue?
14521:58:51 * robla regrets not more explicitly pinging them before this
14621:59:02 <robla> 60 second warning
14721:59:14 <bawolff> obviously it would make more sense
14821:59:52 <robla> alrighty....thank you everyone! No meeting planned next week, and the following week has an option or two
14922:00:01 <Scott_WUaS> Thank you!
15022:00:07 <robla> #endmeeting

People present (lines said)

  • robla (45)
  • bawolff (32)
  • TimStarling (27)
  • gwicke (13)
  • DanielK_WMDE_ (9)
  • yurik (7)
  • Scott_WUaS (4)
  • SMalyshev (4)
  • wm-labs-meetbot (3)
  • stashbot (2)
  • subbu (2)
  • AaronSchulz (1)
  • yurik_ (1)

Other meetings

Architecture meetings
13:00 PT ArchCom Planning Meetingsupcomingall since 2016-03-30
14:00 PT ArchCom-RFC Meetingsupcomingall since 2015-09-09

Recurring Event

Event Series
This event is an instance of E66: ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office), and repeats every week.

Event Timeline

RobLa-WMF renamed this event from ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office) to ArchCom RFC Meeting W43: Allow HTML in SVG? (2016-10-26, #wikimedia-office).Oct 21 2016, 8:57 PM
RobLa-WMF updated the event description. (Show Details)
RobLa-WMF changed the end date for this event from Oct 26 2016, 9:00 PM to Oct 26 2016, 10:00 PM.
daniel renamed this event from ArchCom RFC Meeting W43: Allow HTML in SVG? (2016-10-26, #wikimedia-office) to ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office).Nov 21 2016, 6:11 PM
daniel changed the host of this event from RobLa-WMF to daniel.
daniel invited: ; uninvited: .
daniel updated the event description. (Show Details)
ssastry renamed this event from ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office) to ArchCom RFC Meeting W43: Allow HTML in SVG? (2016-10-26, #wikimedia-office).Nov 30 2016, 5:04 PM
ssastry changed the start date for this event from Oct 26 2016, 9:00 PM to Oct 26 2016, 9:00 PM.
ssastry changed the end date for this event from Oct 26 2016, 10:00 PM to Oct 26 2016, 10:00 PM.