Agenda
- Location: #wikimedia-office IRC channel
- Meeting type: Problem definition
- Time: 2016-10-26, Wednesday 21:00 UTC (2pm PDT, 23:00 CEST)
- Topic:
Meeting summary
- T138783 - SVG in HTML (robla, 21:01:52)
- T138783 - HTML in SVG (robla, 21:02:06)
- question discussed: should we use the Sanitizer class as HTML validation library for SVG? (robla, 21:15:44)
- question discussed: should MediaWiki be able to alter SVG uploads on save? (robla, 21:19:47)
- bawolff says code might need rearchitecting. AaronSchulz doesn't think it would be too difficult (robla, 21:24:51)
- TimStarling would like there to be an image link option which does client-side SVG rendering (TimStarling, 21:26:46)
- 14:27:41Â <bawolff>Â If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify (robla, 21:28:09)
- LINK: https://github.com/cure53/DOMPurify (robla, 21:28:57)
- question discussed: is DOMPurify an improvement over the status quo (robla, 21:35:15)
- 14:42:58Â <gwicke>Â I definitely think that we should leverage other people's work in this space as much as possible (robla, 21:43:36)
- LINK: https://github.com/OWASP/java-html-sanitizer looks potentially interesting (gwicke, 21:51:04)
- LINK: https://github.com/darylldoyle/svg-sanitizer (subbu, 21:53:57)
Meeting ended at 22:00:07 UTC.
Log
| 1 | 21:01:32 <robla> #startmeeting ArchCom IRC meeting |
|---|---|
| 2 | 21:01:32 <wm-labs-meetbot> Meeting started Wed Oct 26 21:01:32 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot. |
| 3 | 21:01:32 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. |
| 4 | 21:01:32 <wm-labs-meetbot> The meeting name has been set to 'archcom_irc_meeting' |
| 5 | 21:01:52 <robla> #topic T138783 - SVG in HTML |
| 6 | 21:01:52 <stashbot> T138783: SVG Upload should (optionally) allow the xhtml namespace - https://phabricator.wikimedia.org/T138783 |
| 7 | 21:02:06 <robla> #topic T138783 - HTML in SVG |
| 8 | 21:02:09 <robla> :-) |
| 9 | 21:02:14 <robla> hi folks! |
| 10 | 21:03:06 <robla> so....today's topic is about whether we should allow HTML in our SVG |
| 11 | 21:03:52 * robla goes to find more links to describe this better.... |
| 12 | 21:04:45 <robla> <https://lists.wikimedia.org/pipermail/wikitech-l/2016-October/086861.html> |
| 13 | 21:05:44 <robla> matmarex suggested "We have a HTML validation library (the Sanitizer class) and it could probably be hooked up to validating HTML" |
| 14 | 21:05:52 <robla> does that seem like a good idea? |
| 15 | 21:06:26 <TimStarling> MatmaRex is not in this channel and is marked /away |
| 16 | 21:06:35 <TimStarling> but maybe we have bawolff? |
| 17 | 21:06:42 <bawolff> Yep |
| 18 | 21:07:16 <DanielK_WMDE_> i think it's important to differentiate between a) should mediawiki support this at all and b) should this be allowed/used on wikimedia sites |
| 19 | 21:07:45 <DanielK_WMDE_> also, what html is typically encountered in that context, and how well is that supported by different renderers? |
| 20 | 21:08:13 <bawolff> as i wrote on the bug, seems in theory workable but im concerned the sanitizer wants to rewrite during sanitization which means youd have to be very pedantic in writing your svgs |
| 21 | 21:08:41 <bawolff> unless we change the upload pipeline to allow files to be modified during upload |
| 22 | 21:08:47 <TimStarling> as for (a), if there was a patch in gerrit to allow HTML behind a feature flag, it would presumably be accepted, we don't have a massively high bar for that sort of thing, right? |
| 23 | 21:09:54 <robla> TimStarling: I think the bar is "is this secure?" |
| 24 | 21:09:58 <TimStarling> yeah, you would have trouble using sanitizer directly |
| 25 | 21:10:25 <DanielK_WMDE_> TimStarling: or more nicely, some dort of pluggable validation thingy, yea. |
| 26 | 21:10:29 <TimStarling> I'm not really a big fan of Sanitizer in terms of the way the code is currently structured |
| 27 | 21:10:33 <SMalyshev> I'd guess the security depends on the sanitizer? |
| 28 | 21:10:38 <DanielK_WMDE_> then this could be an extension rather than a feature flag |
| 29 | 21:11:12 <DanielK_WMDE_> SMalyshev: also on the consumer of the html. currently, it's us rendering these, not the browser. |
| 30 | 21:11:32 <SMalyshev> you mean SVGs? |
| 31 | 21:11:45 <robla> what originally started this conversation was a discussion about a draw.io plugin where the feature flag wanted was "turn off sanitization" |
| 32 | 21:11:51 <DanielK_WMDE_> yes. well, the html inside the svgs |
| 33 | 21:11:56 <SMalyshev> ah yes if we have SVG renderer we also need to see what's going on there |
| 34 | 21:12:26 <TimStarling> according to bawolff, our SVG renderer does not support HTML, you'll just get an empty box |
| 35 | 21:12:26 <bawolff> They were using svg foreign objecty things |
| 36 | 21:12:56 <bawolff> so there may be fallback for renders not supporting it |
| 37 | 21:12:56 <TimStarling> which is not surprising, it's a very complicated spec to pull in |
| 38 | 21:13:56 * DanielK_WMDE_ wants interactive/scripted svg |
| 39 | 21:14:03 <TimStarling> an idea I had last hour: maybe we could have an option in the image link which enables client-side SVG rendering |
| 40 | 21:14:24 <bawolff> whether or not rsvg properly handles the fallback i have no idea as i havent tested that |
| 41 | 21:14:43 <TimStarling> you know there are performance reasons to do SVG in the server at the moment |
| 42 | 21:14:47 <DanielK_WMDE_> TimStarling: sounds ok for 3rd parties. for wikimedia, we'd need a decent fallback renderer |
| 43 | 21:14:55 <bawolff> DanielK_WMDE: well thats a whole other bag of worms :) |
| 44 | 21:15:08 <SMalyshev> if sanitizer follows https://meta.wikimedia.org/wiki/Help:HTML_in_wikitext is seems ok |
| 45 | 21:15:44 <robla> #info question discussed: should we use the Sanitizer class as HTML validation library for SVG? |
| 46 | 21:15:49 <bawolff> Sanitizer is what the parser itself uses so it should |
| 47 | 21:15:59 <DanielK_WMDE_> SMalyshev: i think that page is basically documenting what the sanitizer does |
| 48 | 21:16:57 <TimStarling> Sanitizer::removeHTMLtags() is currently only used by the parser, so it probably does some things which are wikitext-specific |
| 49 | 21:17:23 <TimStarling> and like bawolff previously said, it works by rewriting dubious HTML to be less dubious |
| 50 | 21:17:32 <bawolff> The translated entity names for example are wikitext specific |
| 51 | 21:17:41 <TimStarling> so you either have to allow it to alter uploads or have a strict rejection mode |
| 52 | 21:18:15 <robla> (side note: please feel free to add "#info" tag notes to get captured in the summary of this discussion) |
| 53 | 21:19:01 <bawolff> Allowing mediawiki to alter uploads would make a bunch of things unrelated to this much easier |
| 54 | 21:19:03 <TimStarling> for example, unquoted attribute values are rewritten to be double-quoted, which avoids relying on client parsing details |
| 55 | 21:19:46 <TimStarling> there's nothing really stopping us from doing it, right? |
| 56 | 21:19:47 <robla> #info question discussed: should MediaWiki be able to alter SVG uploads on save? |
| 57 | 21:20:29 <bawolff> I think some code has to be rearchitected, but not a major amount |
| 58 | 21:20:53 <TimStarling> AaronSchulz: does it sound difficult to you? |
| 59 | 21:21:03 <bawolff> The upload code is something a lot of people find scary and dont like touching |
| 60 | 21:21:31 <bawolff> but i dont thimk it would be very hard |
| 61 | 21:22:20 <AaronSchulz> bawolff: yeah, probably |
| 62 | 21:23:00 <TimStarling> UploadBase::performUpload() just uploads $this->mTempPath |
| 63 | 21:23:19 <TimStarling> it's not even trying to move the file from the stash, it just reuploads |
| 64 | 21:23:39 <TimStarling> so if you modify that file, you're all done, right? |
| 65 | 21:24:17 <bawolff> I imagine you would want to also give the user a warning |
| 66 | 21:24:31 <TimStarling> sure |
| 67 | 21:24:51 <robla> #info bawolff says code might need rearchitecting. AaronSchulz doesn't think it would be too difficult |
| 68 | 21:26:46 <TimStarling> #info TimStarling would like there to be an image link option which does client-side SVG rendering |
| 69 | 21:27:41 <bawolff> If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify |
| 70 | 21:28:09 <robla> #info 14:27:41 <bawolff> If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify |
| 71 | 21:28:22 <robla> thoughts on that? |
| 72 | 21:28:36 * robla goes to find the DomPurify link |
| 73 | 21:28:38 <bawolff> slightly complicated by that project being in js, but thats probably not an issue for wmf at least |
| 74 | 21:28:57 <robla> #link https://github.com/cure53/DOMPurify |
| 75 | 21:29:14 <bawolff> and i think we would be better using someone elses sanitizer than our own |
| 76 | 21:31:09 <bawolff> Since more eyes on their sanitizer than ours |
| 77 | 21:31:11 <robla> bawolff: you're suggesting that the Wikimedia instances would rely on server-side Javascript (i.e. Node.js). do you think it makes sense to require Node.js for secure SVG? |
| 78 | 21:31:31 <robla> (for third parties? |
| 79 | 21:31:37 <yurik_> hi, sorry i'm late to SVG party. DanielK_WMDE_ I second your desire for interactive SVGs :) Graphs at this point can produce SVG output, but graphs does not (yet) support an ability to create URL links |
| 80 | 21:31:54 <bawolff> i wouldnt say im happy about it... |
| 81 | 21:32:04 <TimStarling> that library does not look like an improvement |
| 82 | 21:32:38 <Scott_WUaS> (hi All) |
| 83 | 21:33:04 <yurik> so if graphoid was allowed to produce SVG output, it would already be a quality improvement, but not necessarily a functional improvement right away |
| 84 | 21:33:39 <yurik> but at least it gives a path forward, whereas without SVG, we have no easy way of producing clickable graphs |
| 85 | 21:33:59 <robla> legoktm pointed out T86874 on wikitech-l |
| 86 | 21:33:59 <stashbot> T86874: Make SVG sanitization into a library - https://phabricator.wikimedia.org/T86874 |
| 87 | 21:34:35 <bawolff> TimStarling: can you be more specific? I havent looked at that library in detail im more coming from a i dont like our current approach |
| 88 | 21:35:15 <robla> #info question discussed: is DOMPurify an improvement over the status quo |
| 89 | 21:35:53 <TimStarling> for a start, it's only 860 physical lines of code, so if you wanted it you could just port it to PHP |
| 90 | 21:36:51 <TimStarling> it's apparently just an XSS filter, not a cross-site request filter |
| 91 | 21:38:12 <bawolff> Yes, there would have to be a layer on top for parts outside their security model but inside ours |
| 92 | 21:38:23 <TimStarling> I don't know how well it has been reviewed |
| 93 | 21:39:18 <gwicke> IIRC csteipp viewed DOMPurify favorably, but we didn't manage to finish the review before he left |
| 94 | 21:39:25 <TimStarling> it's a lot more permissive than Sanitizer |
| 95 | 21:40:48 <gwicke> agreed on the need to layer our specific restrictions on top |
| 96 | 21:41:23 <TimStarling> var IS_ALLOWED_URI = /^(?:(?:(?:f|ht)tps?|mailto|tel):|[^a-z]|[a-z+.\-]+(?:[^a-z+.\-:]|$))/i; |
| 97 | 21:41:36 <gwicke> parsoid's sanitizer works at the token / sax level, so it shouldn't be all that hard to wrap it into a SAX visitor |
| 98 | 21:41:45 <TimStarling> I've got to go |
| 99 | 21:42:41 <robla> TimStarling: d'oh! ok....thanks for the discussion! |
| 100 | 21:42:58 <gwicke> I definitely think that we should leverage other people's work in this space as much as possible |
| 101 | 21:43:36 <robla> #info 14:42:58 <gwicke> I definitely think that we should leverage other people's work in this space as much as possible |
| 102 | 21:45:06 <robla> gwicke: do you think we should stop working on our PHP version of SVG validation? |
| 103 | 21:45:09 <yurik> robla, do you keep a list of potential usecases? Graphoid SVG output would be the first in line |
| 104 | 21:45:36 <yurik> (i meant i will be the first person to get into that line) |
| 105 | 21:46:14 <robla> yurik: I don't. This topic isn't *yet* an RFC, but if you want to turn this into an RFC, that'd be a great place to put your case on the first line |
| 106 | 21:46:14 <bawolff> Isnt graphoid svgs entirely machine generated (ie not attacker controlled)? Does it need general sanitation? |
| 107 | 21:47:00 <gwicke> robla: I think we should look at features / maturity of different options, and then evaluate a) which requires the least initial investment, and b) which will need the bigger long term maintenance effort on our end |
| 108 | 21:47:03 <yurik> bawolff, that was my understanding too, but for some reason csteip (i think) wanted to sanitize everything regardless of the source |
| 109 | 21:47:18 <yurik> "just in case" (tm) |
| 110 | 21:47:42 <robla> gwicke: I can't commit to writing an RFC for this, but that all seems reasonable. would you be up for writing it? |
| 111 | 21:47:46 <bawolff> Gwicke: do you know other options? |
| 112 | 21:48:05 <gwicke> I haven't checked the PHP code in a while, but my recollection from the last time I talked about this with csteipp was that it wasn't very complete for things like html-in-SVG |
| 113 | 21:48:24 <bawolff> DomPurify was the only one i could find that was even remotely good |
| 114 | 21:48:36 <yurik> the problem with that logic is that Vega is already enabled on the client, so if there is a security vulnerability, people are already exposed to it (i hope we caught all of them). So just adding SVG output does not seem to increase the attack surface |
| 115 | 21:48:41 <gwicke> bawolff: that's the only one we seriously considered back then |
| 116 | 21:49:23 <gwicke> but who knows, the web library world is changing every couple of months these days.. |
| 117 | 21:49:56 <bawolff> Im intrigued by Tim's comment about porting to php was interesting. That honestly never even occured to me |
| 118 | 21:50:23 <bawolff> but im not sure about the library support in php for DOM |
| 119 | 21:50:42 <bawolff> which dompurify seems to use extensively |
| 120 | 21:51:04 <gwicke> https://github.com/OWASP/java-html-sanitizer looks potentially interesting |
| 121 | 21:51:25 <bawolff> perhaps its there. Im honestly not very familar with that area of the php ecosystem |
| 122 | 21:51:38 * robla creates an RFC page https://www.mediawiki.org/wiki/Requests_for_comment/SVG_Upload_should_(optionally)_allow_the_xhtml_namespace |
| 123 | 21:51:41 <gwicke> I don't think it supports SVG docs, however |
| 124 | 21:53:06 <robla> ok...we're coming to the end of the hour. This has been a really informative discussion |
| 125 | 21:53:32 <robla> ...and we have a place we can collaborate on prose ^^^ |
| 126 | 21:53:57 <subbu> https://github.com/darylldoyle/svg-sanitizer |
| 127 | 21:54:14 <bawolff> Did we come to a conclusion about the original issue? |
| 128 | 21:54:42 * subbu only read the last few lines of this rfc since he got here late |
| 129 | 21:55:01 <robla> bawolff: I think the original issue was about should we allow XHTML at all, and I haven't heard any objections to it |
| 130 | 21:55:26 <gwicke> "just" need to solve the sanitization challenge.. |
| 131 | 21:55:27 <robla> the conversation seems moved to "which validation library should we use?" |
| 132 | 21:55:43 <bawolff> Subbu: thats exactly what im looking for |
| 133 | 21:55:49 <robla> that question seems it will be solved after vi vs emacs ;-) |
| 134 | 21:56:24 <robla> seriously though, we have a bunch of validation options we should probably consider in the RFC writeup |
| 135 | 21:56:27 <gwicke> the second issue is that we'll need to upgrade our rasterization tools to support HTML |
| 136 | 21:56:51 <gwicke> we do have at least one option for that, but it's not rsvg |
| 137 | 21:56:52 <robla> gwicke: yeah, I don't think we'll have time to touch on that topic |
| 138 | 21:57:11 <robla> let's take the discussion back to phab. I'll take the action to summarize there |
| 139 | 21:57:12 <Scott_WUaS> Thanks for the discussion, All! |
| 140 | 21:57:33 <robla> any last thoughts before I hit #endmeeting? |
| 141 | 21:57:56 <robla> (I'll end this arbitrarily very close to the top of the hour) |
| 142 | 21:58:31 <robla> we should continue this conversation over on #wikimedia-multimedia maybe :-) |
| 143 | 21:58:44 <bawolff> Gwicke: i dont know if thats actually neccesary for the original use case |
| 144 | 21:58:47 <Scott_WUaS> gwicke: further thoughts about the sanitization issue? |
| 145 | 21:58:51 * robla regrets not more explicitly pinging them before this |
| 146 | 21:59:02 <robla> 60 second warning |
| 147 | 21:59:14 <bawolff> obviously it would make more sense |
| 148 | 21:59:52 <robla> alrighty....thank you everyone! No meeting planned next week, and the following week has an option or two |
| 149 | 22:00:01 <Scott_WUaS> Thank you! |
| 150 | 22:00:07 <robla> #endmeeting |
People present (lines said)
- robla (45)
- bawolff (32)
- TimStarling (27)
- gwicke (13)
- DanielK_WMDE_ (9)
- yurik (7)
- Scott_WUaS (4)
- SMalyshev (4)
- wm-labs-meetbot (3)
- stashbot (2)
- subbu (2)
- AaronSchulz (1)
- yurik_ (1)
Other meetings
| Architecture meetings | ||
|---|---|---|
| 13:00 PT ArchCom Planning Meetings | upcoming | all since 2016-03-30 |
| 14:00 PT ArchCom-RFC Meetings | upcoming | all since 2015-09-09 |