Page MenuHomePhabricator

SVG Upload should (optionally) allow the xhtml namespace
Open, MediumPublic

Description

This RFC explores having MediaWiki allow SVG files to have XHTML namespaces on upload. This was disabled as a result of T62771: SVG iframe XSS. However, legitimate SVG files are also blocked, e.g. the ones produced by draw.io.

Native SVG lacks proper word wrapping support, at least in current implementations, and this can be circumvented by using xhtml and CSS if the displaying application supports that. In case it does not, the less optimal SVG version will be used.

Should be an administrator decision, i.e. a configuration option to turn the xhtml namespace on or off, or one to extend the list of allowed namespaces?

Full RFC: https://www.mediawiki.org/wiki/Requests_for_comment/SVG_Upload_should_(optionally)_allow_the_xhtml_namespace

Event Timeline

Restricted Application added subscribers: Zppix, Steinsplitter, Aklapper. · View Herald Transcript

@mgebert , the mediawiki-drawio-editor looks like a really useful extension! @dpatrick , @Bawolff , and I spent a few minutes talking this over in our Security triage. We're not convinced that we can make an "allow XHTML" option for our SVG sanitization robust enough to be secure, so we're afraid of supporting that option.

Would it be possible for mediawiki-drawio-editor to strip the XHTML? My understanding is that the XHTML from draw.io is usually wrapped in a "switch" and then "foreignObject" element, with the SVG text element as an alternate. Is that something your plugin could strip out prior to offering it to MediaWiki?

@RobLa-WMF, I completely unterstand that you're having security concerns, although I think they're only valid for wiki installations with untrusted users. In a closed setup where only trusted people can upload, these concerns may be a minimal risk or even a non-issue, and the minimal checks you currently do (i.e. no iframe) may be more than good enough. In fact I've originally created the extension for such an environment.

That's why I suggested to leave this up to the administrator. Of course the documentation of such an option should clearly state the risk that comes with allowing xhtml. It could also be a generic option to add any namespace to the predefined list of allowed ones, again stating it's the admins responsibility to consider the impact on his installation.

Still, I've looked for a solution to strip xhtml in my extension (which has to be able to do this entirely in javascript):

https://jsfiddle.net/n5qL9mro/

As it turns out it's technically doable, but as you can see the "after" SVG lacks the text in the box. The problem is that the original SVG used in this example lacks the <switch> element. Creating this jsfiddle made me realize that draw.io does not output a <switch> statements in all cases. I don't know why, but as long as this is the case, stripping xhtml is pointless.

Even if draw.io were to output <switch> statements in any case, I'm not really convinced that solving this in my extension is appropriate, because:

  • It's not a solution for users that use draw.io directly.
  • A change on draw.io's side could break this at any time, unexpectedly.
  • An inferior version of the SVG would be displayed in the wiki (no word wrapping).

We could try and convince the draw.io folks that they should refrain from using xhtml in SVG, but I'm not convinced myself that xhtml is generally bad. That's why I haven't contacted them so far.

The other option is to improve MediaWiki's SVG sanitization to the point at which you're confident it's safe to allow the xhtml namespace (either opt-in or opt-out). As this would be rather complicated and it'd be hard to decide when the sanitization is actually good enough, I still suggest to make xhtml opt-in for the administrator in the meantime, just to have a good solution for all the installations in which this is an acceptable risk.

@RobLa-WMF, I completely unterstand that you're having security concerns, although I think they're only valid for wiki installations with untrusted users. In a closed setup where only trusted people can upload, these concerns may be a minimal risk or even a non-issue, and the minimal checks you currently do (i.e. no iframe) may be more than good enough.

Wikimedia developers can't (and shouldn't) spend a lot of their time making features that work when everyone trusts each other. We spend a lot of time cleaning up messes created by naive developers that assume trust levels they shouldn't. Sometimes the "naive developer" is "a seasoned developer several years ago before they learned a few things the hard way" ;-)

In fact I've originally created the extension for such an environment.

Do you think your extension should be useful for public wikis too?

That's why I suggested to leave this up to the administrator. Of course the documentation of such an option should clearly state the risk that comes with allowing xhtml. It could also be a generic option to add any namespace to the predefined list of allowed ones, again stating it's the admins responsibility to consider the impact on his installation.

Allowing this tradeoff then adds complexity for those of us trying to maintain the MediaWiki codebase and forces administrators to make a decision that they shouldn't need to make. Whose life does it make easier?

As it turns out [stripping out the XHTML is] technically doable, but as you can see the "after" SVG lacks the text in the box. The problem is that the original SVG used in this example lacks the <switch> element. Creating this jsfiddle made me realize that draw.io does not output a <switch> statements in all cases.

That's really unfortunate. It would be really good to understand why draw.io insists on XHTML.

We could try and convince the draw.io folks that they should refrain from using xhtml in SVG, but I'm not convinced myself that xhtml is generally bad. That's why I haven't contacted them so far.

It would be in all of our best interests if you did. I agree with you that inline XHTML isn't inherently bad, but validating that it isn't bad in an automated fashion is beyond the scope of what Wikimedia-funded developers should drop other work to spend time on.

The other option is to improve MediaWiki's SVG sanitization to the point at which you're confident it's safe to allow the xhtml namespace (either opt-in or opt-out). As this would be rather complicated and it'd be hard to decide when the sanitization is actually good enough,

It's complicated, but seems worthwhile. Our SVG sanitization library for PHP is pretty good (from what I understand), and it would be wonderful if someone (you?) took ownership of that library and added features that made it safely compatible with services like draw.io. The sanitization library we use is only used in MediaWiki, but there's nothing about it that is specific to MediaWiki. All PHP developers might benefit from an improved version.

@RobLa-WMF, I completely unterstand that you're having security concerns, although I think they're only valid for wiki installations with untrusted users. In a closed setup where only trusted people can upload, these concerns may be a minimal risk or even a non-issue, and the minimal checks you currently do (i.e. no iframe) may be more than good enough.

Wikimedia developers can't (and shouldn't) spend a lot of their time making features that work when everyone trusts each other. We spend a lot of time cleaning up messes created by naive developers that assume trust levels they shouldn't. Sometimes the "naive developer" is "a seasoned developer several years ago before they learned a few things the hard way" ;-)

Well, I'm certainly not asking you to do something naive ;-). Of course I agree that features should be designed to work in untrustworthy environments. But this is not about adding a feature, this is about adding a configuration option that causes a check to be skipped or conducted in a different way.

In fact I've originally created the extension for such an environment.

Do you think your extension should be useful for public wikis too?

It think it should be and actually it is already useful for public wikis if you're ok with using PNG instead of SVG.

All this really is about the SVG draw.io produces vs. what MediaWiki is willing to accept. So in my opinion the extension does what it should do and the design of it does not stand in the way of it being used in public wikis.

When open-sourcing the extension, I did not do it with the goal for it to work for everyone. I wrote it for $work where we run some internal wikis. At some point I thought it might be useful for somebody else in the same situation and I've put in on GitHub with some documentation. As you probably have noticed, I'd like to see SVG work for any kind of installation, but ultimately the effort I can put into this is quite limited.

That's why I suggested to leave this up to the administrator. Of course the documentation of such an option should clearly state the risk that comes with allowing xhtml. It could also be a generic option to add any namespace to the predefined list of allowed ones, again stating it's the admins responsibility to consider the impact on his installation.

Allowing this tradeoff then adds complexity for those of us trying to maintain the MediaWiki codebase and forces administrators to make a decision that they shouldn't need to make. Whose life does it make easier?

The administrator's life. Because adding a line to LocalSettings.php once is easier than applying a patch on every update. Of course this only helps for private wiki's and those with trusted uploaders.

As it turns out [stripping out the XHTML is] technically doable, but as you can see the "after" SVG lacks the text in the box. The problem is that the original SVG used in this example lacks the <switch> element. Creating this jsfiddle made me realize that draw.io does not output a <switch> statements in all cases.

That's really unfortunate. It would be really good to understand why draw.io insists on XHTML.

We could try and convince the draw.io folks that they should refrain from using xhtml in SVG, but I'm not convinced myself that xhtml is generally bad. That's why I haven't contacted them so far.

It would be in all of our best interests if you did. I agree with you that inline XHTML isn't inherently bad, but validating that it isn't bad in an automated fashion is beyond the scope of what Wikimedia-funded developers should drop other work to spend time on.

I'll try to contact them.

The other option is to improve MediaWiki's SVG sanitization to the point at which you're confident it's safe to allow the xhtml namespace (either opt-in or opt-out). As this would be rather complicated and it'd be hard to decide when the sanitization is actually good enough,

It's complicated, but seems worthwhile. Our SVG sanitization library for PHP is pretty good (from what I understand), and it would be wonderful if someone (you?) took ownership of that library and added features that made it safely compatible with services like draw.io. The sanitization library we use is only used in MediaWiki, but there's nothing about it that is specific to MediaWiki. All PHP developers might benefit from an improved version.

I'm not very familiar with the upload code. Is this "library" something in UploadBase.php or is it separate module somewhere?

As stated above, my resources are very limited too and as much as I'd like to help out, improving that code or even taking ownership is currently not feasible. I'm already putting more effort into this than originally intended.

Whatever answer draw.io is going to give us will probably change little about the fact that we both don't have the resources to improve the SVG sanitization right now. So the question that still stands is wether you're willing to give the trusted wiki's administrator a short-term solution by adding a you-need-to-know-what-you're-doing config option that controls the allowed namespaces? There are probably other use cases than this one where an administrator needs to add a namespace you do not have on your whitelist (yet).

Just FYI: I've opened a request at draw.io's support site and asked them to join this discussion. The ticket itself is private so I can't provide a link, but I'll keep you posted.

Just FYI: I've opened a request at draw.io's support site and asked them to join this discussion. The ticket itself is private so I can't provide a link, but I'll keep you posted.

Wonderful, thank you! In particular, it'd be good to know what elements and attributes we should expect from draw.io-generated SVG. I'm assuming that what's really expected here is:

  • SVG foreignObject which wraps...
  • an embedded HTML div (optionally having nested divs) Each div has:
    • A style attribute, with allows some subset of inline CSS

Is this embedded HTML something we should expect to increasingly see from other SVG editors, or is this a peculiarity of draw.io's implementation? How widely supported is draw.io's version of SVG+HTML in other SVG rendering environments?

Our SVG sanitization library for PHP is pretty good (from what I understand), and it would be wonderful if someone (you?) took ownership of that library and added features that made it safely compatible with services like draw.io. The sanitization library we use is only used in MediaWiki, but there's nothing about it that is specific to MediaWiki. All PHP developers might benefit from an improved version.

I'm not very familiar with the upload code. Is this "library" something in UploadBase.php or is it separate module somewhere?

Yup, you're probably right to put that in scare quotes; I admittedly was overoptimistic in referring to it as a "library". @csteipp and I discussed turning it into one when @bd808 was working on the librarization project, but I wouldn't be surprised if it never made it out of the "wouldn't it be nice" stage. Still, patches to this code that open limited exceptions to our current strictness would be welcome. We might be able to agree on some very tiny subset of embedded HTML that is safe and reasonable to let through in SVG.

draw.io uses foreign objects (FO) to embed arbitrary HTML labels within SVG shapes. It's partly to allow users a wide scope of flexibility as to what they can create within a shape, but a lot of the reason is to give us word wrapping, which is very hard in native SVG text.

In terms of an SVG sanitiser, that's certainly one option. Obviously, the sanitiser has to be server-side, it makes no sense for us to sanitise and then output a text file that anyone could change.

We could write a draw.io plugin (that you'd invoke as a URL parameter when calling draw) when using draw in embed mode. That could switch off all UI functionality that causes HTML labels to be created, like word wrapping. That would then produce output with no FO sections. That doesn't solve the case of a user taking output from the online site, though, and restricts an important piece of functionality.

Another option is we have a viewer "widget" that can be embedded. That gives you vector, zoomable display of the diagram. You would need to store the diagram XML format rather than the SVG in this case, but I'm guessing you store that anyway if your extension round-trips diagrams (you'll have to excuse my zero knowledge of either MediaWiki or your extension).

I would add that we do perform complete HTML sanitisation on displaying HTML labels, both in the editor and in the viewer.

Is this embedded HTML something we should expect to increasingly see from other SVG editors, or is this a peculiarity of draw.io's implementation? How widely supported is draw.io's version of SVG+HTML in other SVG rendering environments?

Foreign Objects with SVG are part of the SVG specification, https://developer.mozilla.org/en/docs/Web/SVG/Element/foreignObject. It is optional in SVG 1.1, but it's marked as required in SVG 2. The most common use case is to embed HTML. All browsers support it, except for IE (Edge does support it).

We could write a draw.io plugin (that you'd invoke as a URL parameter when calling draw) when using draw in embed mode. That could switch off all UI functionality that causes HTML labels to be created, like word wrapping. That would then produce output with no FO sections. That doesn't solve the case of a user taking output from the online site, though, and restricts an important piece of functionality.

The examples drawings we've produced so far only contain a box with a label but already make use of xhtml. Turning of boxes with labels sounds very limiting.

Generally speaking, this might be a last resort kind of solution for the extension, but as you say, it won't solve the problem that draw.io exports cannot be uploaded to MediaWiki. I think the primary goal should be to solve that.

Another option is we have a viewer "widget" that can be embedded. That gives you vector, zoomable display of the diagram. You would need to store the diagram XML format rather than the SVG in this case, but I'm guessing you store that anyway if your extension round-trips diagrams (you'll have to excuse my zero knowledge of either MediaWiki or your extension).

Depending on the configuration and and the wiki tag that represents the diagram, the extension either exports xmlpng or xmlsvg from the draw.io iframe and uploads that to the wiki file store. As the xml prefix causes the original diagram data to be embedded into the file, there's currently no need to store a separate XML version. I the user wants to edit again, the file is pulled from the wiki, imported into a draw.io iframe that is created on the fly and injected into the wiki page.

I tried to find information about this viewer. Is this related to what the UI provides in File -> Embed -> IFrame...? Unfortunately, I did not get that to work with my sample xmlsvg file. But the browser does not even try to fetch the file. Is this fetched server-side? "Public URL of the diagram:" in the UI suggests that too. My test file is in a local wiki.

And again, the viewer could maybe solve the problem for the extension, but not for MediaWiki in general.

As far as the extension is concerned, I'm not sure the benefits of the viewer (vector, zooming, which are things SVG does natively) outweigh the costs like depending on draw.io to be available even for just displaying diagrams. Unless of course I'm missing some benefits of the viewer...

Quoted Text Turning of boxes with labels sounds very limiting.

It doesn't turn off labels, it turns off HTML labels. You'd still have SVG text labels and basic formating; fonts, bold, italic, underline, label positioning. You'd lose complete HTML formatting and word wrap (though you could manually add line breaks).

We'd probably need to work with you on the viewer, it'd need a little customizing for your exact scenario. But if the viewer isn't the solution, that issue is moot. The viewer can open to a larger lightbox view, it has an option to select diagram layers. There's options for tooltips, links (including page anchors) and expanding/collapsing groups. You could show the PNG by default and attempt to load the viewer over it. If you went down this route I wouldn't hold up the page load, I'd detect a mouseover into the image (sorry touch devices) and attempt to load the viewer in at that point (i.e. when they might want to interact with the diagram).

Another option would be to make a simple wrapper around the SVG which includes a JS HTML santizer ( I believe we use the Caja version ).

We seem to have a couple sub-issues here:

Sanitization options:

  • always pass a safe subset of XHTML as foreign objects through the validator
  • allowing sites to configure the SVG sanitizer to pass a believed-safe subset of embedded XHTML
  • allowing sites to configure the SVG sanitizer to allow *any* XHTML through (unsafe for untrusted users)

Viewing/rendering issues:

  • Do the HTML labels work when rendered with rsvg, ImageMagick, or batik instead of a browser?
  • option of a draw.io-based viewer
  • separately, the option of native in-browser SVG viewing

Concentrating on the sanitization, it would probably help to have a representative set of real-world sample files (not just whatever I might create adding a random label, but files y'all are using to demonstrate actual requirements). If you just have a <div> with some plain text in it, I don't expect a problem. If you have style attributes, things get more complex, but we already have to sanitize CSS for the SVG and in our wikitext->HTML conversion, so I suspect it can be done. :)

It sounds you like don't require scripting or off-site objects or anything within the HTML, which makes it a bit easier to envision this working.

I don't think a specialized viewer is a required thing at all; for viewing it sounds like you might need in-browser rendering if offline flattening isn't working. I'm not sure I saw anything clear on whether there's a rendering problem currently, or if testing if rendering works is stalled on the upload sanitization, or if it works fine in your testing so far?

We seem to have a couple sub-issues here:

Sanitization options:

  • always pass a safe subset of XHTML as foreign objects through the validator
  • allowing sites to configure the SVG sanitizer to pass a believed-safe subset of embedded XHTML
  • allowing sites to configure the SVG sanitizer to allow *any* XHTML through (unsafe for untrusted users)

Yes, these are the options and I also think the discussion should focus on them and not the viewing/rendering part (see below).

Viewing/rendering issues:

  • Do the HTML labels work when rendered with rsvg, ImageMagick, or batik instead of a browser?
  • option of a draw.io-based viewer
  • separately, the option of native in-browser SVG viewing

Although I'm tempted to add some sort of draw.io viewer to the extension at some point, that would be in addition to native SVG display. The only reason @bensond-drawio brought the viewer into play in this context was that it could work around the wiki not allowing draw.io generated SVG uploads because it could be backed by native draw.io XML files or maybe even PNG files with embedded draw.io XML, which should be easier to get past the upload checks. So the main problem of not getting the draw.io SVG uploaded in the first place remains and should be the priority. Viewing is a different set of problems, but if you only count in modern browsers, native SVG with or without XHTML works just fine and that will hopefully only improve over time.

Concentrating on the sanitization, it would probably help to have a representative set of real-world sample files (not just whatever I might create adding a random label, but files y'all are using to demonstrate actual requirements).

Unfortunately I cannot disclose the diagrams I've made for $work, but then again they only use basic boxes and some labels, so just randomly adding these to a new drawing comes pretty close anyway ;-). An quick google has not turned up some fancy examples. I hope that @bensond-drawio can come up with something useful here.

I don't think a specialized viewer is a required thing at all; for viewing it sounds like you might need in-browser rendering if offline flattening isn't working.

I'm not sure what you refer to by offline flattening. Do you mean imagemagick convert it server-side to something like PNG?

I'm not sure I saw anything clear on whether there's a rendering problem currently, or if testing if rendering works is stalled on the upload sanitization, or if it works fine in your testing so far?

In my opinion, there's no rendering problem here. Work-arounds like the draw.io viewer have only been suggested so that SVG uploads can be circumvented. It they worked, most or all of the problems discussed here would be solved at once.

I'm not a draw.io user but i have the same limitation to upload svg files with foreign objects.

I suggest a real example : https://github.com/mothsART/labodunerd/blob/master/labo/frontend/interactive_svg/campement.min.foreign.svg

I precise, than the foreign object is encapsulated in a <switch> parent with an alternate <text> node when foreign objects are not supported.

I don't really understand the restriction on "http://www.w3.org/1999/xhtml" namespace except in the <iframe> context => treated on
https://github.com/wikimedia/mediawiki/blob/master/includes/upload/UploadBase.php#L1475-L1480 ?

I'm not a draw.io user but i have the same limitation to upload svg files with foreign objects.

I suggest a real example : https://github.com/mothsART/labodunerd/blob/master/labo/frontend/interactive_svg/campement.min.foreign.svg

I precise, than the foreign object is encapsulated in a <switch> parent with an alternate <text> node when foreign objects are not supported.

I don't really understand the restriction on "http://www.w3.org/1999/xhtml" namespace except in the <iframe> context => treated on
https://github.com/wikimedia/mediawiki/blob/master/includes/upload/UploadBase.php#L1475-L1480 ?

The challenge with allowing XHTML is that none of us has written the code to strip out XSS attacks from arbitrary HTML. To quote the "Safely validating untrusted HTML input" section of enwiki's Cross-site scripting (XSS) article:

Many operators of particular web applications (e.g. forums and webmail) allow users to utilize a limited subset of HTML markup. When accepting HTML input from users (say, <b>very</b> large), output encoding (such as &amp;lt;b&amp;gt;very&amp;lt;/b&amp;gt; large) will not suffice since the user input needs to be rendered as HTML by the browser (so it shows as "very large", instead of "\<b\>very\</b\> large"). Stopping an XSS attack when accepting HTML input from users is much more complex in this situation. Untrusted HTML input must be run through an HTML sanitization engine to ensure that it does not contain XSS code.

In short, arbitrary XHTML is really scary to allow.

Your specific example (campement.min.foreign.svg) looks harmless enough, but it's obviously only using a tiny subset of what is possible with XHTML. The switch statement in there offers hope:

<switch>
    <foreignObject x="0" y="0" width="100%" height="100%">
      <section class="description" xmlns="http://www.w3.org/1999/xhtml">
        <article id="dame-brassempouy-description">
          <h2><span class="indice">1</span>Dame de Brassempouy</h2>
          <p>Cette illustration fait une allusion à la <strong>dame de Brassempouy</strong>.<br />
          Cette dernière est un fragment de statuette en ivoire datant du Paléolithique supérieur.<br />
          Sur sa tête, un quadrillage formé d’incisions perpendiculaires a été interprété, ici, par une chevelure tressée.<br />
          Son visage porte des stries ou scarifications, interprété dans cette illustration par un maquillage tribal.
          </p>
          <p>
            Référence : <a href="https://fr.wikipedia.org/wiki/Dame_de_Brassempouy">Page wikipédia</a>
          </p>
        </article>
    </section>
  </foreignObject>
  <text x="90" y="69" fill="#000000" text-anchor="middle" font-size="12px" font-family="Helvetica">[Not supported by viewer]</text>
</switch>

...but it looks like the alternate text note is "[Not supported by viewer]" rather than something marginally more useful (like an unformatted version of the text). Section 23.4 of the SVG 1.1 spec provides a good example of what the alternate version should look like. What tool did you use to create it?

I've generated this illustration with a custom tool.
This tool (a wysiwyg and an offline web app) can import a static <svg> (no smil, javascript and foreign objects) and add a legend with descriptions.

The goal of this app => Non informaticiens can edit descriptions on an interactive illustration. (concern actually an education project named AbulEdu http://www.abuledu.org/ but can be larger)

I've personnally added a <switch> node on generated process.
However, to edit descriptions (foreign objets product), i've used a wysiwyg editor (a lighter http://alex-d.github.io/Trumbowyg/).

I don't control the result of this editor. (just putting the result inside the <foreignobjet>)
To produce an alternate text, the best i can do is to filter html tags (like strip_tags() in php);

I'm certainly naive on XSS but i've made some homework about => accept only xhtml (no html soup) with a strong xml parser can filter (or detect) quickly all suspicious content like <script> or <iframe> tag, xml attritubes prefixe with "on". In short, delete all the javascript content represent 99% of XSS vulnerability? If not, a white and/or a backlist (<form>, <input> etc.) can tougher rules.

I suppose a lot of good PHP lib do this work well ?

Otherwise, why not proposing a intermediate langage like Markdown?
Markdown can be parse and transform into correct xhtml tags. (and some wysiwyg editor exist)

We have a HTML validation library (the Sanitizer class) and it could probably be hooked up to validating HTML in SVG file uploads. But it would definitely require some work.

I've generated this illustration with a custom tool.
This tool (a wysiwyg and an offline web app) can import a static <svg> (no smil, javascript and foreign objects) and add a legend with descriptions.[...] I've personnally added a <switch> node on generated process.
However, to edit descriptions (foreign objets product), i've used a wysiwyg editor (a lighter http://alex-d.github.io/Trumbowyg/).

Thanks for the info, that helps us understand tools actual in use versus theoretical possibilities.

I'm certainly naive on XSS but i've made some homework about => accept only xhtml (no html soup) with a strong xml parser can filter (or detect) quickly all suspicious content like <script> or <iframe> tag, xml attritubes prefixe with "on". In short, delete all the javascript content represent 99% of XSS vulnerability? If not, a white and/or a backlist (<form>, <input> etc.) can tougher rules.

My hunch is that we should pick the very minimal subset of HTML needed to support the basic use case (word wrapping and basic text formatting in captions). If we can find a very minimal set, we can build up. Starting small and allowing more over time is far easier than starting large but accidentally introducing problems that are hard to roll back.

To that end, what are the HTML markup features you know you really need versus features that are merely nice to have?

We have a HTML validation library (the Sanitizer class) and it could probably be hooked up to validating HTML in SVG file uploads. But it would definitely require some work.

That's a really good point. We've have a page describing what Sanitizer does: meta:Help:HTML_in_wikitext. The question that I have: would that allow for Wikitext embedded in SVG as well?

We should prefer that a Sanitizer.php-based solution doesn't allow Wikitext, since the potential for unintended misfeatures is a big problem. For example, we could accidentally create inefficient markup that creates unwanted backward compatibility liability if we tried to roll it back, or we could create markup that makes it difficult to interoperate with good editing tools.

No, we don't want to parse wikitext in SVG images, that's far too crazy :) But I think Sanitizer can just sanitize HTML, and we could prevent the upload if it's not acceptable. (I'll note that the current upload system doesn't allow us to modify the image while it's being uploaded, only accept/reject.)

No, we don't want to parse wikitext in SVG images, that's far too crazy :) But I think Sanitizer can just sanitize HTML, and we could prevent the upload if it's not acceptable.

That sounds great. It'd be good to double check that our HTML subset doesn't allow anything that the spec disallows.

(I'll note that the current upload system doesn't allow us to modify the image while it's being uploaded, only accept/reject.)

Right now, I think we're rejecting everything that has HTML in it, so this would be an improvement.

Given that we know that draw.io does this, it'd be nice if someone created a test suite of images that they want to work, but currently are rejected. I'm guessing that'd be a huge help to a developer who is ready to try hooking up a Sanitizer-class-based solution.

So some thoughts:

Currently the upload pipeline cannot edit files in upload, so unless we re-architect a bunch of scary upload code, the sensitization process is a binary pass/fail, we cannot modify the file to be safe. However the sanitizer does assume it could make minor changes to the html to make it safe. For example, it modifies whitespace around tags, changes named html entities to numeric ones, deletes inline style tags with dangerous css, etc. So in order for this plan to work, the html snippet in the svg would have to match exactly what the sanitizer expects to see for html - it would reject a bunch of perfectly fine cases where the html isn't an exact match for what the sanitizer wants.

The other thing that should be brought up, is our svg renderer (rsvg) does not support html foreign objects (AFAIK) so these things won't actually display on our site.

Generally though, I think in principle this could be viable if we're ok with those restrictions.

To bring a minimal subset of HTML, I suggest to allow this tags :
<div> <span> <strong> <em> <strike> <p> <br> <a>

For html attributes, i suggest exclusively "class" and "id".

and Optional subset of will be <table> <thead> <tbody> <tr> <th> <td> and html5 tags like <section> <article> <aside>.

I'm not taking about <img> and <a> tags.
Certainly, too difficult with data-type, svg file, uri like mailto:, tel: ...

Discussion has fallen off on this RFC task; is there interest in reviving it, or should we take it off the TechCom list until such time as it's reactivated?

Milimetric subscribed.

Making an executive decision here to untag TechCom, as Brion suggested last year. Please add us back if you need TechCom support or discussion.