
Systematic sanitization for SVGs and HTML
Open, Medium, Public

Description

Several backend services want to expose HTML and SVG content without any wrapping. To avoid XSS even on older browsers without support for CSP & related policies, we should make sure that all such content is properly sanitized.

Parsoid already has a port of the MediaWiki HTML sanitizer. This currently operates on tokens, but can be ported to work as a SAX handler as well. Much of the same attribute sanitization is needed for SVGs. There is code in MediaWiki that validates (but does not sanitize) SVGs in the upload path, which we should base this on. That code also uses the PHP Sanitizer class.

It might make most sense to offer this as html2html and svg2svg endpoints in Parsoid. This lets Parsoid combine sanitization with format migrations. For SVG, we could also consider offering minimization (T74547). There is also a proposal (T78579) for an SVG to PNG conversion, minimization, and sanitization service, although integrating that into Parsoid would be a bit of a stretch.

Another option would be to handle SVG separately in an SVG / PNG conversion service. This would require extracting the Parsoid sanitizer as a library, which would be useful in its own right.

In RESTBase, sanitization should be tied to the expected content-type (as documented in the swagger spec). The Sanitizer should emit a content-type like text/html;profile=mediawiki.org/specs/html/1.0.1, which is stored along with content in RESTBase. If the service-provided or stored content-type does not match the expectation in the swagger spec, the content is sent to the sanitizer registered for the base content type (possibly by prefix or regexp) and only returned / stored back if the result then matches the expectation.
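The content-type check described above could be sketched roughly as follows. This is only an illustration in Python, not RESTBase code: the names `ensure_expected`, `SANITIZERS` and `toy_html_sanitizer` are hypothetical, the registry is keyed by the base type of the expected content-type (one possible reading of "registered for the base content type"), and the toy sanitizer simply escapes all markup as a stand-in for a real one.

```python
from html import escape
from typing import Callable, Dict, Optional, Tuple

HTML_PROFILE = "text/html;profile=mediawiki.org/specs/html/1.0.1"

# A sanitizer returns (sanitized content, content-type it emits).
Sanitizer = Callable[[str], Tuple[str, str]]


def base_type(content_type: str) -> str:
    """Strip parameters: 'text/html;profile=...' -> 'text/html'."""
    return content_type.split(";", 1)[0].strip().lower()


def normalize(content_type: str) -> str:
    """Lowercase the media type and trim whitespace around ';'."""
    media, *params = [part.strip() for part in content_type.split(";")]
    return ";".join([media.lower(), *params])


def toy_html_sanitizer(content: str) -> Tuple[str, str]:
    """Placeholder only: escapes everything instead of really sanitizing."""
    return escape(content), HTML_PROFILE


# Hypothetical registry: base content type -> sanitizer.
SANITIZERS: Dict[str, Sanitizer] = {"text/html": toy_html_sanitizer}


def ensure_expected(
    content: str,
    actual_type: str,
    expected_type: str,
    sanitizers: Dict[str, Sanitizer] = SANITIZERS,
) -> Optional[Tuple[str, str]]:
    """Return (content, type) only if it matches the expectation, else None."""
    if normalize(actual_type) == normalize(expected_type):
        return content, actual_type  # already matches, pass through unchanged
    sanitizer = sanitizers.get(base_type(expected_type))
    if sanitizer is None:
        return None  # nothing registered for this base type: reject
    sanitized, emitted_type = sanitizer(content)
    if normalize(emitted_type) != normalize(expected_type):
        return None  # sanitizer output still does not meet the spec: reject
    return sanitized, emitted_type
```

With this sketch, content already labelled with the expected profile passes through untouched, plain `text/html` is routed through the sanitizer, and a type with no registered sanitizer (for example `image/svg+xml` here) is rejected rather than returned or stored.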

Event Timeline

GWicke raised the priority of this task from to Needs Triage.
GWicke updated the task description.
GWicke added projects: RESTBase, Parsoid.
GWicke subscribed.
Arlolra triaged this task as Medium priority. Jul 7 2015, 3:59 AM
Arlolra moved this task from Needs Triage to Needs Discussion on the Parsoid board.
Arlolra subscribed.

For PHP, there's http://htmlpurifier.org/. Mario Heiderich (Cure53/html5sec.org) also gave me a copy of his svgpurifier library, which does the same for SVGs.

Is there a similar Node library? If not, porting both of those libraries would be fairly straightforward.

Is there a spec for htmlpurifier/svgpurifier? My personal goal here is to move away from ad hoc sanitization and have a written spec of exactly what we're doing to the input.

The type of HTML sanitization we are after is fairly specific to MediaWiki, so it might make sense to actually use the Parsoid sanitizer.js, hooked up to SAX handlers or a DOM visitor. This also avoids maintaining two separate sanitizers.
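To make the "sanitizer hooked up to SAX handlers" idea concrete, here is a minimal sketch of an event-driven allowlist sanitizer. It uses Python's standard `html.parser` purely for illustration; it is not Parsoid's sanitizer.js, and the allowlists (`ALLOWED_TAGS`, `ALLOWED_ATTRS`, `SAFE_URL_PREFIXES`) are made-up examples, not MediaWiki's actual rules.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative allowlists only; the real MediaWiki rules are far more detailed.
ALLOWED_TAGS = {"p", "b", "i", "a", "span"}
ALLOWED_ATTRS = {"href", "title", "class"}
SAFE_URL_PREFIXES = ("http://", "https://", "#")


class AllowlistSanitizer(HTMLParser):
    """Re-serializes only allowlisted tags and attributes from parser events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; any text inside is still escaped and kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS or value is None:
                continue  # drops event handlers such as onclick, and bare attributes
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # drops javascript: and other unexpected URL schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape the text.
        self.out.append(escape(data, quote=False))


def sanitize_html(html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

For example, `sanitize_html('<p onclick="x()">Hi <a href="javascript:alert(1)" title="t">link</a></p>')` keeps the `p` and `a` elements and the `title` attribute, but drops `onclick` and the `javascript:` URL. The same handler shape would apply to a DOM visitor, with the per-element checks moved into the visit callback.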

The main issue with the Parsoid sanitizer is that it currently does not cover SVG. We could either add that, or use a separate library for SVG.

From the looks of it, DOMPurify should cover our SVG use case and, unlike ammonia, is written in JS, making it possible to adapt it into the Parsoid sanitizer. Let's have security evaluate it; that way I can already embed it into Graphoid until RESTBase starts supporting it transparently.