Page MenuHomePhabricator

Fine-grained Sanitizer control
Open, HighPublic

Description

As discussed in https://gerrit.wikimedia.org/r/453057 (and in our parsing team offsite):

We should introduce two unspoofable attributes which can be added internally in extensions/Parsoid core/etc to:

  • skip a node and all its children when sanitizing
  • skip just the node (but sanitize the children)

By default, dom fragments are still sanitized, but:

  • The output of the sanitizer sets the "skip node and all children" on the top-level node it returns, so that repeated invocations of the sanitizer on this subtree are safe
  • Certain extensions will set the attributes as needed (maybe the pre extension, eg)

The goal should be to ensure that extension authors aren't given a footgun to bypass the sanitizer mechanism, but instead are given a finer-grained mechanism to do "only what they need". They'd use the "just ignore this node but not children" mechanism if they need to tunnel one specific node type through which is otherwise disallowed (<script> say), but we'd have the "big gun" mechanism to ignore a whole safe subtree if that subtree is known not to contain any user-generated content (or has been sanitized already).

The current PHP sanitizer mechanism seems to encourage extension authors to emit HTML (rather than wikitext) if they need access to elements which would otherwise be sanitized, and the HTML-output mode bypasses the sanitizer completely. That increases the burden of security review, since now every part of that extension could be an unwitting vector for evil user-generated HTML. If instead the extension output is *always* sanitized, and there are more fine-grained mechanisms to tunnel specific "allowed" features through the sanitizer, we can undertake more focused security reviews of a smaller trusted code base.

Related Objects

StatusSubtypeAssignedTask
OpenNone
OpenNone
OpenNone
OpenNone

Event Timeline

The current PHP sanitizer mechanism seems to encourage extension authors to emit HTML (rather than wikitext) if they need access to elements which would otherwise be sanitized, and the HTML-output mode bypasses the sanitizer completely. That increases the burden of security review, since now every part of that extension could be an unwitting vector for evil user-generated HTML. If instead the extension output is *always* sanitized, and there are more fine-grained mechanisms to tunnel specific "allowed" features through the sanitizer, we can undertake more focused security reviews of a smaller trusted code base.

I'm not sure I agree with this assessment. I think the problem lies in the parser TagHook interface which promotes outputting html, not the Sanitizer. Additionally perhaps a culture of skipping the sensitization process entirely when needed instead of skipping it on a fine grained basis.

I'm not sure I see the practical difference between this proposal, and having strip markers like we currently do. Arguably this proposal is more elegant in a way as it doesn't rely on substitution but nice DOM methods, however at the end of the day, the result seems pretty similar.

The current PHP sanitizer mechanism seems to encourage extension authors to emit HTML (rather than wikitext) if they need access to elements which would otherwise be sanitized, and the HTML-output mode bypasses the sanitizer completely. That increases the burden of security review, since now every part of that extension could be an unwitting vector for evil user-generated HTML. If instead the extension output is *always* sanitized, and there are more fine-grained mechanisms to tunnel specific "allowed" features through the sanitizer, we can undertake more focused security reviews of a smaller trusted code base.

I'm not sure I agree with this assessment. I think the problem lies in the parser TagHook interface which promotes outputting html, not the Sanitizer. Additionally perhaps a culture of skipping the sensitization process entirely when needed instead of skipping it on a fine grained basis.

I'm not sure I see the practical difference between this proposal, and having strip markers like we currently do. Arguably this proposal is more elegant in a way as it doesn't rely on substitution but nice DOM methods, however at the end of the day, the result seems pretty similar.

Sorry, I was thinking you were referring to the MediaWiki parser, not parsoid. My comment is not relevant to parsoid.