As discussed in https://gerrit.wikimedia.org/r/453057 (and in our parsing team offsite):
We should introduce two unspoofable attributes which can be added internally in extensions/Parsoid core/etc to:
- skip a node and all its children when sanitizing
- skip just the node (but sanitize the children)
By default, dom fragments are still sanitized, but:
- The output of the sanitizer sets the "skip node and all children" on the top-level node it returns, so that repeated invocations of the sanitizer on this subtree are safe
- Certain extensions will set the attributes as needed (maybe the pre extension, eg)
The goal should be to ensure that extension authors aren't given a footgun to bypass the sanitizer mechanism, but instead are given a finer-grained mechanism to do "only what they need". They'd use the "just ignore this node but not children" mechanism if they need to tunnel one specific node type through which is otherwise disallowed (<script> say), but we'd have the "big gun" mechanism to ignore a whole safe subtree if that subtree is known not to contain any user-generated content (or has been sanitized already).
The current PHP sanitizer mechanism seems to encourage extension authors to emit HTML (rather than wikitext) if they need access to elements which would otherwise be sanitized, and the HTML-output mode bypasses the sanitizer completely. That increases the burden of security review, since now every part of that extension could be an unwitting vector for evil user-generated HTML. If instead the extension output is *always* sanitized, and there are more fine-grained mechanisms to tunnel specific "allowed" features through the sanitizer, we can undertake more focused security reviews of a smaller trusted code base.