Maniphest T204279

Fine-grained Sanitizer control
Open, High, Public

Assigned To

None

Authored By

	cscott
	Sep 13 2018, 9:05 PM

Description

As discussed in https://gerrit.wikimedia.org/r/453057 (and in our parsing team offsite):

We should introduce two unspoofable attributes which can be added internally (by extensions, Parsoid core, etc.) to:

skip a node and all its children when sanitizing
skip just the node (but sanitize the children)

By default, DOM fragments are still sanitized, but:

The output of the sanitizer sets the "skip node and all children" attribute on the top-level node it returns, so that repeated invocations of the sanitizer on this subtree are safe
Certain extensions will set the attributes as needed (perhaps the pre extension, for example)

The goal should be to ensure that extension authors aren't given a footgun to bypass the sanitizer mechanism, but instead are given a finer-grained mechanism to do "only what they need". They'd use the "just ignore this node but not its children" mechanism if they need to tunnel through one specific node type which is otherwise disallowed (<script>, say), but we'd have the "big gun" mechanism to ignore a whole safe subtree if that subtree is known not to contain any user-generated content (or has already been sanitized).
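To make the two modes concrete, here is a minimal Python sketch of how such a sanitizer pass might behave. It is not Parsoid's implementation: the `Node` class, the `Skip` enum, the allowlists, and the choice to hold the flag in a field outside the ordinary attribute map (so that user markup cannot set it) are all assumptions made for illustration. Disallowed nodes are simply dropped here, whereas a real sanitizer would more likely escape them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Skip(Enum):
    NONE = "none"        # default: sanitize this node and its children
    NODE = "node"        # keep this node as-is, but still sanitize its children
    SUBTREE = "subtree"  # keep this node and all of its descendants as-is


@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""
    # Hypothetical internal flag. It lives outside `attrs`, so nothing parsed
    # from user-supplied markup can set it; only trusted code can.
    skip: Skip = Skip.NONE


# Hypothetical allowlists, just large enough for the example.
ALLOWED_TAGS = {"div", "p", "span", "pre"}
ALLOWED_ATTRS = {"class", "title"}


def _sanitize(node):
    """Return a sanitized node, or None if the node must be dropped."""
    if node.skip is Skip.SUBTREE:
        # Trusted (or already sanitized) subtree: return it untouched.
        return node
    if node.skip is Skip.NODE:
        # Tunnel this single node through, attributes included.
        kept_attrs = dict(node.attrs)
    elif node.tag in ALLOWED_TAGS:
        kept_attrs = {k: v for k, v in node.attrs.items() if k in ALLOWED_ATTRS}
    else:
        return None
    # In both remaining modes the children are still sanitized.
    children = [c for c in map(_sanitize, node.children) if c is not None]
    return Node(node.tag, kept_attrs, children, node.text, node.skip)


def sanitize(root):
    """Sanitize a fragment and mark the result so re-sanitizing is a no-op."""
    result = _sanitize(root)
    if result is not None:
        result.skip = Skip.SUBTREE
    return result
```

In this sketch an extension that needs a `<script>` element would create it with `skip=Skip.NODE`; a `<script>` arriving from user content carries no flag and is dropped. Calling `sanitize` again on its own output returns the same tree unchanged, which is the "repeated invocations are safe" property described above.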

The current PHP sanitizer mechanism seems to encourage extension authors to emit HTML (rather than wikitext) if they need access to elements which would otherwise be sanitized, and the HTML-output mode bypasses the sanitizer completely. That increases the burden of security review, since now every part of that extension could be an unwitting vector for evil user-generated HTML. If instead the extension output is *always* sanitized, and there are more fine-grained mechanisms to tunnel specific "allowed" features through the sanitizer, we can undertake more focused security reviews of a smaller trusted code base.

Related Objects

Status	Assigned	Task
Open	None	T48826 Sanitizer breaks microdata
Open	None	T248211 One Sanitizer to Rule Them All
Open	None	T247804 Move Sanitizer from core into Parsoid
Open	None	T204279 Fine-grained Sanitizer control

Event Timeline

cscott created this task. Sep 13 2018, 9:05 PM

Restricted Application added a subscriber: Aklapper. Sep 13 2018, 9:05 PM

cscott updated the task description. Sep 13 2018, 9:09 PM

Reedy added projects: Security-Team, Parsing-Team--ARCHIVED. Sep 13 2018, 11:40 PM

Reedy edited projects, added acl*security; removed Security-Team.

Reedy removed subscribers: Security-Team, Parsing-Team--ARCHIVED.

The current PHP sanitizer mechanism seems to encourage extension authors to emit HTML (rather than wikitext) if they need access to elements which would otherwise be sanitized, and the HTML-output mode bypasses the sanitizer completely. That increases the burden of security review, since now every part of that extension could be an unwitting vector for evil user-generated HTML. If instead the extension output is *always* sanitized, and there are more fine-grained mechanisms to tunnel specific "allowed" features through the sanitizer, we can undertake more focused security reviews of a smaller trusted code base.

I'm not sure I agree with this assessment. I think the problem lies in the parser TagHook interface, which promotes outputting HTML, not in the Sanitizer. Additionally, there is perhaps a culture of skipping the sanitization process entirely when needed, instead of skipping it on a fine-grained basis.

I'm not sure I see the practical difference between this proposal and having strip markers like we currently do. Arguably this proposal is more elegant, in that it relies on nice DOM methods rather than substitution. At the end of the day, however, the result seems pretty similar.

In T204279#4582535, @Bawolff wrote:

I'm not sure I agree with this assessment. I think the problem lies in the parser TagHook interface, which promotes outputting HTML, not in the Sanitizer. Additionally, there is perhaps a culture of skipping the sanitization process entirely when needed, instead of skipping it on a fine-grained basis.

I'm not sure I see the practical difference between this proposal and having strip markers like we currently do. Arguably this proposal is more elegant, in that it relies on nice DOM methods rather than substitution. At the end of the day, however, the result seems pretty similar.

Sorry, I was thinking you were referring to the MediaWiki parser, not Parsoid. My comment is not relevant to Parsoid.

ssastry moved this task from Needs Triage to Future Ideas on the Parsoid board. Sep 17 2018, 7:34 PM

chasemp triaged this task as High priority. Dec 9 2019, 4:48 PM

chasemp added a project: Security. Feb 10 2020, 10:54 PM

chasemp removed a project: acl*security. Feb 20 2020, 8:05 PM

ssastry removed a project: Parsing-Team--ARCHIVED. Mar 16 2020, 6:38 PM

cscott added a parent task: T247804: Move Sanitizer from core into Parsoid. Jan 25 2021, 2:21 PM

TK-999 subscribed. Jan 25 2021, 2:32 PM

MarkusRost subscribed. Mar 16 2021, 7:33 PM