
Extension API: strip state issues
Open, High, Public

Description

We need to figure out how Parsoid interacts with the concept of 'strip state' from the legacy parser.

The reason is that many existing extensions explicitly interact with strip state, for various reasons:

  • Protecting rawHTML content from the sanitizer
  • As an ad-hoc escape mechanism for template arguments ({{Foo|bar=<nowiki>something with |</nowiki>}} and then fetch the raw contents of the bar argument from the strip state)
  • Explicitly as part of the Scribunto API
  • other reasons yet to be discovered
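To make the second use above concrete, here is a minimal Python sketch of the `<nowiki>` escape trick. It is not the legacy parser's actual code: the marker format, the `parse_args` helper, and the restriction to named arguments are all assumptions made for illustration.

```python
import re

# Hypothetical, simplified marker format; the real parser's markers differ.
NOWIKI_RE = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL)
MARKER_RE = re.compile("\x7fUNIQ-(\\d+)-QINU\x7f")


def parse_args(body):
    """Split a template body such as 'Foo|bar=...' into (name, named args).

    <nowiki> content is stashed in a strip state first, so a '|' inside it
    is not mistaken for an argument separator.
    """
    state = []  # the "strip state": raw contents indexed by marker number

    def stash(match):
        state.append(match.group(1))
        return f"\x7fUNIQ-{len(state) - 1}-QINU\x7f"

    protected = NOWIKI_RE.sub(stash, body)
    name, *raw_args = protected.split("|")

    args = {}
    for raw in raw_args:
        key, _, value = raw.partition("=")
        # Fetch the raw contents back out of the strip state.
        args[key.strip()] = MARKER_RE.sub(
            lambda m: state[int(m.group(1))], value
        )
    return name, args


name, args = parse_args("Foo|bar=<nowiki>something with |</nowiki>")
print(name, args)
```

Without the stash step, splitting on `|` would cut the `bar` argument in two.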

There may not be one solution here, but as we port new extensions to Parsoid we need to come up with guidance for how the various uses of strip state get translated into the Parsoid extension API. This may (or may not) include adding an explicit "strip state" type API mechanism, or longer-term solutions (such as heredoc arguments instead of the ad-hoc <nowiki> argument escapes). There may be low-hanging fruit for certain uses (like tunneling content past the sanitizer) that can be forked off this task.

Event Timeline

ssastry triaged this task as Medium priority. Jul 15 2020, 5:49 PM
ssastry moved this task from Needs Triage to Bugs & Crashers on the Parsoid board.
ssastry moved this task from Bugs & Crashers to Missing Functionality on the Parsoid board.
ssastry added a project: Parsoid-Rendering.

I filed T270127 as well, but we should probably figure out what strategy we want to adopt here: narrowly support strip state solutions as in this task, or temporarily support some form of shared parser object as in T270127, with the understanding that we will probably want some solution of the nature outlined here.

We can probably have something that vaguely looks like a strip state to handle "content which can't be represented as a string".

That is, when we invoke a legacy parser function which expects only strings as arguments, we take our token sequence, and anything in our token sequence which isn't "actually" a string gets turned into strip state. The parser function will eventually return a string, and we comb through that string for strip state markers and turn them back into the original tokens.
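A rough Python sketch of that round trip, under stated assumptions: `Token`, `strip`, `unstrip`, and `call_legacy` are hypothetical names, the marker format is a simplified placeholder, and none of this is Parsoid's actual API.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Token:
    """Stand-in for any item in the token sequence that is not a string."""
    name: str


Item = Union[str, Token]

# Hypothetical, simplified marker format.
MARKER_RE = re.compile("\x7fUNIQ-(\\d+)-QINU\x7f")


def strip(items: List[Item]) -> Tuple[str, Dict[int, Token]]:
    """Flatten a token sequence to a string, stashing non-strings."""
    state: Dict[int, Token] = {}
    parts: List[str] = []
    for item in items:
        if isinstance(item, str):
            parts.append(item)
        else:
            key = len(state)
            state[key] = item
            parts.append(f"\x7fUNIQ-{key}-QINU\x7f")
    return "".join(parts), state


def unstrip(text: str, state: Dict[int, Token]) -> List[Item]:
    """Comb the returned string for markers and restore the tokens."""
    out: List[Item] = []
    pos = 0
    for match in MARKER_RE.finditer(text):
        if match.start() > pos:
            out.append(text[pos:match.start()])
        out.append(state[int(match.group(1))])
        pos = match.end()
    if pos < len(text):
        out.append(text[pos:])
    return out


def call_legacy(fn: Callable[[str], str], items: List[Item]) -> List[Item]:
    """Invoke a string-only parser function on a token sequence."""
    text, state = strip(items)
    return unstrip(fn(text), state)


result = call_legacy(lambda s: "[" + s + "]", ["a ", Token("ext"), " b"])
print(result)
```

The sketch assumes the parser function passes markers through unmodified. A function that alters or duplicates markers would need extra handling, which is omitted here.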

The same goes for extensions which want to tunnel specific things to the output: they can add strip markers and items to the strip state, and we'll turn them into DOMFragments (our version of a "tunnel").
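The extension-facing direction might look something like the following Python sketch. `StripState`, `add`, `to_fragments`, and the `DOMFragment` stand-in are hypothetical names chosen for illustration, not an existing Parsoid interface.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class DOMFragment:
    """Stand-in for a tunneled fragment; here it only wraps an HTML string."""
    html: str


class StripState:
    """Hypothetical strip state that extensions can add items to."""

    # Simplified marker format, assumed for this sketch.
    _marker_re = re.compile("\x7fUNIQ-(\\d+)-QINU\x7f")

    def __init__(self) -> None:
        self._items: Dict[int, str] = {}

    def add(self, html: str) -> str:
        """Store an item and return the marker the extension should emit."""
        key = len(self._items)
        self._items[key] = html
        return f"\x7fUNIQ-{key}-QINU\x7f"

    def to_fragments(self, text: str) -> List[Union[str, DOMFragment]]:
        """Replace each known marker in the output with a DOMFragment."""
        out: List[Union[str, DOMFragment]] = []
        pos = 0
        for match in self._marker_re.finditer(text):
            key = int(match.group(1))
            if key not in self._items:
                continue  # unknown marker: leave it in the text untouched
            if match.start() > pos:
                out.append(text[pos:match.start()])
            out.append(DOMFragment(self._items[key]))
            pos = match.end()
        if pos < len(text):
            out.append(text[pos:])
        return out


state = StripState()
output = "before " + state.add("<b>raw</b>") + " after"
print(state.to_fragments(output))
```

Because the stashed item never appears in the string the extension returns, anything that later processes that string (such as a sanitizer) sees only the opaque marker.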

MSantos raised the priority of this task from Medium to High. Tue, Sep 19, 3:54 PM