
Audit Parsoid's handling of wikitext and html strings in HTML attributes
Open, Medium, Public

Description

Parsoid output:

[subbu@earth parsoid] cat /tmp/wt
<div title="[[Foo]]">foo</div>
<div title="{{1x|Foo}}">foo</div>
<div title="{{NonexistingPage}}">foo</div>
<div title="<i>HTML tags</i>">foo</div>
<div title="''HTML tags''">foo</div>

[subbu@earth parsoid] parse.js --normalize < /tmp/wt
<div title="[[Foo]]">foo</div>
<div title="Foo">foo</div>
<div title="Template:NonexistingPage">foo</div>
<div title="&lt;i">HTML tags">foo</div>
<div title="''HTML tags''">foo</div>

PHP parser output:

[subbu@earth maintenance] php parse.php < /tmp/wt
...
<div title="&#91;&#91;Foo]]">foo</div>
<div title="&#91;&#91;:Template:1x]]">foo</div>
<div title="&#91;&#91;:Template:NonexistingPage]]">foo</div>
&lt;div title="<i>HTML tags</i>"&gt;foo</div>
<div title="&#39;&#39;HTML tags&#39;&#39;">foo</div>
...

We need to determine which of these differences reflect broken behavior in Parsoid, which reflect broken behavior in the PHP parser, and which are undefined behavior.
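
As a first pass at the audit, it may help to separate differences that are only entity encoding from differences in the attribute value itself. The sketch below is not part of either parser. It copies the two outputs from this description, extracts the title attribute of each line, decodes HTML entities, and reports which cases agree after decoding. The names PARSOID, PHP, decoded_title, and compare are hypothetical helpers introduced here for illustration.

```python
import html
import re

# Outputs copied from the task description, one line per test case.
PARSOID = [
    '<div title="[[Foo]]">foo</div>',
    '<div title="Foo">foo</div>',
    '<div title="Template:NonexistingPage">foo</div>',
    '<div title="&lt;i">HTML tags">foo</div>',
    "<div title=\"''HTML tags''\">foo</div>",
]
PHP = [
    '<div title="&#91;&#91;Foo]]">foo</div>',
    '<div title="&#91;&#91;:Template:1x]]">foo</div>',
    '<div title="&#91;&#91;:Template:NonexistingPage]]">foo</div>',
    '&lt;div title="<i>HTML tags</i>"&gt;foo</div>',
    '<div title="&#39;&#39;HTML tags&#39;&#39;">foo</div>',
]

# Only matches lines that actually start with a <div title="..."> tag.
TITLE_RE = re.compile(r'^<div title="([^"]*)"')


def decoded_title(line):
    """Return the entity-decoded title value, or None if no div tag was emitted."""
    m = TITLE_RE.match(line)
    return html.unescape(m.group(1)) if m else None


def compare(parsoid, php):
    """Return one boolean per case: True if both decoded titles exist and match."""
    results = []
    for a, b in zip(parsoid, php):
        ta, tb = decoded_title(a), decoded_title(b)
        results.append(ta is not None and ta == tb)
    return results


if __name__ == "__main__":
    for i, same in enumerate(compare(PARSOID, PHP), start=1):
        print(f"case {i}: {'same after decoding' if same else 'differs'}")
```

With the outputs above, cases 1 and 5 differ only in entity encoding, while cases 2, 3, and 4 differ in the attribute value (in case 4 the PHP parser emits no div tag at all). This says nothing about which behavior is correct. It only narrows down which cases need a decision.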

Event Timeline

ssastry triaged this task as Medium priority. Nov 21 2017, 9:14 PM