Audit Parsoid's handling of wikitext and html strings in HTML attributes
Open, Medium, Public

Description

Parsoid output:

[subbu@earth parsoid] cat /tmp/wt
<div title="[[Foo]]">foo</div>
<div title="{{1x|Foo}}">foo</div>
<div title="{{NonexistingPage}}">foo</div>
<div title="<i>HTML tags</i>">foo</div>
<div title="''HTML tags''">foo</div>

[subbu@earth parsoid] parse.js --normalize < /tmp/wt
<div title="[[Foo]]">foo</div>
<div title="Foo">foo</div>
<div title="Template:NonexistingPage">foo</div>
<div title="&lt;i">HTML tags">foo</div>
<div title="''HTML tags''">foo</div>

PHP parser output:

[subbu@earth maintenance] php parse.php < /tmp/wt
...
<div title="&#91;&#91;Foo]]">foo</div>
<div title="&#91;&#91;:Template:1x]]">foo</div>
<div title="&#91;&#91;:Template:NonexistingPage]]">foo</div>
&lt;div title="<i>HTML tags</i>"&gt;foo</div>
<div title="&#39;&#39;HTML tags&#39;&#39;">foo</div>
...

Need to figure out which of these differences are broken behavior in Parsoid, which are broken behavior in the PHP parser, and which are undefined behavior.
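
As a starting point for the audit, here is a minimal sketch that decodes the character references in the title values from the two outputs above and reports which pairs end up identical. The `PAIRS` list and the `compare_titles` helper are made up for illustration and are not part of Parsoid or the PHP parser; the values are copied by hand from the transcripts. The `<i>HTML tags</i>` case is left out because neither output yields a comparable title attribute there.

```python
import html

# (Parsoid title, PHP parser title) copied by hand from the outputs above.
# The <i>HTML tags</i> input is omitted: neither parser produces a
# comparable title attribute for it.
PAIRS = [
    ("[[Foo]]", "&#91;&#91;Foo]]"),
    ("Foo", "&#91;&#91;:Template:1x]]"),
    ("Template:NonexistingPage", "&#91;&#91;:Template:NonexistingPage]]"),
    ("''HTML tags''", "&#39;&#39;HTML tags&#39;&#39;"),
]


def compare_titles(pairs):
    """Return (parsoid, php, same) tuples after decoding character references."""
    results = []
    for parsoid_raw, php_raw in pairs:
        parsoid = html.unescape(parsoid_raw)
        php = html.unescape(php_raw)
        results.append((parsoid, php, parsoid == php))
    return results


if __name__ == "__main__":
    for parsoid, php, same in compare_titles(PAIRS):
        print(f"{'same' if same else 'DIFF':4}  parsoid={parsoid!r}  php={php!r}")
```

Under this comparison the `[[Foo]]` and `''HTML tags''` cases differ only in entity escaping, while the two template cases differ in the decoded value itself (Parsoid emits the expanded text or the page title, the PHP parser emits an escaped `[[:Template:...]]` link).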