
<small> leaks out of <li>
Closed, Resolved (Public)

Description

(See https://en.wikipedia.org/wiki/User:Cscott/Sandbox#Leaky_LI).

Parser test cases:

!! test
Leaky <li> (1)
!! options
parsoid=wt2html
!! wikitext
<ol>
<li>a<small>b</li>
<li>c</li>
</ol>
!! html/php+tidy
<ol>
<li>a<small>b&lt;/li&gt;</small></li>
<li><small>c</small>
<p><small>&lt;/ol&gt;</small></p>
</li>
</ol>
!! html/parsoid
<ol>
<li>a<small>b</small></li>
<small>
<li>c</li>
</small></ol>
!! end

Not sure we want to emulate the &lt;/li&gt; part of the above output, but it does seem odd that the <small> tag "escapes" and surrounds the <li>, rather than surrounding the *content* of the <li>.

Similarly:

!! test
Leaky <li> (2)
!! options
parsoid=wt2html
!! wikitext
== Leaky LI ==
<li>A
<li>B <small> C
<li>D

== Next Heading ==
x
!! html/php+tidy
<h2><span class="mw-headline" id="Leaky_LI">Leaky LI</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/index.php?title=Parser_test&amp;action=edit&amp;section=1" title="Edit section: Leaky LI">edit</a><span class="mw-editsection-bracket">]</span></span></h2>
<ul>
<li>A</li>
<li>B <small>C</small></li>
<li><small>D</small></li>
</ul>
<h2><small><span class="mw-headline" id="Next_Heading">Next Heading</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/index.php?title=Parser_test&amp;action=edit&amp;section=2" title="Edit section: Next Heading">edit</a><span class="mw-editsection-bracket">]</span></span></small></h2>
<p>x</p>
!! html/parsoid
<h2>Leaky LI</h2>
<li>A</li>
<li>B <small> C</small></li>
<li><small>D
<h2>Next Heading</h2>
<p>x</p>
</small></li>
!! end

Here the <small> isn't around the <li> (that's an improvement, although a mysterious one). But the trailing <li> has swallowed up the <h2>, which surely isn't right.

May be related to bug 71185.
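
For reference, the output one would expect for test (1) follows from how HTML5-style tree builders treat formatting elements: when a block end tag closes over an open formatting element, the element is closed with the block, but it stays "active" and is reopened in front of the next piece of text. The Python sketch below is a toy model of that one rule only. It is not Parsoid's or tidy's code, the token format and the name `serialize_with_reopened_formatting` are made up for illustration, and it ignores whitespace, attributes, and the rest of the tree-construction rules.

```python
# Toy model: formatting elements left open by a block end tag are closed
# with the block and reopened before the next text node.
FORMATTING = {"small", "b", "i"}  # assumption: a tiny subset of formatting tags


def serialize_with_reopened_formatting(tokens):
    """tokens is a list of ('start', tag), ('end', tag) or ('text', data)."""
    open_stack = []  # elements currently open, innermost last
    active = []      # formatting tags the author opened and has not closed
    out = []
    for kind, value in tokens:
        if kind == "text":
            # Reopen any active formatting element that was implicitly closed.
            for tag in active:
                if tag not in open_stack:
                    open_stack.append(tag)
                    out.append(f"<{tag}>")
            out.append(value)
        elif kind == "start":
            open_stack.append(value)
            out.append(f"<{value}>")
            if value in FORMATTING:
                active.append(value)
        else:  # explicit end tag
            if value in active:
                active.remove(value)
            if value not in open_stack:
                continue  # stray end tag: ignore it
            # Close everything down to and including the matching element.
            while open_stack:
                tag = open_stack.pop()
                out.append(f"</{tag}>")
                if tag == value:
                    break
    # Close whatever is still open at end of input.
    while open_stack:
        out.append(f"</{open_stack.pop()}>")
    return "".join(out)


# Token stream for the wikitext of "Leaky <li> (1)", newlines omitted.
LEAKY_1 = [
    ("start", "ol"),
    ("start", "li"), ("text", "a"), ("start", "small"), ("text", "b"), ("end", "li"),
    ("start", "li"), ("text", "c"), ("end", "li"),
    ("end", "ol"),
]

print(serialize_with_reopened_formatting(LEAKY_1))
```

In this model the <small> ends up around the content of the second <li> rather than around the <li> itself.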


Version: unspecified
Severity: normal

Details

Reference
bz71473

Related Objects

Status    Subtype  Assigned  Task
Declined           None
Resolved           ssastry

Event Timeline

bzimport raised the priority of this task to Needs Triage (Nov 22 2014, 3:51 AM).
bzimport added a project: Parsoid.
bzimport set Reference to bz71473.

Similarly:

( echo '<h2>bla<h2>blub'; echo 'text' ) | tests/parse.js --normalize

<h2>bla</h2>
<h2>blub
<p>text</p>
</h2>

The PHP parser cleans this up with tidy, giving:

<h2><span class="mw-headline" id="bla.3Ch2.3Eblub.0Atext">bla&lt;h2&gt;blub text</span></h2>

...which isn't pretty, but at least it doesn't try to jam a <p> into the <h2>.

gerritadmin wrote:

Change 165749 had a related patch set uploaded by Cscott:
WIP: Document differences in HTML fixup between tidy and Parsoid.

https://gerrit.wikimedia.org/r/165749

Arlolra triaged this task as Medium priority (Nov 25 2014, 11:20 PM).
Arlolra subscribed.
ssastry claimed this task.

Since the Tidy -> Remex transition, Parsoid and the core parser behave identically here. The leakiness is a known consequence of misnested tags, and we are using linter categories to flag such markup for fixup. Nothing to fix here.
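
To illustrate the kind of check a lint pass for this could make, here is a minimal Python sketch that flags a formatting tag still open when an explicit block end tag arrives. It is not the actual linter used by Parsoid; the class name, the tag sets, and the shape of the result are assumptions made for this example, and it relies only on the standard library's `html.parser`.

```python
from html.parser import HTMLParser

FORMATTING = {"small", "big", "b", "i"}   # assumption: illustrative subset
BLOCKS = {"li", "ol", "ul", "p", "h2"}    # assumption: illustrative subset


class MisnestingLinter(HTMLParser):
    """Records formatting tags left open when a block end tag is seen."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, innermost last
        self.issues = []  # (formatting tag, block end tag, line number)

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: nothing to unwind
        # Unwind to the matching start tag, noting formatting tags we cross.
        while self.stack:
            top = self.stack.pop()
            if top == tag:
                break
            if tag in BLOCKS and top in FORMATTING:
                self.issues.append((top, tag, self.getpos()[0]))


def find_leaky_formatting(markup):
    """Return the misnesting issues found in a markup string."""
    linter = MisnestingLinter()
    linter.feed(markup)
    linter.close()
    return linter.issues


LEAKY_1 = "<ol>\n<li>a<small>b</li>\n<li>c</li>\n</ol>"
for fmt_tag, block_tag, line in find_leaky_formatting(LEAKY_1):
    print(f"line {line}: <{fmt_tag}> still open at </{block_tag}>")
```

Run on the wikitext of "Leaky <li> (1)", this reports the <small> that is still open at the first </li>. It only sees explicit end tags, so markup like "Leaky <li> (2)", which never closes its <li> elements, would need a check based on the parsed tree instead.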