Table of contents HTML may be unbalanced
Closed, Resolved (Public)

Description

If a heading on a page contains italic text but doesn't close the '' or <i> syntax, then two things happen:

  1. The rest of the page after that heading is italic (this is more or less expected). It is eventually closed at the end of the user-generated content block, via Tidy/Remex.

  2. The rest of the page before that heading is italic as well (unexpected).

This seems to be because the table of contents component doesn't balance its own HTML.

Example at https://www.mediawiki.org/w/index.php?title=Project:Sandbox&oldid=3134998

Heading 5 ("Safe") has unclosed italics.

wikitext
== ''Safe ==
As everyone eventually finds...
head.html
<div id="toc" class="toc">
...
<li class="toclevel-1 tocsection-5"><a href="#Safe"><span class="tocnumber">5</span> <span class="toctext"><i>Safe</i></span></a></li><i>
<li class="toclevel-1 tocsection-6"><a href="#PartialTypeSignatures"><span class="tocnumber">6</span> <span class="toctext">PartialTypeSignatures</span></a></li>
</i></ul><i>
</i></div><i>
</i><i><h2><span class="mw-headline" id="The_Benign">The Benign</span> ..
</h2></i><i><p>It's not obvious which extensions are the most common but it's fairly safe to say that these extensions are benign and are safely used extensively:
</p></i><i><ul>
<li>OverloadedStrings</li>
<li>FlexibleContexts</li>
<li>FlexibleInstances</li>
<li>GeneralizedNewtypeDeriving</li>
<li>TypeSynonymInstances</li>
<li>MultiParamTypeClasses</li>
...
</ul>
..
</i>

Screenshot 2019-03-14 at 17.57.02.png (1×1 px, 177 KB)

Event Timeline

This is flagged for editors via the https://www.mediawiki.org/wiki/Help:Extension:Linter/unclosed-quotes-in-heading category.

But, we could potentially fix the TOC issue by treating it as a DOM fragment.

Bold syntax (''' or <b>), when used in headers, can also appear in the TOC; does this also happen if it is unclosed? If so, I wouldn't be surprised if this is a general problem with any markup that gets preserved in the TOC, but I have no idea if there's a "canonical" list of such markup (I think superscript/subscript are also preserved, and I wouldn't be surprised if certain semantic elements such as <em> and <strong> are as well).

@Dinoguy1000 Yes, I expect this to affect other markup allowed in TOC as well.

@ssastry Thanks, I wasn't aware of that Linter category.

I think it would indeed be useful to embed the TOC as its own balanced subtree. That way the page can't change appearance based on where __TOC__ is located. It would also make the overall skin system a bit easier to reason about in other read modes where the table of contents doesn't exist, such as VisualEditor, MobileFrontend, and the mobile apps.
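
To illustrate the balanced-subtree idea, here is a minimal Python sketch. It is not MediaWiki's implementation (core is PHP, and the actual fix is not reproduced here). The allowlist `ALLOWED_INLINE` and the helper `balance_fragment` are hypothetical names made up for this example, and the set of tags is a guess rather than the real list of markup preserved in the TOC. The sketch parses the heading text on its own, keeps only allowlisted inline tags, and closes anything still open before the text is placed into a TOC entry.

```python
import html
from html.parser import HTMLParser

# Hypothetical allowlist of inline tags kept in TOC text (an assumption,
# not MediaWiki's actual list).
ALLOWED_INLINE = {"i", "b", "sup", "sub", "em", "strong"}


class FragmentBalancer(HTMLParser):
    """Re-serialises a fragment, tracking which allowed tags are open."""

    def __init__(self):
        super().__init__()
        self.out = []    # serialised output pieces
        self.stack = []  # currently open allowed tags, innermost last

    def handle_starttag(self, tag, attrs):
        # Tags outside the allowlist are dropped; their text is kept.
        if tag in ALLOWED_INLINE:
            self.stack.append(tag)
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # Stray end tags (nothing matching is open) are ignored.
        if tag in self.stack:
            # Close everything opened after this tag, then the tag itself.
            while self.stack:
                top = self.stack.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        # Character references were decoded by the parser; escape again.
        self.out.append(html.escape(data, quote=False))


def balance_fragment(fragment):
    """Return the fragment with every allowed tag properly closed."""
    parser = FragmentBalancer()
    parser.feed(fragment)
    parser.close()
    # Close whatever the heading left open, innermost first.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


toc_text = balance_fragment("<i>Safe")
print(f'<span class="toctext">{toc_text}</span>')
```

Because the fragment is balanced before it is embedded, an unclosed `<i>` in the heading can no longer leak out of the `toctext` span into the rest of the list or the page.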

A user seeing the whole page in italic will likely try to fix that by editing it and looking for the cause near the top. However, they would not find it in this case because the TOC was embedded before applying Tidy/Remex.

Change 975064 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Use DOM to clean up headings for the table of contents (TOC)

https://gerrit.wikimedia.org/r/975064

(this issue is already fixed in Parsoid)

Change 975064 merged by jenkins-bot:

[mediawiki/core@master] Use DOM to clean up headings for the table of contents (TOC)

https://gerrit.wikimedia.org/r/975064