Page MenuHomePhabricator

More Parsoid TOC edge cases?
Closed, ResolvedPublic

Description

T352467 was resolved and that was supposed to have addressed TOC edge case bugs in Parsoid.

But this page has a TOC in the Parsoid version whereas the legacy version doesn't.

Related Objects

StatusSubtypeAssignedTask
OpenReleaseNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenFeatureNone
OpenNone
OpenNone
Resolvedcscott
OpenNone
OpenNone
OpenBUG REPORTNone
OpenNone
OpenNone
OpenNone
OpenNone
ResolvedJgiannelos
OpenJgiannelos
OpenJgiannelos
OpenJgiannelos
OpenJgiannelos

Event Timeline

From a quick look I think that for some reason parsoid doesn't enforce the threshold to decide whether TOC should be shown or not (only 2 heading shouldn't be shown).

The root cause is here:
https://gerrit.wikimedia.org/g/mediawiki/core/+/3bd70fd2a19de62673e2010900d452ee7dc7e958/includes/skins/components/SkinComponentTableOfContents.php#77

Regardless of the various flags (toc, notoc, forcetoc) that parsoid is setting as expected, the skin eventually asks for the TOCData.
Legacy returns no TOCData when TOC shouldn't be rendered where Parsoid returns TOCData always.

In the line above:

Legacy:

$tocData = null

Parsoid:

$tocData = <non empty instance of Wikimedia\Parsoid\Core\TOCData>

More specifically when there are not enough matches for headlines, legacy parser returns early and doesn't set the TOCData here:
https://gerrit.wikimedia.org/g/mediawiki/core/+/3bd70fd2a19de62673e2010900d452ee7dc7e958/includes/parser/Parser.php#4307

Where parsoid always sets the tocdata here:
https://gerrit.wikimedia.org/g/mediawiki/core/+/3bd70fd2a19de62673e2010900d452ee7dc7e958/includes/Output/OutputPage.php#2437

That happens because when ParserOutput is initialized:

  • On legacy
    • jsonData["TOCData"] is null
  • On parsoid
    • jsonData["TOCData"] is populated

Change #1191423 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/services/parsoid@master] Force no-toc when not enough data to populate it

https://gerrit.wikimedia.org/r/1191423

The root cause is here:
https://gerrit.wikimedia.org/g/mediawiki/core/+/3bd70fd2a19de62673e2010900d452ee7dc7e958/includes/skins/components/SkinComponentTableOfContents.php#77

Regardless of the various flags (toc, notoc, forcetoc) that parsoid is setting as expected, the skin eventually asks for the TOCData.
Legacy returns no TOCData when TOC shouldn't be rendered where Parsoid returns TOCData always.

In the line above:

Legacy:

$tocData = null

Parsoid:

$tocData = <non empty instance of Wikimedia\Parsoid\Core\TOCData>

Interesting -- is this all skins? or just a specific skin? Should we consider a skin bug? That apart, I am happy with just fixing parsoid to be compatible (hyrum's law ftw) as you've already done.

Its all skins (i think) assuming that skins wont put extra logic other what the skin component does.

While debugging this with @cscott it turns out that one of the problems is that there is a typo in the wikitext of the page on the initial report:

image.png (352×3 px, 123 KB)

emphasis on the non closing h1.

Change #1191423 abandoned by Jgiannelos:

[mediawiki/services/parsoid@master] Override TOCData object when not enough data to populate it

Reason:

Turns out this fix is not needed. This patch is related: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1190671

https://gerrit.wikimedia.org/r/1191423

Closing this one after all the subtickets filed