
Parsoid is not adding headings to TOC entries in some templated content scenarios
Closed, Resolved (Public)

Description

See https://de.wikipedia.org/wiki/William_Peter_Blatty vs. https://de.wikipedia.org/wiki/William_Peter_Blatty?useparsoid=1. The three headings missing from the TOC in the Parsoid rendering are all template-generated, and all of them are wrapped in a <div> tag. This looks like an edge case.

Event Timeline

ssastry renamed this task from "Parsoid should emit TOC entries for non-editable sections from templates" to "Parsoid is not adding headings to TOC entries in some templated content scenarios." (Mar 6 2024, 8:29 PM)
ssastry updated the task description.
ssastry renamed this task from "Parsoid is not adding headings to TOC entries in some templated content scenarios." to "Parsoid is not adding headings to TOC entries in some templated content scenarios" (Mar 6 2024, 8:39 PM)
ssastry claimed this task.
ssastry triaged this task as Medium priority.

This is actually a near-neighbor of T214241: data-mw info is clobbered by template annotations. The section-wrapping / TOC code cannot tell where the extension ends: the extension output happens to be the first element of a template, so the extension-content boundary is not demarcated. As a result, the section-wrapping code treats the entire template-wrapped DOM forest as extension content and suppresses headings from it (as required by T355092: Parsoid / legacy parser disagree whether to include extension content in TOC).

T275082: Develop a spec for representing a DOM range in serialized Parsoid output is one solution that would address this problem, but we aren't in a position right now to overhaul our spec. We need a solution that is an incremental enhancement, and we can revisit the spec once we have mechanisms in place for major version bumps of our HTML spec (including content negotiation and a way to reach clients that may need to update to new versions of a schema).

There are also other ideas we could explore in T295171: Use data-mw.rangeId="t:...." instead of "about" for template ranges and T214241: data-mw info is clobbered by template annotations.

We may not actually need to block on any of the tasks above. It is sufficient to check, while processing a heading, whether it is nested inside an extension. That check is cheap and robust, and it doesn't depend on having explicit nested boundaries.
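To make the idea concrete, here is a rough sketch of that ancestor check. It is in Python rather than Parsoid's PHP, and it is not the code from the patch: the `Node` class and the `is_inside_extension` and `toc_headings` helpers are hypothetical names invented for illustration. The only thing taken from Parsoid's output is the convention that extension wrappers carry a `typeof` token starting with `mw:Extension/` and template wrappers carry `mw:Transclusion`.

```python
from __future__ import annotations
from typing import Optional

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class Node:
    """Tiny stand-in for a DOM element (hypothetical, for illustration only)."""

    def __init__(self, tag: str, typeof: str = "", text: str = "") -> None:
        self.tag = tag
        self.typeof = typeof
        self.text = text
        self.parent: Optional[Node] = None
        self.children: list[Node] = []

    def append(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def is_inside_extension(node: Node) -> bool:
    """Walk up the ancestors and report whether any is an extension wrapper."""
    cur = node.parent
    while cur is not None:
        if any(t.startswith("mw:Extension/") for t in cur.typeof.split()):
            return True
        cur = cur.parent
    return False


def toc_headings(root: Node) -> list[str]:
    """Collect heading text in document order, skipping extension content."""
    found: list[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.tag in HEADING_TAGS and not is_inside_extension(node):
            found.append(node.text)
        # Push children reversed so they are popped in document order.
        stack.extend(reversed(node.children))
    return found


# Example tree: one heading inside a template-generated <div>,
# one heading inside an extension wrapper.
body = Node("body")
tpl_div = body.append(Node("div", typeof="mw:Transclusion"))
tpl_heading = tpl_div.append(Node("h2", text="From template"))
ext_div = body.append(Node("div", typeof="mw:Extension/example"))
ext_heading = ext_div.append(Node("h2", text="From extension"))
```

In this sketch the decision is made per heading from its own ancestors, so a template-generated heading wrapped in a `<div>` stays in the TOC while a heading that really sits inside extension output is still suppressed. How the actual change identifies the extension wrapper may differ.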

Change 1009632 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/services/parsoid@master] WIP: Fix handling of headings in extensions wrt TOC data

https://gerrit.wikimedia.org/r/1009632

Change 1009839 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/services/parsoid@master] Fix handling of headings in extensions wrt TOC data

https://gerrit.wikimedia.org/r/1009839

Change 1009839 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Fix handling of headings in extensions wrt TOC data

https://gerrit.wikimedia.org/r/1009839

Change 1010238 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.19.0-a22

https://gerrit.wikimedia.org/r/1010238

Change 1010238 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.19.0-a22

https://gerrit.wikimedia.org/r/1010238