
Resolve differences between Parsoid & legacy parser TOC metadata output for template-, extension-, and parser-function-generated content
Open, High, Public

Description

See the tests/parser/toc.txt file in Parsoid which specs out TOC metadata for wikitext.

As the comments on various tests in that file document, Parsoid's output and the legacy parser's output differ when the content comes from templates (or parser functions or extensions). We should reconcile these differences based on what other products and code expect from this metadata.
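One way to track these differences while reconciling them is to diff the two TOC outputs field by field. The sketch below is hypothetical: `diff_toc`, `FieldDiff` and the field list are illustrative names, the keys are only modeled on the legacy parser's section array and should be treated as an assumption, and headings are simply paired by position.

```python
from dataclasses import dataclass
from typing import Any

# Fields compared between the two outputs. The key names are modeled on the
# legacy parser's section array; treat them as an assumption for this sketch.
COMPARED_FIELDS = ("toclevel", "level", "line", "number", "index", "fromtitle", "anchor")


@dataclass
class FieldDiff:
    position: int  # 0-based position of the heading in the TOC
    field: str     # which metadata field differs
    legacy: Any    # value produced by the legacy parser (None if absent)
    parsoid: Any   # value produced by Parsoid (None if absent)


def diff_toc(legacy: list[dict], parsoid: list[dict]) -> list[FieldDiff]:
    """Report every compared field that differs between two TOC lists.

    Headings are paired by position. If one list is longer, its extra
    entries are compared against an empty entry, so every non-empty
    field of the extra heading is reported.
    """
    diffs: list[FieldDiff] = []
    for pos in range(max(len(legacy), len(parsoid))):
        old = legacy[pos] if pos < len(legacy) else {}
        new = parsoid[pos] if pos < len(parsoid) else {}
        for field in COMPARED_FIELDS:
            if old.get(field) != new.get(field):
                diffs.append(FieldDiff(pos, field, old.get(field), new.get(field)))
    return diffs
```

Pairing by position keeps the sketch short; a real comparison might instead pair headings by anchor so that one missing heading does not shift every later comparison.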

Related Objects

Status     Subtype   Assigned
Open       Release   None
Open                 None
Open                 None
Open                 None
Open       Feature   None
Open                 None
Open                 None
Resolved             cscott
Open                 None
Resolved             cscott
Open                 None
Open                 None
Open                 None
Resolved             ssastry

Event Timeline

See also T213468: Parsoid section IDs don't correspond to PHP section IDs when headings are transcluded, T215628: Make Parsoid and PHP edit-section numbering consistent when <noinclude> and <includeonly> are in use, and the longer discussion in T269630: Parsoid should support section editing links.

Extension- and parser-function-generated output probably doesn't matter, but template-generated content produces section edit links that point directly to the appropriate section of the transcluded page, and it's important to get that right. In the worst case, a differing section index could corrupt a save (the wrong section gets replaced), although the more likely failure is that clicking the section edit link starts editing the "wrong" section.
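To make the save-corruption risk concrete, here is a simplified sketch of how a numeric section index selects the text that gets replaced. It is not MediaWiki's implementation: the heading regex and the helpers `split_sections` and `replace_section` are hypothetical, and the sketch ignores section 0, indexes that point into a transcluded page, comments, `<nowiki>`, and `<noinclude>`/`<includeonly>`.

```python
import re

# Simplified heading matcher for lines like "== Foo ==". Real MediaWiki
# heading detection handles many more cases; this is an assumption.
HEADING = re.compile(r"^(={1,6})(.+?)\1\s*$")


def split_sections(wikitext: str) -> list[tuple[int, int]]:
    """Return (start, end) line ranges for sections 1..n.

    A section runs from its heading up to (not including) the next heading
    at the same or a shallower level, so it includes its subsections.
    """
    lines = wikitext.split("\n")
    heads = []  # (line number, heading level)
    for i, line in enumerate(lines):
        m = HEADING.match(line)
        if m:
            heads.append((i, len(m.group(1))))
    ranges = []
    for k, (start, level) in enumerate(heads):
        end = len(lines)
        for nxt, nxt_level in heads[k + 1:]:
            if nxt_level <= level:
                end = nxt
                break
        ranges.append((start, end))
    return ranges


def replace_section(wikitext: str, index: int, new_text: str) -> str:
    """Replace section `index` (1-based, like ?section=N) with new_text."""
    lines = wikitext.split("\n")
    start, end = split_sections(wikitext)[index - 1]
    return "\n".join(lines[:start] + new_text.split("\n") + lines[end:])
```

For a page with `== A ==`, `=== A1 ===` and `== B ==`, an editor who meant to change `== B ==` (section 3) but whose link carried index 2 would silently overwrite the `=== A1 ===` subsection and leave a duplicate `== B ==` heading behind. That is the wrong-section replacement described above.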

Finally, see https://gerrit.wikimedia.org/r/c/wikipeg/+/508037 and the related T222419#5216719.

Change 896369 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/services/parsoid@master] TOCData computation: Eliminate some egregious diffs with legacy parser

https://gerrit.wikimedia.org/r/896369

Change 896369 merged by jenkins-bot:

[mediawiki/services/parsoid@master] TOCData Computation: Eliminate some egregious diffs with legacy parser

https://gerrit.wikimedia.org/r/896369

Change 896413 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump parsoid to 0.17.0-a20 and zest-css to 3.0.0

https://gerrit.wikimedia.org/r/896413

Change 896413 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.17.0-a20 and zest-css to 3.0.0

https://gerrit.wikimedia.org/r/896413

@cscott and @ssastry, I see a patch for this task; should this be on the WIP board?

MSantos triaged this task as High priority. Sep 19 2023, 3:59 PM

This is high priority, so someone should verify the current state, create the needed sub-tasks, and/or close the resolved ones.

Change 1010306 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Strip entity spans from toc lines

https://gerrit.wikimedia.org/r/1010306

Change 1010306 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Strip entity spans from toc lines

https://gerrit.wikimedia.org/r/1010306
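For context on the "Strip entity spans from toc lines" change: Parsoid represents HTML entities such as `&nbsp;` in its output as `<span typeof="mw:Entity">…</span>` wrappers, which the legacy parser's TOC line text does not contain. The sketch below shows the idea with a regular expression. It is an illustration only: the merged patch may work on the DOM instead, `strip_entity_spans` is a hypothetical name, and the regex relies on entity spans having no nested markup.

```python
import re

# Matches a <span> whose typeof is mw:Entity, with any other attributes.
# Assumes the span holds only the entity text (no nested tags), so this is
# not a general-purpose HTML parser.
ENTITY_SPAN = re.compile(
    r'<span\b[^>]*\btypeof="mw:Entity"[^>]*>(.*?)</span>', re.DOTALL
)


def strip_entity_spans(toc_line_html: str) -> str:
    """Unwrap mw:Entity spans in a TOC line, keeping only their content."""
    return ENTITY_SPAN.sub(r"\1", toc_line_html)
```

Spans without `typeof="mw:Entity"` are left untouched, so other inline markup in the heading text survives.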

Change 1011398 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.19.0-a23

https://gerrit.wikimedia.org/r/1011398

Change 1011398 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.19.0-a23

https://gerrit.wikimedia.org/r/1011398