Page MenuHomePhabricator

Make Parsoid and PHP edit-section numbering consistent when <noinclude> and <includeonly> are in use
Open, MediumPublic

Description

In the comments to https://gerrit.wikimedia.org/r/483490 (for T213468) @Arlolra found a possible corner case where Parsoid and PHP might diverge in edit-section numbering (the numeric parameter found in the "edit this section" link): we process <includeonly> and <noinclude> in a subpipeline. This shouldn't matter for most pages, since we section numbering is suppressed inside included templates anyway, but if you were directly viewing a page in the Template namespace which included these elements -- and that page included section headers which are otherwise balanced and Parsoid-approved -- it is possible Parsoid and PHP might diverge. That would be bad, because you'd get the wrong section to edit and (in the worse case) your edited text could get reinserted in the wrong section of the page.

Filing this task to investigate as to whether there is an actual issue here, and fix it if so.

Event Timeline

Confirmed there's a bug here. Given the wikitext:

==PHP section=1==
<noinclude>
==PHP section=2==
</noinclude>
==PHP section=3==
<includeonly>
==This is not counted as section 4==
</includeonly>
==PHP section=4==

Parsoid generates:

<section data-mw-section-id="0"></section><section data-mw-section-id="1">
<h2 id="PHP_section=1"><span id="PHP_section.3D1" typeof="mw:FallbackId"></span>PHP section=1</h2>
<meta typeof="mw:Includes/NoInclude"/> </section><section data-mw-section-id="1">
<h2 id="PHP_section=2"><span id="PHP_section.3D2" typeof="mw:FallbackId"></span>PHP section=2</h2>
<meta typeof="mw:Includes/NoInclude/End"/> </section><section data-mw-section-id="2">
<h2 id="PHP_section=3"><span id="PHP_section.3D3" typeof="mw:FallbackId"></span>PHP section=3</h2>
<meta typeof="mw:Includes/IncludeOnly" data-mw='{"src":"&lt;includeonly>\n==This is not counted as section 4==\n&lt;/includeonly>"}'/><meta typeof="mw:Includes/IncludeOnly/End"/> </section><section data-mw-section-id="3">
<h2 id="PHP_section=4"><span id="PHP_section.3D4" typeof="mw:FallbackId"></span>PHP section=4</h2>
</section>

Note that:

  1. We generate separate IncludeOnly and IncludeOnly/End <meta> tags, so there's no proper nesting and section balancing is bypassed.
  2. The IncludeOnly section is indeed parsed in a subpipeline, but it appears to start by creating a duplicate section 1. This throws off the section numbering in the remainder.

Change 489306 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Add test case for T215628: section numbering and <includeonly>

https://gerrit.wikimedia.org/r/489306

Change 489306 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add test case for T215628: section numbering and <includeonly>

https://gerrit.wikimedia.org/r/489306

Dinoguy1000 renamed this task from Make Parsoid and PHP edit-section numbering consistent when <noinclude> and <include> are in use to Make Parsoid and PHP edit-section numbering consistent when <noinclude> and <includeonly> are in use.Feb 9 2019, 2:54 AM
Dinoguy1000 updated the task description. (Show Details)
Dinoguy1000 added a subscriber: Dinoguy1000.

Is <onlyinclude> also affected by this?

ssastry triaged this task as Medium priority.Apr 22 2019, 1:24 PM
ssastry edited projects, added Parsoid-Read-Views-Deprecated-Project; removed Parsoid.

Change 577368 had a related patch set uploaded (by Sbailey; owner: Sbailey):
[mediawiki/services/parsoid@master] Fix for duplicated and out of sequence data-mw-section-id values

https://gerrit.wikimedia.org/r/577368

Sbailey added a subscriber: Sbailey.

As advised by Subbu, there are two possible (probably correct) solutions to this bug beyond the partial fix that is here, but in fact is not correct for all circumstances as his note describes. Either a DOM based solution that walks the DOM and fixes the nested section ID's as a post process, or by taking advantage of the fact we no longer have asyncronous execution and having the $headingIndex, as set in line 54 of Grammar.php be set to an ongoing globally scoped state that increments appropriately and avoid ID's being reset to zero when new pipelines are created.

A paused Gerrit patch for this exists that fixes one test but does not fix the test mentioned in comments in:
https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/577368

Change 577368 abandoned by Sbailey:
[mediawiki/services/parsoid@master] Fix for duplicated data-mw-section-id values

Reason:
Does not correctly resolve this bug

https://gerrit.wikimedia.org/r/577368

Change 670843 had a related patch set uploaded (by BrandonXLF; owner: BrandonXLF):
[mediawiki/core@master] Use consistent section numbers

https://gerrit.wikimedia.org/r/670843

Test wiki created on Patch Demo by BrandonXLF using patch(es) linked to this task:

https://patchdemo.wmflabs.org/wikis/b0223bb0d7/w/

Test wiki on Patch Demo by BrandonXLF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/b0223bb0d7/w/