
Wrong section numbering if Parsoid is used and wikitext is invalid
Closed, Resolved · Public · Bug Report

Description

Steps to replicate the issue (include links if applicable):

What happens?:

  • The two renderers number the sections differently:
  • Parsoid rendering: section = 19
  • non-Parsoid rendering: section = 4
  • After clicking "edit source" in Parsoid-renderer mode, the error message "Cannot find section" appears. With the non-Parsoid renderer, everything works.

What should have happened instead?:

  • The Parsoid renderer should also use the correct section numbers.

Note: This only happened with the [[Portland (Oregon)|Portland],] typo that was present before this edit: https://en.wikivoyage.org/w/index.php?title=Emeryville&diff=4922006&oldid=4921380

Event Timeline

The severity is set to "High" because editing articles may fail.

Aklapper raised the priority of this task from High to Needs Triage. Aug 13 2024, 10:39 AM

@RolandUnger Did cscott agree to work on this?

Aklapper renamed this task from Wrong section numbering if Parsoid is used to Wrong section numbering if Parsoid is used and wikitext is invalid. Aug 13 2024, 10:41 AM
Aklapper updated the task description.

cscott works on the Parsoid project, and I hope he can help. At the very least, he should be a subscriber.

Aklapper removed cscott as the assignee of this task. Edited Aug 13 2024, 10:45 AM
Aklapper added a subscriber: cscott.

Please don't assign folks without their consent as it's up to every individual what they (don't) plan to work on. Thanks!

Adding Content-Transform-Team to the set of tags (or, indeed, Parsoid and/or Parsoid-Read-Views) is sufficient to ensure our team sees it and triages it to someone who can work on it.

For context during triage: https://www.mediawiki.org/wiki/Parsing/Notes/Section_Wrapping is how parsoid section numbering should work. There are corner cases where parsoid will disagree with the legacy parser wrt section boundaries, but parsoid *should* use section-id=-1 for these cases.

I looked into that yesterday (and put my notes in T222419#10058673); I am not *convinced* that the wikitext has to be invalid to trigger this. The PEG parser might backtrack on valid-wikitext-that's-just-ambiguous-enough, without the wikitext itself being entirely at fault. I think.
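To make the backtracking concern concrete, here is a toy sketch in Python. It is not Parsoid's grammar or code: the line-based parser, the `number_headings` function, and the `restore_on_backtrack` flag are all hypothetical, and the assumption that a speculative link rule scans past a later heading before failing is an illustration only. It shows how a counter that is bumped as a side effect during a failed parse attempt can stay inflated after the parser position is rewound.

```python
import re

# Toy heading syntax: a whole line of the form "=Title=".
HEADING = re.compile(r"^=(.+)=$")


def number_headings(lines, restore_on_backtrack):
    """Assign 1-based ids to headings with a toy line-based parser.

    A line that opens a link with "[[" but does not close it with "]]"
    starts a speculative scan ahead for the closing "]]". The scan counts
    every heading it passes. If the scan fails, the position is rewound,
    and the counter is rewound only when restore_on_backtrack is True.
    """
    heading_index = 0
    ids = {}
    i = 0
    while i < len(lines):
        line = lines[i]
        match = HEADING.match(line)
        if match:
            heading_index += 1
            ids[match.group(1)] = heading_index
        elif "[[" in line and "]]" not in line:
            saved = heading_index  # snapshot taken before speculating
            j = i + 1
            found = False
            while j < len(lines):
                if HEADING.match(lines[j]):
                    heading_index += 1  # side effect during speculation
                if "]]" in lines[j]:
                    found = True
                    break
                j += 1
            if found:
                i = j  # the link consumed everything up to line j
            elif restore_on_backtrack:
                heading_index = saved  # undo the side effect as well
        i += 1
    return ids


if __name__ == "__main__":
    wikitext = ["=S1=", "[[Foo|Bar],]", "", "=S2=", "x"]
    print(number_headings(wikitext, restore_on_backtrack=False))
    print(number_headings(wikitext, restore_on_backtrack=True))
```

Without the restore step the toy parser visits S2 twice, once during the failed link attempt and once for real, so S2 ends up with id 3; with the restore step it gets id 2.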

Just recording for debugging purposes, since this is not reproducible on the current version of the page: if you 'view source' on https://en.wikivoyage.org/w/index.php?title=Emeryville&oldid=4921380 , you can see the bad section ids on the section wrapper tags.

This snippet below is sufficient to reproduce the id-assignment issue:

=S1=
[[Foo|Bar],]

=S2=
x

See Parsoid's output below -- S2 gets id 3 instead of 2.

$ php bin/parse.php --wrapSections < /tmp/wt
<section data-mw-section-id="0" data-parsoid="{}"></section><section data-mw-section-id="1" data-parsoid="{}"><h1 id="S1" data-parsoid='{"dsr":[0,4,1,1]}'>S1</h1>
<p data-parsoid='{"dsr":[5,17,0,0]}'>[[Foo|Bar],]</p>

</section><section data-mw-section-id="3" data-parsoid="{}"><h1 id="S2" data-parsoid='{"dsr":[19,23,1,1]}'>S2</h1>
<p data-parsoid='{"dsr":[24,25,0,0]}'>x</p>
</section>
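For checking output like the above without reading it by eye, a small Python helper can pull the `data-mw-section-id` values out of the HTML and report the first skipped id. This is only a sketch: the helper names `section_ids` and `first_gap` are hypothetical, and it assumes that non-negative ids should run 0, 1, 2, ... in document order, with negative ids (such as -1) skipped.

```python
import re

# Matches the section id attribute on Parsoid's section wrapper tags.
SECTION_ID = re.compile(r'data-mw-section-id="(-?\d+)"')


def section_ids(html):
    """Return all data-mw-section-id values in document order."""
    return [int(value) for value in SECTION_ID.findall(html)]


def first_gap(ids):
    """Return (expected, actual) for the first non-consecutive id, or None.

    Negative ids are ignored; the remaining ids are assumed to count up
    from 0 without gaps.
    """
    expected = 0
    for sid in ids:
        if sid < 0:
            continue
        if sid != expected:
            return (expected, sid)
        expected += 1
    return None


if __name__ == "__main__":
    html = (
        '<section data-mw-section-id="0"></section>'
        '<section data-mw-section-id="1"><h1 id="S1">S1</h1></section>'
        '<section data-mw-section-id="3"><h1 id="S2">S2</h1></section>'
    )
    ids = section_ids(html)
    print(ids, first_gap(ids))
```

Run against the output above, the ids are 0, 1, 3, and the helper reports that id 2 was expected where 3 was found.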

Triaging this during April 2025 Essential Week to "Next" (putting on CTT board) because it's been reported several times and it seems to warrant a fix, so that sections can be edited consistently even if the wikitext input is making the PEG parser backtrack.

Change #507966 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] Use wikipeg rule variable to ensure headingIndex is correct after backtrack

https://gerrit.wikimedia.org/r/507966

Change #507966 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Use wikipeg rule variable to ensure headingIndex is correct after backtrack

https://gerrit.wikimedia.org/r/507966

Change #1187841 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.22.0-a21

https://gerrit.wikimedia.org/r/1187841

Change #1187841 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.22.0-a21

https://gerrit.wikimedia.org/r/1187841