MediaWiki shouldn't assign section numbers during tokenization, but instead only when headings are generated
Status: Closed, Resolved (public task)

Description

Consider this wikitext:

==A==
{{#if:
==B==
}}
==C==

The PHP parser will emit:

<h2>...<a href="/~cananian/mediawiki/index.php?title=CLIParser&amp;action=edit&amp;section=1" title="Edit section: A">edit source</a>...</h2>
<h2>...<a href="/~cananian/mediawiki/index.php?title=CLIParser&amp;action=edit&amp;section=3" title="Edit section: C">edit source</a>...</h2>

Note that the section number for C is 3, not 2. This is because the preprocessor assigns a section number as soon as it sees the heading tokens ==...==, regardless of whether the heading is inside a template argument.
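As a rough illustration of the numbering described above, here is a minimal Python sketch. It is a toy model and not the actual Preprocessor code: the names `HEADING` and `eager_section_numbers` are hypothetical, nesting is approximated by counting `{{` and `}}` per line, and headings inside a template are assumed not to reach the output (as in the `#if` example).

```python
import re

# Hypothetical toy heading matcher: "==Title==" alone on a line.
HEADING = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$")

def eager_section_numbers(wikitext):
    """Bump the section counter for every heading-like line, even inside {{...}}."""
    emitted = []
    index = 0
    depth = 0  # number of unclosed "{{" seen so far
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            index += 1  # counter advances regardless of nesting
            if depth == 0:
                emitted.append((match.group(2), index))
        depth += line.count("{{") - line.count("}}")
    return emitted

print(eager_section_numbers("==A==\n{{#if:\n==B==\n}}\n==C=="))
# prints [('A', 1), ('C', 3)]
```

The swallowed heading B still consumes index 2, which is why C ends up with section=3 in the edit link.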

The preprocessor should instead look at its template nesting tree to determine whether it is in a template argument before assigning section numbers.
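One way to picture this proposal is the following Python sketch. It is only an illustration under simplifying assumptions, not a patch for the PHP preprocessor: `HEADING` and `nesting_aware_section_numbers` are hypothetical names, and the "template nesting tree" is reduced to a per-line count of unclosed `{{`.

```python
import re

# Hypothetical toy heading matcher: "==Title==" alone on a line.
HEADING = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$")

def nesting_aware_section_numbers(wikitext):
    """Assign a section number only to headings outside any {{...}}."""
    emitted = []
    index = 0
    depth = 0  # number of unclosed "{{" seen so far
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match and depth == 0:
            index += 1  # counter advances only at top level
            emitted.append((match.group(2), index))
        depth += line.count("{{") - line.count("}}")
    return emitted

print(nesting_aware_section_numbers("==A==\n{{#if:\n==B==\n}}\n==C=="))
# prints [('A', 1), ('C', 2)]
```

Under this model the heading inside the `#if` argument never consumes an index, so C would get section=2.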

Parsoid was updated to match PHP's behavior in T213468; when this bug is fixed in core, Parsoid should be re-simplified to match.

Event Timeline

I think the index could be assigned when "possible-h" nodes are promoted to "h" nodes, around line 785 of Preprocessor_Hash.php.

Change 730755 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Preprocessor: Don't assign a heading index to a possible-h node

https://gerrit.wikimedia.org/r/730755

Change 730755 abandoned by Tim Starling:

[mediawiki/core@master] Preprocessor: Don't assign a heading index to a possible-h node

Reason:

The existing section numbering algorithm is a reasonable compromise between Parsoid's needs and MediaWiki's needs

https://gerrit.wikimedia.org/r/730755

Given that Tim abandoned his attempt and that this is an edge case (see the discussion on the Gerrit patch), I am inclined to decline this, since we aren't going to tweak this now -- we just need to make sure Parsoid's ids match the existing ids for backward-compatibility reasons.

matmarex renamed this task from "MediaWiki shouldn't assign section ids during tokenization, but instead only when headings are generated" to "MediaWiki shouldn't assign section numbers during tokenization, but instead only when headings are generated" (Dec 16 2024, 10:21 PM).
matmarex updated the task description.

Change #1132793 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Match legacy section numbering with sol transparent on line

https://gerrit.wikimedia.org/r/1132793

Change #1132793 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Match legacy section numbering with sol transparent on line

https://gerrit.wikimedia.org/r/1132793

Change #1134218 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a24

https://gerrit.wikimedia.org/r/1134218

Change #1134218 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a24

https://gerrit.wikimedia.org/r/1134218