
MobileFrontend + Parsoid has different collapsible-section HTML on en.wp main page
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What should have happened instead?:
Mobile + Parsoid should match the other renderings

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):
I put this together the way I did because I wanted to leave the comma out of the heading. I'm slightly puzzled that MF+Parsoid is ending up with an extra div there:

image.png (78×608 px, 14 KB)

That div is present in neither the legacy rendering nor the desktop Parsoid rendering. Example from desktop Parsoid:

image.png (69×590 px, 10 KB)

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change #1227820 had a related patch set uploaded (by Rehan_khan_78; author: Rehan_khan_78):

[mediawiki/extensions/MobileFrontend@master] MobileFrontend: keep punctuation inline in headings for Parsoid

https://gerrit.wikimedia.org/r/1227820

Jdlrobson-WMF subscribed.

The issue is in the call to $parserOptions->setCollapsibleSections();. That seems to incorrectly wrap the comma in a DIV. The logic should be modified to apply only to headings that are direct children of .mw-parser-output.
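
To illustrate the restriction proposed above, here is a minimal sketch in Python (the real code is PHP in MediaWiki, and this is not that code). It assumes well-formed XHTML input, and the function name `direct_child_headings` and the sample markup are hypothetical. It also ignores the <section> wrappers that Parsoid adds, which come up later in this task.

```python
import xml.etree.ElementTree as ET

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

# Hypothetical sample: one heading directly under the content wrapper,
# and one heading nested inside an editor-supplied <div>.
SAMPLE = (
    '<div class="mw-parser-output">'
    "<h2>Top-level heading</h2>"
    '<div class="banner"><h2>Wikipedia</h2>,</div>'
    "</div>"
)


def direct_child_headings(xhtml):
    """Return the text of headings that are direct children of .mw-parser-output."""
    root = ET.fromstring(xhtml)
    found = []
    for el in root.iter():  # iter() also yields the root element itself
        if "mw-parser-output" in el.get("class", "").split():
            for child in el:  # iterating an Element yields direct children only
                if child.tag in HEADING_TAGS:
                    found.append("".join(child.itertext()).strip())
    return found


print(direct_child_headings(SAMPLE))
```

Under this rule the heading nested inside the extra <div> is skipped, so nothing around it (including the trailing comma) would be wrapped.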

Change #1227852 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/core@master] Only wrap elements which are in sections

https://gerrit.wikimedia.org/r/1227852

cscott renamed this task from MobileFrontend + Parsoid has different HTML on en.wp main page to MobileFrontend + Parsoid has different collapsible-section HTML on en.wp main page.Jan 22 2026, 3:27 PM

Change #1227820 abandoned by Rehan_khan_78:

[mediawiki/extensions/MobileFrontend@master] MobileFrontend: keep punctuation inline in headings for Parsoid

Reason:

i will do again with new change-Id

https://gerrit.wikimedia.org/r/1227820

@cscott pointed out that this could be fixed with CSS targeting the h2. It could, but this will require editors to adapt to Parsoid for mobile. Many of our desktop editors DO NOT test their content on mobile. This is a huge problem, unfortunately. One wiki had a mobile site that was completely broken for 6 months because a default gadget made assumptions about the HTML and removed all article content from every page on page load.

While this is the most visible example we have identified, I think it creates semantically questionable HTML on other pages, which causes accessibility issues.

For example, compare https://en.wikipedia.org/wiki/Portal:Science?useparsoid=1 with https://en.wikipedia.org/wiki/Portal:Science?useparsoid=0
With Parsoid, the "Science portal" heading is now wrapped in its own section, which incorrectly separates it, semantically, from the body of the box it belongs to.
The section is also marked with an aria-labelledby attribute, so screen readers will treat these as landmarks.

There is no reason for these section tags inside content like this. They serve no purpose, so I think it is important that we get out of the way as soon as we detect that an editor is wrapping headings inside DIVs for reasons we cannot fully comprehend.

Does that make sense?

Also paging @Volker_E who may have some other thoughts on the accessibility side of the status quo and priority to fix.

Well, arguably the original <h2>Wikipedia</h2>, was somewhat questionable semantically as well. Probably the "right" thing to do would be <h2>Welcome to Wikipedia,</h2>, and I can't see any compelling reason why the comma is being pulled out of the heading. Maybe <h2>Welcome to Wikipedia<span aria-hidden="true">,</span></h2>?

This is a corner case which, although very visible, does not appear on many pages and it doesn't seem wise to add too many corner cases to things like this when we could just fix the markup instead.

Pseudo-sections in general are caused by broken wikitext. Folks are being very clever on these portal pages, but there are other ways to write the wikitext if they want the semantic markup to be cleaner. Parsoid really isn't in the business of trying to guess what the proper semantics ought to be. The aria-labelledby attributes were added recently in T406897: Add aria-labelledby attribute to the section elements generated by Parsoid, and they can be tweaked if needed. But the semantic error here isn't the <section> wrapper; it is the <div> wrapper in the original wikitext:

<div>
== Science Portal ==
</div>
<div>
...contents of the body...
</div>

In wikitext, the <div> before == Science Portal == belongs to a separate section, and there's no way to create a proper container that unites the == Science Portal == heading with the ...contents of the body... given this wikitext. My suggestion would be to remove the outer <div> wrapper around == Science Portal == and instead use CSS rules targeting the actual <h2>, which is a block element that can be styled identically to the <div>. If that is done, the <section> will correctly (and semantically) contain both the heading and its contents. You can see this in the legacy parser as well, which doesn't have any sort of aria labelling. tl;dr: the accessibility issue here comes from the wikitext, which is trying hard to be pretty at the expense of accessibility. (And don't

I'm all for giving users better tools to make alternate page layouts: https://en.wikipedia.org/wiki/User:Cscott/Ideas/A_Dozen_Visions_for_Wikitext/Page_Description_Language

I agree on a philosophical level, but practically there is no way we are going to get editors to make these changes by breaking their stuff at scale. Editors will not fix these kinds of problems and, worst case, will push back on Parsoid usage, so I don't think we can ignore the problem, nor can we retain the status quo of inaccessible HTML.

What is your concern about https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1227852 ? My assumption was this was a low risk change given we do exactly the same logic in legacy parser. Given we added the section wrapping specifically for mobile, what concerns do you have around limiting this (and potentially reverting it later post-Parsoid rollout when we have the time and tooling to push editors to correct their semantic errors?) What am I missing?

I talked to @cscott today about this and it seems like we have a potential solution here. In the case of the Main page, raw HTML (an h1 tag) is being used, so we should be able to detect this and avoid outputting the HTML here. I'll check in with @cscott in a few weeks' time.

I'm still considering solutions though

Is this supposed to run for nested sections? I'm not sure how section collapsing is supposed to work, but Parsoid's section wrapping spec allows for nested sections, and maybe the real fix is to add the div only for the top-level sections. That would also fix the issue on the main page here:
https://github.com/wikimedia/mediawiki/blob/master/includes/OutputTransform/Stages/HandleParsoidSectionLinks.php#L261-L268
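
As a rough illustration of "top-level sections only" (a Python sketch, not the PHP in HandleParsoidSectionLinks.php): the function name `top_level_section_ids`, the sample markup, and the section id values below are hypothetical, and the input is assumed to be well-formed XHTML whose root element is the content wrapper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two top-level sections, one nested subsection,
# and one section that sits inside an editor-supplied <div>.
SAMPLE = (
    '<div class="mw-parser-output">'
    '<section data-mw-section-id="1"><h2>A</h2>'
    '<section data-mw-section-id="2"><h3>A.1</h3></section>'
    "</section>"
    '<div><section data-mw-section-id="-1"><h2>Boxed</h2></section></div>'
    '<section data-mw-section-id="3"><h2>B</h2></section>'
    "</div>"
)


def top_level_section_ids(xhtml):
    """Return ids of <section> elements that are direct children of the root."""
    root = ET.fromstring(xhtml)
    # Iterating an Element yields only its direct children, so nested
    # sections and sections inside other wrappers are never visited.
    return [
        child.get("data-mw-section-id")
        for child in root
        if child.tag == "section"
    ]


print(top_level_section_ids(SAMPLE))
```

Only the sections returned here would get the collapsible wrapper; the nested subsection and the section inside the <div> are left alone, which is the behaviour suggested for the main page case.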

Change #1260770 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] [OTP] Only top-level sections are collapsible

https://gerrit.wikimedia.org/r/1260770

Change #1260770 abandoned by Arlolra:

[mediawiki/core@master] [OTP] Only top-level sections are collapsible

https://gerrit.wikimedia.org/r/1260770

The main page is fixed.
The fix requires adding a class or ID to the raw headings as Arlo did here.

This has been documented here: https://www.mediawiki.org/wiki/Parsoid/Parser_Unification/Instructions_for_editors#Heading_parsing_differences

I've also proposed a lint to automate this if similar issues arise:
T421784: Consider lint for raw html headings

Change #1227852 abandoned by Jdlrobson:

[mediawiki/core@master] Only wrap elements which are in sections

https://gerrit.wikimedia.org/r/1227852