
<section> wrappers in Parsoid output can interfere with CSS rules that use child (direct descendant) selectors
Open, Medium, Public

Description

I have run into this a couple of times in visual diff testing; the latest instance is this diff on enwiktionary. There are other reasons that this rule doesn't apply to Parsoid HTML. Even so, the ".mw-parser-output > h2" part of the selector will fail on Parsoid output because of the intervening <section> tags.
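The failure is easy to reproduce offline. The Python sketch below does this with two simplified, hypothetical page bodies; a gadget would do the same thing in browser JS. The snippets are assumed to be well-formed XML so that the standard-library xml.etree.ElementTree can parse them, whereas real Parsoid HTML would need an HTML5 parser. ElementTree's "./h2" path behaves like the child combinator, and ".//h2" behaves like the descendant combinator.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page bodies. The legacy parser leaves headings as
# direct children of the wrapper; Parsoid puts each heading inside a <section>.
LEGACY = (
    '<div class="mw-parser-output">'
    "<p>Lead</p><h2>English</h2><p>Noun</p><h2>French</h2><p>Nom</p>"
    "</div>"
)
PARSOID = (
    '<div class="mw-parser-output">'
    '<section data-mw-section-id="0"><p>Lead</p></section>'
    '<section data-mw-section-id="1"><h2>English</h2><p>Noun</p></section>'
    '<section data-mw-section-id="2"><h2>French</h2><p>Nom</p></section>'
    "</div>"
)


def child_h2(html: str) -> list[str]:
    """Rough equivalent of `.mw-parser-output > h2` (child combinator)."""
    wrapper = ET.fromstring(html)  # the root element is the wrapper div
    return [h.text for h in wrapper.findall("./h2")]


def descendant_h2(html: str) -> list[str]:
    """Rough equivalent of `.mw-parser-output h2` (descendant combinator)."""
    wrapper = ET.fromstring(html)
    return [h.text for h in wrapper.findall(".//h2")]


if __name__ == "__main__":
    for name, html in (("legacy", LEGACY), ("parsoid", PARSOID)):
        print(name, child_h2(html), descendant_h2(html))
    # legacy ['English', 'French'] ['English', 'French']
    # parsoid [] ['English', 'French']
```

A plain descendant selector is not a drop-in fix. It would also match h2 elements nested inside tables or template-generated divs, which the ">" was presumably there to exclude.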

What is the best way to handle this?

If we plan to strip section wrappers for read views, this will get resolved naturally. Otherwise, we'll need to communicate and/or tweak CSS rules on wikis ourselves. This can be a bit cumbersome.

Event Timeline

Including user space, there are only about 500 results currently; excluding it, about 100. I think the latter is a tolerable number to clean up regardless of whether <section>s are available in read mode. (IMO they should be, because of their other benefits, like being able to simplify these selectors to e.g. section > h2 :). But that seems like a different question for a different task.) About 20 are copies of Twinkle, which could be fixed once upstream and then re-imported by communities; a task for that can be filed on GitHub today. There might be a handful of JS pages that use native JS rather than jQuery to select elements, but those will surface after deployment and/or through User-notice.
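For the on-wiki cleanup, a hypothetical audit helper could scan CSS and JS page sources for the child-combinator pattern. It would then propose a widened selector that matches both HTML shapes. The names CHILD_RE, widen and audit, and the page titles in the usage, are invented for this sketch. The regex only handles a single selector (no commas) with a simple compound after the ">". The widened form puts a descendant step before "section" because Parsoid nests h3 and deeper sections inside their parent sections.

```python
import re
from collections.abc import Iterator

# Hypothetical pattern: ".mw-parser-output > X" where X is a simple compound
# selector such as "h2", "div.foo" or ".bar". Matches CSS and JS text alike.
CHILD_RE = re.compile(r"\.mw-parser-output\s*>\s*([\w.#*-]+)")


def widen(selector: str) -> str:
    """Hypothetical rewrite of one selector (no commas) for both HTML shapes.

    Keeps the legacy form and adds one where X sits directly inside a
    <section> at any nesting depth below the wrapper.
    """
    parsoid_form = CHILD_RE.sub(r".mw-parser-output section > \1", selector)
    if parsoid_form == selector:
        return selector  # no child-combinator use; leave it alone
    return f"{selector}, {parsoid_form}"


def audit(pages: dict[str, str]) -> Iterator[tuple[str, str]]:
    """Yield (page title, matched text) for every child-combinator use."""
    for title, source in pages.items():
        for match in CHILD_RE.finditer(source):
            yield title, match.group(0)


if __name__ == "__main__":
    pages = {  # hypothetical page titles and contents
        "User:Example/common.css": ".mw-parser-output > h2 { border: 0; }",
        "User:Example/common.js": "$( '.mw-parser-output > h2' ).hide();",
    }
    for title, found in audit(pages):
        print(title, "->", widen(found))
```

The sketch keeps the legacy form in the widened list so it works whether or not wrappers survive in read views. If they do survive, "section > h2" alone may be enough, which is the open question in this task.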

A couple of cases of the pattern in code search.

My personal experience (example) is that the <section> wrappers are actually quite nice and allow for simpler scripting and selectors.
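The scripting difference shows up when collecting the paragraph text of one section. The Python sketch below does this for both shapes, using hypothetical, well-formed sample markup parsed with xml.etree.ElementTree; the function names are invented for this example. Without wrappers, the script has to scan siblings and reason about heading levels. With wrappers, the enclosing <section> already is the section, including any nested subsections.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

# Hypothetical sample: an h2 section with a nested h3 subsection.
LEGACY = (
    '<div class="mw-parser-output">'
    "<h2>English</h2><p>Noun</p><h3>Etymology</h3><p>From Latin</p>"
    "<h2>French</h2><p>Nom</p>"
    "</div>"
)
PARSOID = (
    '<div class="mw-parser-output">'
    '<section data-mw-section-id="1"><h2>English</h2><p>Noun</p>'
    '<section data-mw-section-id="2"><h3>Etymology</h3><p>From Latin</p></section>'
    "</section>"
    '<section data-mw-section-id="3"><h2>French</h2><p>Nom</p></section>'
    "</div>"
)


def paragraphs_flat(html: str, title: str) -> list[str]:
    """Legacy shape: scan siblings until a heading of the same or higher rank."""
    found_level = None
    texts = []
    for el in ET.fromstring(html):  # everything is a sibling under the wrapper
        if el.tag in HEADINGS:
            level = int(el.tag[1])
            if found_level is None:
                if el.text == title:
                    found_level = level
                continue
            if level <= found_level:
                break  # the next section of equal or higher rank starts here
        elif found_level is not None:
            texts.extend(p.text for p in el.iter("p"))
    return texts


def paragraphs_wrapped(html: str, title: str) -> list[str]:
    """Parsoid shape: the <section> holding the heading is the whole section."""
    for section in ET.fromstring(html).iter("section"):
        heading = next((c for c in section if c.tag in HEADINGS), None)
        if heading is not None and heading.text == title:
            return [p.text for p in section.iter("p")]
    return []


if __name__ == "__main__":
    print(paragraphs_flat(LEGACY, "English"))
    print(paragraphs_wrapped(PARSOID, "English"))
```

Both functions return the same paragraphs. Only the flat version needs the heading-rank bookkeeping.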

If we plan to strip section wrappers for read views, this will get resolved naturally.

Is this just an idea from this task or is there a separate proposal for this? We already have the other task (T332243) where people are expressing how useful it is to have structured section information in their scripts (which I've loved Parsoid for having), so removing it seems like a step backwards - or at least missing an opportunity to move forwards :/

Yeah, it's actually confusing to me that this is being considered, given that T114072: <section> tags for MediaWiki sections got done... (even though the old parser does not have them [yet/ever]; see T8104, which even mentions cscott's intent, from a year ago, to provide sections in Parsoid read mode).

I should have written a better description; I'll update it later. But the summary is that we won't remove section tags from Parsoid's canonical HTML, which is stored in the Parser Cache and will be available via APIs. HTML served for read views, however, might be post-processed to reduce its size. See https://www.mediawiki.org/wiki/Parsoid/Parser_Unification/Performance and T272331: Evaluate and recommend strategies for ensuring Parsoid HTML payload doesn't degrade performance in low-resource contexts.

We aren't yet at the point where we are ready to implement anything. Our first performance focus will be latency and memory, but we cannot serve today's canonical Parsoid HTML as is for read views. At the very least, we will strip data-parsoid. T272331 has a list of other things we considered stripping.

All that said, section tags might survive the cut, since they probably won't do much to reduce HTML size. But since this topic came up, I figured I would point you to the wiki page and the Phabricator task.
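One rough way to check the size argument on a given page is to compare two numbers: what stripping data-parsoid saves, and what the <section> tags cost. The Python sketch below does that for an invented sample. The data-parsoid values are made up, so the numbers it prints say nothing about real pages. It assumes well-formed markup parsed with xml.etree.ElementTree. It measures characters of ElementTree's own re-serialization, so both figures use the same quoting. The names strip_data_parsoid and section_tag_chars are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample; the data-parsoid values are made up for illustration.
SAMPLE = (
    """<div class="mw-parser-output"><section data-mw-section-id="1">"""
    """<h2 id="English" data-parsoid='{"dsr":[0,14,2,2]}'>English</h2>"""
    """<p data-parsoid='{"dsr":[15,40,0,0]}'>A <a href="./Noun" """
    """data-parsoid='{"stx":"simple","dsr":[17,25,2,2]}'>noun</a>.</p>"""
    """</section></div>"""
)

SECTION_TAG_RE = re.compile(r"</?section\b[^>]*>")


def strip_data_parsoid(html: str) -> tuple[str, str]:
    """Return (re-serialized original, re-serialized copy without data-parsoid)."""
    root = ET.fromstring(html)
    before = ET.tostring(root, encoding="unicode")
    for el in root.iter():
        el.attrib.pop("data-parsoid", None)  # no-op where the attribute is absent
    return before, ET.tostring(root, encoding="unicode")


def section_tag_chars(html: str) -> int:
    """Characters spent on <section ...> and </section> tags."""
    return sum(len(tag) for tag in SECTION_TAG_RE.findall(html))


if __name__ == "__main__":
    before, after = strip_data_parsoid(SAMPLE)
    print("data-parsoid:", len(before) - len(after), "chars")
    print("<section> tags:", section_tag_chars(after), "chars")
```

Running the same two measurements on real stored Parsoid output would turn the "probably won't do much" estimate into a per-page figure.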

ssastry triaged this task as Medium priority. Apr 6 2023, 2:26 PM

Change 974275 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/extensions/DiscussionTools@master] Tweak CSS to deal with Parsoid's <section> tags

https://gerrit.wikimedia.org/r/974275

There's a /looong/ standing issue to *add* section tags to HTML output, which I can't find at the moment but it's like a 4 digit bug number. Also T300467: Expose section identifier in HTML output -- so I'm also hoping <section> tags make the cut.

Change 974275 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Tweak CSS to deal with Parsoid's <section> tags

https://gerrit.wikimedia.org/r/974275

There's a /looong/ standing issue to *add* section tags to HTML output, which I can't find at the moment but it's like a 4 digit bug number. Also T300467: Expose section identifier in HTML output -- so I'm also hoping <section> tags make the cut.

And that task is worth keeping in mind once what is served to clients starts to diverge from what is stored in the Parser Cache: user scripts and Gadgets mostly operate on what they can find in the client-side DOM. Fetching anything from the API is overhead, and re-fetching the HTML from the API just means fetching the same thing twice. Things like section identifiers are prime candidates for default Gadgets that run on every single page view. There are very likely other things whose removal from the client-side DOM would have a similar effect.
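A default Gadget might read section identifiers straight from the DOM, with no API request. The Python sketch below pairs each heading with Parsoid's data-mw-section-id on the <section> that wraps it, then builds a section edit link. It runs over hypothetical, well-formed sample markup, and the ids in the sample are invented. The host example.org is a placeholder. Ids that are not plain non-negative integers are skipped, because the sketch does not try to interpret them. The title, action=edit and section parameters are standard index.php parameters. The function names are invented for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

# Hypothetical Parsoid-shaped page body; the ids are illustrative only.
PAGE = (
    '<div class="mw-parser-output">'
    '<section data-mw-section-id="0"><p>Lead</p></section>'
    '<section data-mw-section-id="1"><h2>English</h2><p>Noun</p>'
    '<section data-mw-section-id="2"><h3>Etymology</h3><p>From Latin</p></section>'
    "</section>"
    '<section data-mw-section-id="3"><h2>French</h2><p>Nom</p></section>'
    "</div>"
)


def section_ids(html: str) -> list[tuple[int, str]]:
    """(section id, heading text) pairs, read straight from the markup."""
    pairs = []
    for section in ET.fromstring(html).iter("section"):
        sid = section.get("data-mw-section-id", "")
        heading = next((c for c in section if c.tag in HEADINGS), None)
        # The lead has no heading; non-numeric or negative ids are skipped
        # because this sketch does not try to interpret them.
        if heading is not None and sid.isdigit():
            pairs.append((int(sid), heading.text))
    return pairs


def edit_url(title: str, section_id: int) -> str:
    """Section edit link via index.php; example.org is a placeholder host."""
    query = urlencode({"title": title, "action": "edit", "section": section_id})
    return f"https://example.org/w/index.php?{query}"


if __name__ == "__main__":
    for sid, heading in section_ids(PAGE):
        print(heading, edit_url("cat", sid))
```

If read views stripped the wrappers or the ids, the same Gadget would need an extra API round trip per page view to recover this mapping.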