Page MenuHomePhabricator

Lead paragraph is not hoisted in new Parsoid HTML
Open, Stalled, HighPublicBUG REPORT

Description

Background

With the new parsoid experience enabled, the lead paragraph of the article is not correctly being displayed before the infobox, as it is with the legacy parser:

Screenshot_20240303_090522_Chrome.jpg (2×1 px, 554 KB)

Infobox should be below first paragraph
Screenshot_20240303_090613_Chrome.jpg (2×1 px, 635 KB)

(this is from https://en.m.wikipedia.org/wiki/SS_Kroonland with new parser HTML)

User story

As a reader of the wikis, I would like to be able to see the lead paragraph displayed before any infobox content, so that I can better understand the article even with limited screen space

Requirements

  • lead paragraph should be correctly displayed before infobox content with parsoid enabled
  • lead paragraph hoisting should continue to work with the legacy parser enabled

Rollback plan

I don't imagine that this will result in any data being saved or otherwise - I'm not entirely sure how caching works in this case, so there is a chance if a transform is being applied that it will be cached and if we need to roll back that we would also need to reason about the cached content

Note

We should explore the approach in https://phabricator.wikimedia.org/T262093 and see if we can learn anything from that experience.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Jdlrobson moved this task from Backlog to Q4 (April-July) on the Web Team Essential Work 2024 board.
Jdlrobson lowered the priority of this task from High to Medium.Apr 3 2024, 6:26 AM
Jdlrobson raised the priority of this task from Medium to High.May 13 2024, 5:51 PM

Could we use CSS ordering to re-order the lead paragraph? If not, what would the modern way be for hoisting the lead paragraph?

SToyofuku-WMF changed the task status from Open to Stalled.May 23 2024, 5:58 PM
SToyofuku-WMF claimed this task.
SToyofuku-WMF subscribed.

This doesn't feel ready for estimation yet - we could potentially benefit from a spike or breaking this down into multiple tasks to better understand what it would entail (and the best possible approach)

Could we use CSS ordering to re-order the lead paragraph? If not, what would the modern way be for hoisting the lead paragraph?

CSS order applies only inside a flex container (or a grid). I don't think it makes sense to make .mw-parser-output a flex container (and definitely not a grid), and even if you did you would have to guarantee that .infobox would appear as one of its direct children (which is a hard guarantee to make, but not impossible - the current transforms do some work to handle that). Users don't have the power to make this ordering either (TemplateStyles cannot target .mw-parser-output though I did file T271588: Add .mw-parser-output as a potential selector at one point, which has some other thoughts on what might happen with .mw-parser-output as flex container), so you would in perpetuity be carrying along this bit of technical debt (though in CSS rather than in the current transform). And you would have opened up a fun new CSS vandalism vector.

So then the next bit of thinking about it is with Parsoid's <section>ing, the lead section could be made a flex container I guess without too much pain. You still open the CSS vandalism vector this way, but it exists regardless due to TemplateStyles being able to touch the sections. Floating boxes lose their floatiness when they're a flex child, which is probably fine at the relevant resolutions to this problem, since this would just be in a media (possibly container) width query. And either way in this context you can even drop your own support for the flip-flop eventually since we can target the sections (and even the specific lead section I think?) in TemplateStyles.

Then you still have a problem, there are accessibility implications to a CSS reorder (which incidentally have led me to "when do I even use the property ever?!"). I do think in this context those are relevant questions. Someone screen readering on their phone is going to jump down to the infobox, up to the first paragraph, then jump to the second paragraph, which is not a great reading flow.

Other comments for other solution: A more holistic solution to the problem, like making infoboxes their own revision slot. (In any other software system, making the infobox not inline with the content would be the solution, that's just not one you have available today.) That would enable other uses like pulling it out of the page text entirely for wide screens (a la the Winter prototype). You might still need some transformations to get it to be friendly with the primary page content but you would have the full support of logic in the skin to do it. (Not sure if that's a plus or minus.)

But that's a lot more work in a lot more ways than just leaving a sleeping dog to lie.... I can't see another "modern" way which isn't what you've already got.

(I swear I've made some version of this comment already somewhere.)

Change #1038922 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[integration/visualdiff@master] Couple tweaks to the MFE config

https://gerrit.wikimedia.org/r/1038922

Change #1038922 merged by jenkins-bot:

[integration/visualdiff@master] Couple tweaks to the MFE config

https://gerrit.wikimedia.org/r/1038922

Addressing this issue is high-ish priority for CTT because we find that inconsistent treatment of the lead paragraph between legacy and Parsoid tends to have knock-on effects with positioning that then throw off visualdiff results all the way down the page. So basically we find it hard to discover any /other/ issues with the MFE+Parsoid rendering because the noise from the lead section drowns out everything else.

The patch by @ssastry's ( https://gerrit.wikimedia.org/r/1038922 ) tries to recreate the lead paragraph stuff within the visualdiff framework just to get our output closer together and minimize noise, but it doesn't really work well enough for us to proceed with visualdiff testing past that point. A better solution that ensured legacy and Parsoid matched would be very helpful to us.

Web team still needs to work out how to do this - there have been calls to change how this is implemented in T262093, and we want to reduce the amount of expensive HTML parsing we do on page views, so we think it is important to evaluate options as part of T366391 first, before committing to a solution.

For what it's worth, the new OutputTransform pipeline will (hopefully) maintain the document in DOM form throughout the transformation pipeline, eliminating the repeated HTML parsing steps. That's T347062: Create HtmlHolder interface.