CX duplicates section segmentation logic
Open, Medium, Public

Description

Parsoid segments documents into "mw sections" (i.e. heading-to-heading sections, as provided by edit-section links) as defined in the spec: https://www.mediawiki.org/wiki/Specs/HTML/2.1.0#Headings_and_Sections

When this was first implemented CX had no use for these so the tags were stripped: https://gerrit.wikimedia.org/r/#/c/mediawiki/services/cxserver/+/383329/

Now CX is trying to support section editing, but is doing so by re-implementing heading-to-heading section markings using class names: https://gerrit.wikimedia.org/r/#/c/mediawiki/services/cxserver/+/548589/10/lib/lineardoc/Doc.js

A much cleaner approach would be to restore these Parsoid sections and use them to implement section editing, in the same way the full-page article editor does.

Event Timeline

This task is a follow-up to T234323:

Server-side section editing will give you smaller performance gains than you might think (if you are using attachedRoot instead): T206228#5330185. Parsoid HTML download is usually not a bottleneck, and more time is spent building and rendering the CE tree than the DM.

Does attachedRoot allow using multiple adjacent <section>s, or do we need to wrap those that define a range between two <h2> headers?

We still haven't decided on the level of granularity we want for section translation, but for initial exploration, I went with a larger set.

I don't think I explained this clearly in my first reply. The Parsoid DOM spec wraps all editable sections in <section> tags. An editable section starts with any <hN> tag and continues until the next heading of the same or lower level (that is, the same or a smaller N). That means every h2-h2 section will have a <section> tag wrapping it, and for most articles these will just be the root-level <section> tags (the edge cases would be if an <h1> was used, or an <h3> without a preceding <h2>).
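To make the nesting rule above concrete, here is a minimal sketch in Python. It is not Parsoid's implementation: the `Section` class and `nest_sections` function are hypothetical names used only for illustration, and the input is simplified to a flat list of (level, title) headings with no body content.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Section:
    """Hypothetical model of one <section> wrapper (illustration only)."""
    level: int   # N of the <hN> heading that opens the section
    title: str
    children: List["Section"] = field(default_factory=list)


def nest_sections(headings: List[Tuple[int, str]]) -> List[Section]:
    """Group a flat list of (level, title) headings into nested sections."""
    roots: List[Section] = []
    stack: List[Section] = []  # currently open sections, outermost first
    for level, title in headings:
        # A heading with the same or a smaller N closes every open
        # section whose heading has an N at least as large.
        while stack and stack[-1].level >= level:
            stack.pop()
        section = Section(level, title)
        # Attach to the innermost open section, or to the root list.
        (stack[-1].children if stack else roots).append(section)
        stack.append(section)
    return roots


typical = nest_sections([(2, "History"), (3, "Early"), (3, "Late"), (2, "Geography")])
orphan = nest_sections([(3, "Orphan"), (2, "Main")])
```

In `typical`, the two h2 sections are the roots and the h3 sections nest under "History". In `orphan`, the h3 with no preceding h2 ends up as a root-level section of its own, which is the edge case mentioned above.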

Using these root-level <section> tags will give you exactly the ranges you require, and they can be fed into the attachedRoot of VE.
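As a rough illustration of picking out the root-level sections, the sketch below scans an HTML string with Python's standard `html.parser` and records the `data-mw-section-id` of every <section> that is not nested inside another one. The `RootSectionFinder` class, the `root_section_ids` helper and the `SAMPLE` markup are made up for this example. Real Parsoid output carries more attributes, and cxserver and VE would do this on a DOM in JavaScript rather than this way.

```python
from html.parser import HTMLParser


class RootSectionFinder(HTMLParser):
    """Collects the ids of <section> tags that have no <section> ancestor."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of currently open <section> tags
        self.root_ids = []  # data-mw-section-id of each root-level section

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            if self.depth == 0:
                self.root_ids.append(dict(attrs).get("data-mw-section-id"))
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "section" and self.depth > 0:
            self.depth -= 1


def root_section_ids(html):
    """Return the section ids of the root-level <section> elements."""
    finder = RootSectionFinder()
    finder.feed(html)
    finder.close()
    return finder.root_ids


# Hand-written sample loosely shaped like Parsoid output (an assumption).
SAMPLE = (
    '<section data-mw-section-id="0"><p>Lead</p></section>'
    '<section data-mw-section-id="1"><h2>History</h2>'
    '<section data-mw-section-id="2"><h3>Early</h3></section></section>'
    '<section data-mw-section-id="3"><h2>Geography</h2></section>'
)

ids = root_section_ids(SAMPLE)
```

For this sample, the nested h3 section is skipped and only the three top-level wrappers are reported. Each of those corresponds to one range that could be handed to attachedRoot.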