Develop a spec for representing a DOM range in serialized Parsoid output
Open, Medium, Public
Assigned To

None

Authored By

	ssastry
	Feb 17 2021, 10:01 PM

Description

There are a number of instances where Parsoid needs to represent a DOM range. The obvious cases that Parsoid has worked with so far are the output of templates and extensions. Where the DOM ranges for a template and an extension overlap, there is a clear nesting (ex: extension output contains templates OR template output contains some extension), and in those cases Parsoid has simply privileged the outer range and suppressed information about the inner, nested component.

Given this strategy, Parsoid has used a typeof on the first element of the DOM range to indicate the type of DOM range it is (mw:Transclusion, mw:Extension/*) and a unique about id that is assigned to all the elements of the DOM range.
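
As a rough illustration of how a client might consume the typeof/about scheme described above, here is a minimal Python sketch. The sample markup and the collect_ranges helper are hypothetical and written for this example only; they are not Parsoid code or real Parsoid output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the scheme described above:
# typeof on the first element, the same about id on every element of the range.
SAMPLE = """
<body>
  <p typeof="mw:Transclusion" about="#mwt1">first</p>
  <p about="#mwt1">second</p>
  <p>unrelated</p>
</body>
"""

def collect_ranges(root):
    """Group elements by about id; the first member's typeof gives the range type."""
    ranges = {}
    for el in root.iter():  # document order
        about = el.get("about")
        if about is None:
            continue
        entry = ranges.setdefault(about, {"type": el.get("typeof"), "nodes": []})
        entry["nodes"].append(el)
    return ranges

ranges = collect_ranges(ET.fromstring(SAMPLE))
for about, info in ranges.items():
    print(about, info["type"], [el.text for el in info["nodes"]])
```

Since each element carries a single about value, a lookup like this can place an element in only one range.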

Going forward, we might have other use cases for DOM ranges (ex: annotations -- see T261181) and we might also want to have all DOM ranges be extractable rather than arbitrarily pick the outermost nesting.

We therefore need a different spec that lets a DOM node be part of multiple ranges, possibly of different types. The representation scheme for encoding these ranges should be space-efficient and intuitive, and should let clients easily extract the various DOM ranges and manipulate them without errors or undue complexity. Given these requirements, the typeof/about-id mechanism we have been using so far will not work.

We may also need to get feedback from existing Parsoid clients as part of developing this new spec.

Related Objects

Mentioned In: T359450: Parsoid is not adding headings to TOC entries in some templated content scenarios
T295171: Use data-mw.rangeId="t:...." instead of "about" for template ranges
T261181: Make Translate extension compatible with Parsoid
T214241: data-mw info is clobbered by template annotations
Mentioned Here: T214241: data-mw info is clobbered by template annotations
T261181: Make Translate extension compatible with Parsoid

Event Timeline

ssastry created this task. Feb 17 2021, 10:01 PM

Restricted Application added a subscriber: Aklapper. Feb 17 2021, 10:01 PM

ssastry added a parent task: T261181: Make Translate extension compatible with Parsoid. Feb 18 2021, 6:14 PM

cscott mentioned this in T214241: data-mw info is clobbered by template annotations. Feb 22 2021, 6:05 PM

https://phabricator.wikimedia.org/T214241#6849806 contains some earlier discussion, framed at the time as an issue of collapsing wrapper elements.

One big spec question to settle: are the ranges guaranteed to be complete DOM subtrees (or forests)? Or just contiguous nodes in an in-order traversal?

Using a pseudo-element <parsoid-wrapper> just for visualization, are we talking about:

<parsoid-wrapper typeof="mw:Translate">
<p> some text</p>
<table>....</table>
<div> ... </div>
</parsoid-wrapper>

or do we need to represent:

<div>
foo!
<parsoid-wrapper>
bar <b>bat</b>
</div>
<div>
baz
</parsoid-wrapper>
quux
</div>

A somewhat related question is how non-element nodes like Text and Comment are marked, but the way we've been doing that is simply to add span wrappers when necessary. I.e.:

<div>
foo <parsoid-wrapper>bar</parsoid-wrapper> bat
</div>

gets serialized as a "real" span wrapper:

<div>
foo <span ....>bar</span> bat
</div>

while

<parsoid-wrapper>
<div>
foo bar bat
</div>
</parsoid-wrapper>

gets either collapsed into the existing <div> or has a new wrapper element of the appropriate type (another <div> here) added.

We are looking at DOM forests, not a selection of contiguous nodes during an in-order traversal. We don't want to go down that other route: to handle cases like that for templates, we expand the DOM range to span a DOM forest.
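
To make the "expand the DOM range to span a DOM forest" idea concrete, here is a minimal Python sketch using the standard library's minidom. It is an assumption about one way such widening could work, not Parsoid's actual algorithm: it lifts both endpoints to their lowest common ancestor and returns the run of sibling subtrees between them. The helper names and the sample markup are hypothetical, and the start node is assumed to precede the end node in document order.

```python
from xml.dom import minidom

def ancestors(node):
    """Return [node, parent, grandparent, ..., document]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parentNode
    return chain

def expand_to_forest(start, end):
    """Widen (start, end) to a run of sibling subtrees under their lowest common ancestor."""
    start_chain = ancestors(start)
    end_chain = ancestors(end)
    end_ids = {id(n) for n in end_chain}
    # The first ancestor of start that is also an ancestor of end is the LCA.
    i = next(k for k, n in enumerate(start_chain) if id(n) in end_ids)
    lca = start_chain[i]
    j = next(k for k, n in enumerate(end_chain) if n is lca)
    if i == 0 or j == 0:
        # One endpoint contains the other, so that single subtree is the range.
        return [lca]
    first, last = start_chain[i - 1], end_chain[j - 1]
    forest = []
    node = first
    while node is not None:
        forest.append(node)
        if node is last:
            break
        node = node.nextSibling
    return forest

# A range that starts inside one <div> and ends inside the next one.
DOC = minidom.parseString(
    "<body><div>foo! <i>bar</i></div><div><i>baz</i> quux</div><p>after</p></body>"
)
start, end = DOC.getElementsByTagName("i")
forest = expand_to_forest(start, end)
print([n.tagName for n in forest])
```

In this sample the widened range covers both div elements in full and leaves the trailing p outside it.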

As for non-element DOM nodes, there is no strong requirement to add / not-add span wrappers right now. During template wrapping, because of the specific solution we have made there for representing a DOM range, we add artificial span wrappers. But, for example, if we used an alternative representation (ex: meta-tags to start/end a range), we may not need span wrappers.
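
As a sketch of the meta-tag alternative mentioned above, the following Python example pairs start/end marker elements among the children of one parent and collects the nodes between them. The marker vocabulary (typeof values x:RangeStart and x:RangeEnd, and a data-range-id attribute) is invented for this illustration and is not a proposed or existing Parsoid format.

```python
from xml.dom import minidom

# Hypothetical markup: two overlapping ranges, "a" and "b", delimited by meta markers.
DOC = minidom.parseString(
    "<div>"
    "<meta typeof='x:RangeStart' data-range-id='a'/>"
    "foo "
    "<meta typeof='x:RangeStart' data-range-id='b'/>"
    "<b>bar</b>"
    "<meta typeof='x:RangeEnd' data-range-id='a'/>"
    " bat"
    "<meta typeof='x:RangeEnd' data-range-id='b'/>"
    "</div>"
)

def extract_ranges(parent):
    """Return {range id: [nodes]} for marker-delimited ranges among parent's children."""
    open_ranges = {}
    done = {}
    for child in parent.childNodes:
        is_marker = child.nodeType == child.ELEMENT_NODE and child.tagName == "meta"
        if is_marker:
            rid = child.getAttribute("data-range-id")
            if child.getAttribute("typeof") == "x:RangeStart":
                open_ranges[rid] = []
            else:
                done[rid] = open_ranges.pop(rid)
            continue
        # A non-marker node belongs to every range that is currently open.
        for nodes in open_ranges.values():
            nodes.append(child)
    return done

ranges = extract_ranges(DOC.documentElement)
text = {rid: "".join(n.toxml() for n in nodes) for rid, nodes in ranges.items()}
print(text)
```

Under these assumptions the b element ends up in both ranges, and the bare text nodes are captured without span wrappers.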

ssastry triaged this task as Medium priority. Feb 22 2021, 11:15 PM

Arlolra moved this task from Needs Triage to Future Ideas on the Parsoid board. Feb 24 2021, 5:41 PM

Esanders unsubscribed. Feb 28 2021, 3:48 PM

ssastry mentioned this in T261181: Make Translate extension compatible with Parsoid. Jun 7 2021, 3:55 PM

ExE-Boss subscribed. Jul 28 2021, 5:02 PM

ssastry updated the task description. Nov 5 2021, 4:24 PM

ssastry updated the task description.

ssastry mentioned this in T295171: Use data-mw.rangeId="t:...." instead of "about" for template ranges.

ssastry removed a parent task: T261181: Make Translate extension compatible with Parsoid. Apr 20 2022, 1:05 PM

ssastry mentioned this in T359450: Parsoid is not adding headings to TOC entries in some templated content scenarios. Mar 6 2024, 9:00 PM