The talk page consultation team would like a robust mechanism for inserting and editing comments. This is closely related to the "selective serialization" (selser) functionality of Parsoid -- in both cases we have some DOM ranges which must remain identical copies of the original wikitext, and some other DOM ranges which are edited or inserted, and we need to glue the two together.
Context
DiscussionTools parses an HTML DOM to extract the comment structure of the page -- basically by looking at list items, nesting, and signature blocks. The expectation is that this algorithm works the same regardless of whether the DOM is generated by the legacy parser or by Parsoid -- Parsoid may add some <span> tags and metadata, but the <li> structure will be consistent. For the purposes of this discussion, assume that each comment is associated with a DOM range, that there is a reasonable way to identify that DOM range in both legacy read-view HTML and Parsoid HTML, and that the range can be communicated to an API. T230659 discusses this aspect in more depth, but for this task assume it all just works.
The DiscussionTools team would like to be able to author and edit comments in both "source" (wikitext) and "visual" (HTML DOM) modes. When source editing, the expectation is that the original author's wikitext is preserved as far as possible, modulo any changes at the edges needed to merge it into the rest of the page.
A brief word on DOM ranges
Note that a typical talk page may look something like this:
<p>A comment</p>
<ul>
<li id=A>Response line 1</li>
<li id=B>Response line 2</li>
<li id=C>Response line 3
<ul id=D><li>Child response</li></ul>
</li>
</ul>
If we are to remove or replace the response (lines 1, 2, and 3), we are talking about a DOM tree range which I'll express as [#A, #D) -- that is, it includes all of the nodes traversed in document order starting at #A and ending at (and not including) #D. So: nodes #A and #B with their contents, plus #C and all of #C's children that precede #D (in particular, the text node with the contents Response line 3). This is not a DOM subtree or forest, exactly. You could probably get away with an interface based on a forest, but you'd end up having to provide #D (and all of its descendants) in your edit to #C.
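To make the half-open range concrete, here is a minimal sketch that walks the example in document order and collects the nodes in [#A, #D). It uses Python's standard xml.dom.minidom purely for illustration; the helper names (document_order, find_by_id, nodes_in_range, describe) and the wrapping <div> are assumptions made for this example, not part of DiscussionTools or Parsoid.

```python
from xml.dom import minidom

# The example talk page, wrapped in a <div> so it parses as one XML document.
HTML = """<div><p>A comment</p>
<ul><li id="A">Response line 1</li>
<li id="B">Response line 2</li>
<li id="C">Response line 3
<ul id="D"><li>Child response</li></ul>
</li>
</ul></div>"""


def document_order(node):
    """Yield a node and all of its descendants in document (pre-order) order."""
    yield node
    for child in node.childNodes:
        yield from document_order(child)


def find_by_id(root, ident):
    """Return the first element under root whose id attribute equals ident."""
    for node in document_order(root):
        if node.nodeType == node.ELEMENT_NODE and node.getAttribute("id") == ident:
            return node
    raise KeyError(ident)


def nodes_in_range(root, start_id, end_id):
    """Nodes visited from start (inclusive) up to end (exclusive)."""
    start = find_by_id(root, start_id)
    end = find_by_id(root, end_id)
    selected, inside = [], False
    for node in document_order(root):
        if node is end:
            break
        if node is start:
            inside = True
        if inside:
            selected.append(node)
    return selected


def describe(node):
    """Short label for a node: quoted text, or tag name plus id."""
    if node.nodeType == node.TEXT_NODE:
        return repr(node.data.strip())
    return "<%s id=%s>" % (node.tagName, node.getAttribute("id") or "-")


doc = minidom.parseString(HTML)
selected = nodes_in_range(doc.documentElement, "A", "D")
# Drop whitespace-only text nodes so the listing is easier to read.
labels = [describe(n) for n in selected
          if not (n.nodeType == n.TEXT_NODE and not n.data.strip())]
print(labels)
```

Note that #C itself is in the result while its descendant #D is not, which is exactly why the range cannot be named as a subtree or a forest of subtrees.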
This is all moot so long as we are doing the DOM manipulation entirely on the client side, but if we want to provide "mutation instructions" to the server, then we need to decide what exactly those instructions look like... and that probably means we need to support naming ranges, not just subtrees.
That's a good segue to the discussion of alternatives.
Where does mutation occur?
The first decision to make is where the new content is combined with the existing document: on the client or on the server.
- Client side. DiscussionTools fetches the entire Parsoid DOM, does all of the DOM manipulation itself on the client, then sends back a complete DOM for the talk page to be selser'ed and saved, like VisualEditor does. No new APIs needed, but client side demands (bandwidth, memory, processing) are high.
- Server side. For comment insertion, DiscussionTools never needs the complete Parsoid DOM. It identifies the insertion point based on the read-view HTML, and sends back just "insertion point, content-to-add" to the API. Parsoid fetches the original Parsoid DOM and performs the mutation server-side.
- For comment editing, DiscussionTools needs the Parsoid DOM for the particular comment it is editing. Ideally it could fetch just this subtree of the Parsoid DOM, but this is largely an orthogonal issue. Initially it could fetch the entire page DOM and extract the portion it needs client-side.
- "Content to add" could be either HTML or wikitext. If the API supports only one of these, DiscussionTools would need to convert to that format itself.
- How should the mutation be specified? Ideally the format would be general enough to accommodate "section editing" in VisualEditor and future use-cases like paragraph-level editing on mobile devices as well as the specific "DOM range edit" needed for DiscussionTools. Probably an array of replace instructions ("delete this DOM range (perhaps zero-length), then insert this content (perhaps empty) at that location") is sufficient?
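As one possible shape for the "array of replace instructions" idea, the sketch below models each instruction as a half-open range plus replacement content, and applies a list of them to a flat, document-ordered list of node ids standing in for the real DOM. Everything here is hypothetical: the ReplaceInstruction fields, the content_format values, and apply_instructions are illustrative names, not an existing or proposed Parsoid API. The sketch also assumes the ranges do not overlap.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class ReplaceInstruction:
    """Hypothetical instruction: delete [start, end), then insert content there."""
    start: str                   # id of the first node in the range (inclusive)
    end: str                     # id of the node that ends the range (exclusive)
    content: str                 # replacement content; empty means pure deletion
    content_format: str = "html"  # "html" or "wikitext"


def apply_instructions(nodes: List[str],
                       instructions: List[ReplaceInstruction]) -> List[str]:
    """Apply non-overlapping instructions to a document-ordered list of node ids."""
    spans = []
    for ins in instructions:
        lo, hi = nodes.index(ins.start), nodes.index(ins.end)
        if lo > hi:
            raise ValueError("range end precedes range start")
        spans.append((lo, hi, ins))
    result = list(nodes)
    # Work from the end of the document backwards so earlier indices stay valid.
    for lo, hi, ins in sorted(spans, key=lambda s: s[0], reverse=True):
        result[lo:hi] = [f"{ins.content_format}:{ins.content}"] if ins.content else []
    return result


page = ["A", "B", "C", "D", "E"]
edits = [
    # Replace the range [A, D) with edited HTML.
    ReplaceInstruction("A", "D", "<li>Edited response</li>"),
    # Zero-length range [E, E): a pure insertion before E, given as wikitext.
    ReplaceInstruction("E", "E", "New reply", "wikitext"),
]
updated = apply_instructions(page, edits)
payload = json.dumps([asdict(e) for e in edits])  # what a client might send
print(updated)
```

A zero-length range covers comment insertion and an empty content string covers deletion, so the one instruction type handles insert, delete, and replace; whether that is general enough for section editing is the open question above.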
Supporting "edit as source"
The second decision is how to represent the wikitext when the DiscussionTools user is "source editing": inserting a new comment as wikitext, or editing an existing comment as wikitext.
- Wikitext is passed directly to the mutation API, and we splice it into the page using ConstrainedText and related frameworks. Drawbacks: it is hard to balance the content properly, and this adds a new mode to selser which is probably not well tested.
- HTML is passed to the mutation API, but we ensure that it selsers cleanly to the original source. That is, we need to ensure that selser works even when different parts of the DOM Tree have different "frame" documents; the new content comes from a different frame so that it is treated as unedited in that frame and selsers to the original contents exactly. (But we can still use the HTML structure of the new content to guide gluing the wikitext together, as we usually do with edited content.)
- We use a "subst" hack. Rather than try to represent a separate frame for edited content, handle this the way VisualEditor handles template insertion. AIUI, to insert the template {{foo|bar}} VE asks Parsoid to parse and render {{foo|bar}} into HTML, which results in something like:
<span typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"foo","href":"./Template:Foo"},"params":{"1":{"wt":"bar"}},"i":0}}]}'> ... </span>
That is, fundamentally the parameter bar is expressed directly in wikitext inside the data-mw attribute, and Parsoid uses this to serialize the span back to {{foo|bar}}, ignoring the actual contents of the <span>.
So in this version, DiscussionTools would send back something like:
<span typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"subst","href":"./Template:Subst"},"params":{"1":{"wt":"...the full content of the comment..."}},"i":0}}]}'> ... </span>
and Parsoid would special-case this to remove the apparent {{subst}} wrapper and just insert the wikitext source content directly into the page at this point. The advantage is that the wikitext never has to be parsed into HTML. The disadvantage is that because the wikitext was never parsed into HTML, selser doesn't have a DOM tree to guide it as it splices together the wikitext. On the other hand, Parsoid could do the parse itself as part of the serialization of {{subst}} and that avoids the need for a round-trip through the client.
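The round trip for the "subst" hack can be sketched as follows: the client wraps the raw comment wikitext in a transclusion-style span, and the server recognises the wrapper and recovers the wikitext untouched. This is only an illustration of the proposal under stated assumptions. The function names subst_wrapper and unwrap_subst and the DataMwExtractor class are invented for this example, the special-casing does not exist in Parsoid, and the data-mw layout is assumed to match the template example above.

```python
import json
from html import escape
from html.parser import HTMLParser
from typing import Optional


def subst_wrapper(wikitext: str) -> str:
    """Client side: wrap raw wikitext in a transclusion-style <span>."""
    data_mw = {"parts": [{"template": {
        "target": {"wt": "subst", "href": "./Template:Subst"},
        "params": {"1": {"wt": wikitext}},
        "i": 0,
    }}]}
    attr = escape(json.dumps(data_mw), quote=True)
    return f'<span typeof="mw:Transclusion" data-mw="{attr}"></span>'


def unwrap_subst(data_mw_json: str) -> Optional[str]:
    """Server side: return the wrapped wikitext, or None for anything else."""
    parts = json.loads(data_mw_json).get("parts", [])
    if len(parts) != 1 or not isinstance(parts[0], dict):
        return None
    template = parts[0].get("template", {})
    if template.get("target", {}).get("wt") != "subst":
        return None
    return template.get("params", {}).get("1", {}).get("wt")


class DataMwExtractor(HTMLParser):
    """Collect the data-mw attribute of every mw:Transclusion element."""

    def __init__(self):
        super().__init__()
        self.data_mw = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("typeof") == "mw:Transclusion" and "data-mw" in attributes:
            self.data_mw.append(attributes["data-mw"])


comment = "I agree with [[User:Example|Example]]'s point. ~~~~"
extractor = DataMwExtractor()
extractor.feed(subst_wrapper(comment))
recovered = unwrap_subst(extractor.data_mw[0])

# An ordinary template is left alone by the special case.
ordinary = ('{"parts":[{"template":{"target":{"wt":"foo"},'
            '"params":{"1":{"wt":"bar"}},"i":0}}]}')
print(recovered == comment, unwrap_subst(ordinary))
```

The recovered string is byte-for-byte the author's wikitext, which is the advantage of this option; the sketch does nothing to balance or glue that wikitext into the surrounding page, which is the disadvantage noted above.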