Page MenuHomePhabricator

ve.dm.TransactionProcessor: Restructure DM tree branches using existing nodes, instead of rebuilding them afresh
Closed, ResolvedPublic40 Story Points

Description

Currently, ve.dm.TransactionProcessor modifies DM tree structure by rebuilding tree branches afresh (i.e. calling ve.dm.Document#rebuildNodes on the parts of the tree that it determines need to change).

It would be nicer for many purposes to restructure the tree in a way that preserves existing nodes where possible (whereby, for instance, removing a surrounding <div>...</div> grafted its erstwhile child node objects into their erstwhile grandparent without recreating them).

However it may cause breakage where existing code assumes DM/CE node ancestry will not change.

The tree restructuring can be done with a single pass through the transaction operations, and two pointers that walk the tree in document order, one inserting nodes and the other removing them. I'll add further details.

Event Timeline

dchan created this task.Apr 12 2017, 2:16 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 12 2017, 2:16 AM
dchan added a comment.Apr 12 2017, 2:23 AM

Processing transactions - two pointer method for tree modification (Thanks to @Catrope for input on this)

A method to apply a linear model transaction to the model tree. The aim is to modify the tree with a method that is simple to write, debug and reason about. We want to preserve existing node objects where practical so we don’t end up rebuilding whole new CE/DOM nodes. This also means we want primitive operations to reparent nodes without destroying/recreating them. (The reparent event could be emitted on the common ancestor of the donor/recipient nodes, with offset paths to them both; or it could be emitted on the recipient node with an ancestor+offset path to the donor).

Basic concept

Two pointers, the “inserter” and “remover”, start at the beginning of the tree (just inside the document node) and move forward in document order with the tx operations.

Each pointer represents a (node, offset) pair, with the node referenced as an index path from the document node. (So { path: [ 3, 4 ], offset: 0 } represents the zeroth offset inside documentNode.children[3].children[4]).

As the inserter advances, it inserts content. Created nodes are marked as newly inserted.
As the remover advances (skipping over newly inserted nodes), it deletes content.
Retaining content is implemented as reparenting nodes from the remover position to the inserter position.

Either pointer can be in front of the other, depending on the situation. There are important restrictions on the relative positions they can take.

Advancing a pointer n offsets

Stepping into/out of a (structural) node consumes one offset. Stepping across a node consumes its length. (Assume for simplicity that text is stored as a sequence of one-character nodes). For efficiency and cleanliness, step across nodes where possible; e.g. if we need to advance 40 offsets, and the next node has length 16, we can step across it leaving 24 more offsets to advance.

To remove n items

Advance the remover n offsets, but stepping over any node marked as newly inserted and treating it as length zero. Delete any (non-newly inserted) node it steps across. Mark as potentially removable any node it steps into. XXX Delete any empty node it steps out of that is marked as potentially removable.

If the inserter is ahead of the remover then fix up its offsets to reflect any deletions within nodes in which it lies.

To retain n items

If the remover and the inserter are at the same offset, then advance both in sync.

Else the remover and the inserter do not have the same current node (by lemma 1). Advance the remover n offsets, but stepping over any node marked as newly inserted and treating it as length zero. Reparent any (non-newly inserted) node it steps across into the offset at the inserter (and advance the inserter past it). For any node the remover steps into, mark the node as potentially removable and create the same type of node at the inserter, marked as newly inserted.

For any node it steps out of, delete the node if marked as potentially removable, and step the inserter out of its node. (This may cause the remover and the inserter to be at the same offset, in which case advance both in sync from here on).

Whichever pointer is ahead needs its offsets adjusting to reflect changes made at the other pointer.

To insert an item

For an open tag, insert the required type of node at the inserter, mark it as newly inserted, and move the inserter inside of it. For a character (or potentially a whole newly inserted balanced tree branch) splice it into inserter.node.children at inserter.offset. (If the remover is ahead of the inserter then fix up its offsets to reflect any deletions within nodes in which it lies.)

For a close tag, advance the inserter out of the current node. This causes any remaining children unvisited by the inserter to be “orphaned”. This can only happen if the remover lies within the node at the offset the inserter has just left (or deeper within the node at that offset) (XXX give proof). These orphans will later be consumed, either by a remove (which deletes them entirely) or by a retain (which moves them into another node), and each outstanding orphan will be consumed in document order (as the remover passes them).

Lemmas

Let the inserter position be { path: [ i_1, i_2, …, i_n ], offset: i }
Let the remover position be { path: [ r_1, r_2, …, r_m ], offset: r }
Let k be the max value such that i_1...i_k = r_1...r_k.

Lemma 1: If the inserter and remover are not at the same position (XXX disregarding newly inserted nodes, i.e. offset distance), then neither is a sibling of the other. Proof: if the remover is further forward, then it has stepped over the intervening sibling nodes (or into then out of them) and therefore the nodes were deleted, a contradiction. But if the inserter is further forwards, then it has newly inserted the intervening sibling nodes.

Lemma 2: If the remover is before the inserter, then the children at each depth d > k+1 after r_d are precisely all the orphans. Any node lying entirely between [ r_1, …, r_k, r_(k+1) ] and the inserter position is newly inserted.

Lemma 3: Any node that lies entirely between the inserter and the remover is either newly created or an orphan. (Deleted nodes get removed entirely).

Change 348450 had a related patch set uploaded (by Divec):
[VisualEditor/VisualEditor@master] WIP: Restructure DM tree instead of rebuilding it afresh

https://gerrit.wikimedia.org/r/348450

Jdforrester-WMF triaged this task as Normal priority.
Jdforrester-WMF set the point value for this task to 8.
Jdforrester-WMF moved this task from To Triage to TR3: Language support on the VisualEditor board.
Jdforrester-WMF closed this task as Resolved.May 19 2017, 10:31 AM
Jdforrester-WMF removed a project: Patch-For-Review.
Restricted Application added a project: User-Ryasmeen. · View Herald TranscriptMay 19 2017, 10:31 AM

Change 348450 merged by jenkins-bot:
[VisualEditor/VisualEditor@master] TransactionProcessor: modify DM tree branches instead of rebuilding them

https://gerrit.wikimedia.org/r/348450

Change 354523 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (2db069f39)

https://gerrit.wikimedia.org/r/354523

Change 354523 merged by Jforrester:
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (ee59e9d33)

https://gerrit.wikimedia.org/r/354523

Jdforrester-WMF reopened this task as Open.May 23 2017, 8:43 AM
Jdforrester-WMF changed the point value for this task from 8 to 40.
Jdforrester-WMF added a subscriber: Jdforrester-WMF.

Sadly.

Change 357171 had a related patch set uploaded (by Jforrester; owner: Divec):
[VisualEditor/VisualEditor@master] TransactionProcessor: modify DM tree branches instead of rebuilding them

https://gerrit.wikimedia.org/r/357171

Change 357171 merged by jenkins-bot:
[VisualEditor/VisualEditor@master] TransactionProcessor: modify DM tree branches instead of rebuilding them

https://gerrit.wikimedia.org/r/357171

Change 376396 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (834fd702f)

https://gerrit.wikimedia.org/r/376396

Change 376396 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (834fd702f)

https://gerrit.wikimedia.org/r/376396

Phab was down so the bot didn't tag this. Dan, Resolved?

Deskana closed this task as Resolved.Sep 7 2017, 10:59 AM
Deskana added a subscriber: Deskana.

Should be!