TL;DR: Introduce the concept of a tree diff. Use it to break TreeModifier in half: one half generates a tree diff from a linear diff; the other half applies the tree diff to the document.
- improved code readability
- faster debugging
- safer error handling
- new possibilities, e.g.
  - improved transaction processing
  - improved crash recovery
  - improved diffing / merging
  - merging new source content within ContentTranslation
Details below.
Current situation
TreeModifier was written to fix buggy behaviour, where a document's linear representation would get temporarily out of sync with its node tree representation. A ve.dm.Transaction contains linear operations to modify ve.dm.LinearData. Originally, we would modify the linear data first, then rebuild part of the node tree. This could cause bugs because rebuilding nodes could fire events at a point where parts of the node tree were inconsistent with the linear data.
Now, ve.dm.TreeModifier applies the linear operations to both the linear data and the ve.dm.Document tree simultaneously, keeping the two in sync at every step. This ensures document consistency when event handlers are called by those steps, hence fixing the bugs.
TreeModifier implements the algorithm documented in T162762, but it can be hard to see what that algorithm does for a particular linear transaction, even for its main author. It would be difficult for anyone else to maintain the code. When possible issues arise, stepping through TreeModifier in the debugger is time-consuming and difficult.
Proposed tree diff format
Instead of performing the steps documented in T162762, the new TreeModifier will list the steps to take. Each step is a tree operation, and the list of tree operations forms a tree diff.
Each tree operation can take one of the following six forms:
{ type: 'insertNode', isContent: <boolean>, at: <Path>, element: <OpenElementLinModItem> }
{ type: 'removeNode', isContent: <boolean>, at: <Path>, element: <OpenElementLinModItem> }
{ type: 'moveNode', isContent: <boolean>, from: <Path>, to: <Path> }
{ type: 'insertText', isContent: true, at: <Path>, data: <TextLinModItem[]> }
{ type: 'removeText', isContent: true, at: <Path>, data: <TextLinModItem[]> }
{ type: 'moveText', isContent: true, from: <Path>, to: <Path>, length: <number> }
Note that moveNode/moveText do not specify what content is being moved, and that each *Node operation acts on a single node at a time.
<Path> is a number[] array, representing the tree path from the DocumentNode to the position, except that within ContentBranchNodes, the offset is the linearized offset of the position.
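As a rough illustration of how such a path might be followed, the sketch below resolves a number[] path against a toy tree. It assumes one possible reading of the description above: every entry but the last is a child index to descend through, and the last entry is the offset within the node reached. The { type, children } node shape and the resolvePath helper are hypothetical and are not part of ve.dm; the linearized offsets used inside ContentBranchNodes are not modelled here.

```javascript
// Toy tree, not ve.dm nodes: each node is { type, children } (hypothetical shape).
const doc = {
	type: 'document',
	children: [
		{ type: 'heading', children: [] },
		{ type: 'list', children: [
			{ type: 'listItem', children: [] },
			{ type: 'listItem', children: [] }
		] }
	]
};

/**
 * Resolve a path to { parent, offset } (hypothetical helper).
 * All entries but the last are child indexes to descend through;
 * the last entry is the offset within the node reached.
 */
function resolvePath( root, path ) {
	if ( path.length === 0 ) {
		throw new Error( 'Path must contain at least an offset' );
	}
	let node = root;
	for ( const index of path.slice( 0, -1 ) ) {
		const child = node.children[ index ];
		if ( !child ) {
			throw new Error( 'No child at index ' + index + ' of ' + node.type );
		}
		node = child;
	}
	const offset = path[ path.length - 1 ];
	if ( !Number.isInteger( offset ) || offset < 0 || offset > node.children.length ) {
		throw new Error( 'Offset ' + offset + ' out of range in ' + node.type );
	}
	return { parent: node, offset };
}

// [ 1, 2 ]: descend into child 1 (the list), then offset 2 (after both items).
const position = resolvePath( doc, [ 1, 2 ] );
console.log( position.parent.type, position.offset ); // list 2
```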
<OpenElementLinModItem> is the linear model value representing the node being inserted or removed, like { type: 'paragraph' } .
<TextLinModItem[]> is the list of linear model values representing the text, like [ 'y', 'o', 'u', ' ', [ 'm', [ 'he4e7c54e2204d10b' ] ], [ 'e', [ 'he4e7c54e2204d10b' ] ] ] .
The isContent flag is true if the operation is taking place inside a ContentBranchNode (so it is always true for text).
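To make the six forms concrete, here is a small hand-written tree diff together with a shape check. This is only a sketch: the example diff is an illustration, not actual TreeModifier output, and OP_FIELDS, isPath and checkTreeOp are hypothetical names. The check covers only what is stated above: the operation type, the required fields for each form, integer-array paths, and isContent being true for text operations.

```javascript
// Fields each operation form carries besides type and isContent (from the list above).
const OP_FIELDS = {
	insertNode: [ 'at', 'element' ],
	removeNode: [ 'at', 'element' ],
	moveNode: [ 'from', 'to' ],
	insertText: [ 'at', 'data' ],
	removeText: [ 'at', 'data' ],
	moveText: [ 'from', 'to', 'length' ]
};

function isPath( value ) {
	return Array.isArray( value ) && value.every( Number.isInteger );
}

/**
 * Return a list of problems with one tree operation (empty if none found).
 * Hypothetical helper: checks shape only, not validity against a document.
 */
function checkTreeOp( op ) {
	if ( !Object.prototype.hasOwnProperty.call( OP_FIELDS, op.type ) ) {
		return [ 'unknown type: ' + op.type ];
	}
	const problems = [];
	if ( typeof op.isContent !== 'boolean' ) {
		problems.push( 'isContent must be a boolean' );
	}
	if ( op.type.endsWith( 'Text' ) && op.isContent !== true ) {
		problems.push( 'text operations must have isContent: true' );
	}
	for ( const field of OP_FIELDS[ op.type ] ) {
		if ( !( field in op ) ) {
			problems.push( 'missing field: ' + field );
		}
	}
	for ( const field of [ 'at', 'from', 'to' ] ) {
		if ( field in op && !isPath( op[ field ] ) ) {
			problems.push( field + ' must be an array of integers' );
		}
	}
	return problems;
}

// Hand-written illustration: insert an empty paragraph, then type "hi" into it.
const exampleDiff = [
	{ type: 'insertNode', isContent: false, at: [ 0 ], element: { type: 'paragraph' } },
	{ type: 'insertText', isContent: true, at: [ 0, 0 ], data: [ 'h', 'i' ] }
];
console.log( exampleDiff.every( ( op ) => checkTreeOp( op ).length === 0 ) ); // true

// A malformed text move: wrong isContent and no length.
console.log( checkTreeOp( { type: 'moveText', isContent: false, from: [ 0, 1 ], to: [ 0, 3 ] } ) );
```

Because each operation is a plain object, a check like this can run over the whole diff before anything touches the document.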
Immediate benefits
- Makes TreeModifier self-documenting. Unit tests can show directly how linear operations should transform into tree operations.
- Debugging aid. A glance at a tree diff will usually show whether, and suggest how, TreeModifier is going wrong.
- Safer error handling. Buggy transactions are rejected before the document is modified.
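The error-handling point can be sketched on a toy tree. The example below is one possible way to get reject-before-modify behaviour, not a description of how TreeModifier does it: it dry-runs the whole diff on a deep copy and only replays it on the real tree if every operation succeeds. The { type, children } node shape and the locate, applyOp and applyTreeDiff helpers are hypothetical, and only insertNode and removeNode are handled.

```javascript
// Toy tree of { type, children } objects; not ve.dm nodes (hypothetical shape).
function locate( root, path ) {
	let node = root;
	for ( const index of path.slice( 0, -1 ) ) {
		node = node.children[ index ];
		if ( !node ) {
			throw new Error( 'Bad path: ' + JSON.stringify( path ) );
		}
	}
	const offset = path[ path.length - 1 ];
	if ( !Number.isInteger( offset ) ) {
		throw new Error( 'Bad offset in path: ' + JSON.stringify( path ) );
	}
	return { parent: node, offset };
}

// Apply one operation, throwing if it does not fit the tree.
function applyOp( root, op ) {
	const { parent, offset } = locate( root, op.at );
	if ( op.type === 'insertNode' ) {
		if ( offset < 0 || offset > parent.children.length ) {
			throw new Error( 'insertNode offset out of range at ' + JSON.stringify( op.at ) );
		}
		parent.children.splice( offset, 0, { type: op.element.type, children: [] } );
	} else if ( op.type === 'removeNode' ) {
		const target = parent.children[ offset ];
		if ( !target || target.type !== op.element.type ) {
			throw new Error( 'removeNode does not match the tree at ' + JSON.stringify( op.at ) );
		}
		parent.children.splice( offset, 1 );
	} else {
		throw new Error( 'Unsupported in this sketch: ' + op.type );
	}
}

function applyTreeDiff( root, diff ) {
	// Dry run on a deep copy: a bad operation throws here, leaving root untouched.
	const copy = JSON.parse( JSON.stringify( root ) );
	for ( const op of diff ) {
		applyOp( copy, op );
	}
	// Every operation succeeded on the copy, so replay on the real tree.
	for ( const op of diff ) {
		applyOp( root, op );
	}
}

const doc = { type: 'document', children: [ { type: 'paragraph', children: [] } ] };
try {
	applyTreeDiff( doc, [
		{ type: 'insertNode', isContent: false, at: [ 1 ], element: { type: 'heading' } },
		// Buggy step: the node at [ 0 ] is a paragraph, not a list.
		{ type: 'removeNode', isContent: false, at: [ 0 ], element: { type: 'list' } }
	] );
} catch ( e ) {
	console.log( 'rejected:', e.message );
}
console.log( doc.children.length ); // 1
```

Copying the whole tree is a deliberately blunt choice for a sketch. The point is only that a complete list of steps can be checked as a unit before any of them takes effect.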
Future benefits
- Greater flexibility for transaction processing
- Identification of edit type (e.g. "this is a formatting change")
- Improved crash recovery
- Improved diffing (simplification of VisualDiff code)
- Improved merging
- Merging new source content within ContentTranslation