Page MenuHomePhabricator

Split TreeModifier into a tree diff generator and tree diff applier
Closed, ResolvedPublicSep 4 2019

Description

TL;DR: Introduce the concept of a tree diff. Use it to break TreeModifier in half: one half generates a tree diff from a linear diff; the other half applies the tree diff to the document.

  • improved code readability
  • faster debugging
  • safer error handling
  • new possibilities, e.g.
    • improved transaction processing
    • improved crash recovery
    • improved diffing / merging
    • merging new source content within ContentTranslation

Details below.

Current situation

TreeModifier was written to fix buggy behaviour, where a document's linear representation would get temporarily out of sync with its node tree representation. A ve.dm.Transaction contains linear operations to modify ve.dm.LinearData. Originally, we would modify the linear data first then rebuild part of the node tree. This could cause bugs because events could be fired by rebuilding nodes, at which point parts of the node tree could be inconsistent with the linear data.

Now, ve.dm.TreeModifier applies the linear operations to both the linear data and the ve.dm.Document tree simultaneously, keeping the two in sync at every step. This ensures document consistency when event handlers are called by those steps, hence fixing the bugs.

TreeModifier is made possible by using the algorithm documented in T162762, but it can be hard to understand what that means for a particular linear transaction, even as the main author of that algorithm. It would be difficult for anyone else to maintain the code. When possible issues arise, stepping through TreeModifier with the debugger is quite time consuming and difficult.

Proposed tree diff format

Instead of performing the steps documented in T162762, the new TreeModifier will list the steps to take. Each step is a tree operation, and the list of tree operations forms a tree diff.

Each tree operation can take one of the following six forms:

{ type: 'insertNode', isContent: <boolean>, at: <Path>, element: <OpenElementLinModItem> }
{ type: 'removeNode', isContent: <boolean>, at: <Path>, element: <OpenElementLinModItem> }
{ type: 'moveNode', isContent: <boolean>, from: <Path>, to: <Path> }
{ type: 'insertText', isContent: true, at: <Path>, data: <TextLinModItem[]> }
{ type: 'removeText', isContent: true, at: <Path>, data: <TextLinModItem[]> }
{ type: 'moveText', isContent: true, from: <Path>, to: <Path>, length: <number> }

Note that moveNode/moveText do not specify what content is being moved, and all the *Node operations always operate on a single node at a time.

<Path> is a number[] array, representing the tree path from the DocumentNode to the position, except that within ContentBranchNodes, the offset is the linearized offset of the position.

<OpenElementLinModItem> is the linear model value representing the node being inserted or removed, like { type: 'paragraph' } .

<TextLinModItem[]> is the linear model values representing the text, like [ 'y', 'o', 'u', ' ', [ 'm', [ 'he4e7c54e2204d10b ] ], [ 'e' 'he4e7c54e2204d10b ] ] .

The isContent flag is true if the operation is taking place inside a ContentBranchNode (so it is always true for text).

Immediate benefits

  • Makes TreeModifier self-documenting. Unit tests can show directly how linear operations should transform into tree operations.
  • Debugging aid. A glance at a tree diff will usually show whether, and suggest how, TreeModifier is going wrong.
  • Safer error handling: buggy transactions are rejected before the document is modified

Future benefits

  • Greater flexibility for transaction processing
  • Identification of edit type (e.g. "this is a formatting change"):
  • Improved crash recovery
  • Improved diffing (simplification of VisualDiff code)
  • Improved merging
  • Merging new source content within ContentTranslation

Event Timeline

except that within ContentBranchNodes, the offset is the linearized offset of the position.

I assume this means that, given linmod [ {type: 'paragraph' }, 'a', 'b', 'c', { type: 'image' }, { type: '/image' }, 'x', 'y', 'z', { type: '/paragraph' } ], the path of the offset inside the image is [0, 4], whereas it would normally be [0, 1, 0]? I think this is a good idea, and one reason for that is that otherwise it's ambiguous whether the path of the offset after 'c' is [0, 0, 3] or [0, 1].

Oh, good point – yes, that is correct.

Change 473380 had a related patch set uploaded (by Divec; owner: Divec):
[VisualEditor/VisualEditor@master] WIP: Split TreeModifier into a tree diff generator and tree diff applier

https://gerrit.wikimedia.org/r/473380

I think this is looking more or less in shape. How about we work out a fairly quiet time to merge it? I ask because, although in theory the behaviour should be unchanged, in practice the internals are fairly radically new, so it would be good to have capacity for reactive QA / debugging if required.

JTannerWMF added a subscriber: JTannerWMF.

We will work on this in the beginning of September

JTannerWMF set Due Date to Sep 10 2019, 4:00 AM.Aug 20 2019, 3:19 PM
Restricted Application changed the subtype of this task from "Task" to "Deadline". · View Herald TranscriptAug 20 2019, 3:19 PM

I am moving this to Up Next on our board, I think we will be in the clear to get this deployed by second week of September.

JTannerWMF changed Due Date from Sep 10 2019, 4:00 AM to Sep 4 2019, 4:00 AM.Aug 21 2019, 4:55 PM

This will go out on Sept 10th but merged on the 4th of Sept for Testing

Change 473380 merged by jenkins-bot:
[VisualEditor/VisualEditor@master] Split TreeModifier into a tree diff generator and tree diff applier

https://gerrit.wikimedia.org/r/473380

Change 536385 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (4031ac8ef)

https://gerrit.wikimedia.org/r/536385

Change 537178 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (a9d533280)

https://gerrit.wikimedia.org/r/537178

Change 537178 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (81e2e57a5)

https://gerrit.wikimedia.org/r/537178

matmarex removed a project: Editing QA.
matmarex moved this task from QA to Product owner review on the VisualEditor (Current work) board.

(all the found issues are resolved)