Decide whether we're going to re-use an existing tree-diffing system or build our own
Closed, Resolved (Public)

Description

  • Wikibase has one, can we use that?
  • Are there others already in production we can use instead?
  • Is there a better one as a library we can easily re-use and get through Security and Performance reviews?
  • Should we just build our own?

Event Timeline

Here's the tree differ we use in VisualEditor: https://github.com/Tchanders/treeDiffer.js

It's pretty generic, with an abstract tree node class which you extend to define conditions for equality etc.
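
To make the "extend an abstract node class" idea concrete, here is a minimal sketch in Python. It is not treeDiffer.js's actual API: the names `TreeNode`, `JsonNode`, `is_equal` and `trees_equal` are all made up for illustration, and it only shows the equality hook plus a naive ordered comparison, not a real tree-diffing algorithm.

```python
from abc import ABC, abstractmethod


class TreeNode(ABC):
    """Hypothetical abstract node; subclasses decide what 'equal' means."""

    def __init__(self, value, children=None):
        self.value = value
        self.children = list(children or [])

    @abstractmethod
    def is_equal(self, other):
        """Return True if the two nodes count as the same for diffing."""


class JsonNode(TreeNode):
    """Example subclass: nodes are equal when their values match exactly."""

    def is_equal(self, other):
        return isinstance(other, JsonNode) and self.value == other.value


def trees_equal(a, b):
    """Naive ordered comparison that relies only on the is_equal hook."""
    if not a.is_equal(b) or len(a.children) != len(b.children):
        return False
    return all(trees_equal(x, y) for x, y in zip(a.children, b.children))


left = JsonNode("root", [JsonNode("a"), JsonNode("b")])
right = JsonNode("root", [JsonNode("a"), JsonNode("c")])
print(trees_equal(left, left), trees_equal(left, right))
```

A differ built on top of this would only ever call `is_equal`, so swapping in a different node type changes what counts as a match without touching the diffing code.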

At the moment it's built for use in the browser (it attaches an object to the window), but it could easily be adapted into a Node module.

What's your use case?

It's for three purposes:

  • analysing proposed new JSON objects to determine whether the change is authorised (different users can edit different parts of the objects), and so whether or not to accept the user's publish request;
  • producing nice HTML comparing revisions of published JSON objects for the normal accountability/collaboration/anti-abuse MW features; and
  • automatically tagging edits for wider interest, e.g. adding a label-fr for edits that touch labels in French.
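
As a rough illustration of how the first and third purposes could hang off one diff, here is a small Python sketch (Python purely for illustration; the real thing would be PHP). Everything in it is an assumption: the object shape with a top-level `labels` map, the function names, and the rule that a change under `labels.<lang>` earns the tag `label-<lang>`. Lists are treated as atomic values to keep it short.

```python
def changed_paths(old, new, prefix=()):
    """Yield the key paths at which two JSON-like values differ."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            if key not in old or key not in new:
                yield prefix + (key,)  # key added or removed
            else:
                yield from changed_paths(old[key], new[key], prefix + (key,))
    elif old != new:
        yield prefix  # leaf (or list) changed


def tags_for(paths):
    """Hypothetical tagging rule: ("labels", "fr") -> "label-fr"."""
    return sorted({f"label-{p[1]}" for p in paths
                   if len(p) >= 2 and p[0] == "labels"})


def is_authorised(paths, allowed_prefixes):
    """Accept only if every changed path sits under an allowed prefix."""
    return all(any(p[:len(a)] == a for a in allowed_prefixes) for p in paths)


# Made-up example objects.
old = {"labels": {"en": "cat", "fr": "chat"}, "type": "example"}
new = {"labels": {"en": "cat", "fr": "chatte"}, "type": "example"}

paths = list(changed_paths(old, new))
print(paths, tags_for(paths), is_authorised(paths, [("labels",)]))
```

The HTML comparison for revisions could be rendered from the same list of changed paths, which is part of why a single structured diff seems attractive here.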

I think we need a differ in PHP for our use cases, sadly; do you think yours would be reasonable to port, or should we look elsewhere?

It's pretty lightweight and generic, so not a bad place to start. Obviously it only deals with the actual diffing part.

Straight-up tree diffing can be quite performance-heavy. For the visual diff, although each document is an HTML tree and could theoretically be diffed as such in its entirety, comparing every possible sub-tree is both costly and easily avoidable using some shortcuts and heuristics. (E.g. finding chunks of identical content to treat as one node, or modelling parts as a list instead, etc.)
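
A sketch of what such a shortcut might look like, in Python for illustration. This is not how the VisualEditor visual diff is implemented; it just combines the two ideas mentioned above under simple assumptions: each child sub-tree is fingerprinted via a canonical JSON serialisation, and the children are then diffed as a flat list with the standard library's `difflib.SequenceMatcher`, so only the chunks that actually changed would need a full tree diff. The names `fingerprint` and `changed_chunks` are made up.

```python
import difflib
import hashlib
import json


def fingerprint(node):
    """Hash a JSON-like sub-tree via a canonical (sorted-key) serialisation."""
    canonical = json.dumps(node, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_chunks(old_children, new_children):
    """List-diff children by fingerprint; return only the non-equal runs."""
    old_fp = [fingerprint(c) for c in old_children]
    new_fp = [fingerprint(c) for c in new_children]
    matcher = difflib.SequenceMatcher(a=old_fp, b=new_fp, autojunk=False)
    return [
        (tag, old_children[i1:i2], new_children[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # identical runs are skipped without descending
    ]


# Made-up documents: three top-level chunks, only the middle one edited.
old_doc = [{"p": "intro"}, {"p": "body"}, {"p": "end"}]
new_doc = [{"p": "intro"}, {"p": "body v2"}, {"p": "end"}]

chunks = changed_chunks(old_doc, new_doc)
print(chunks)
```

Only the chunks returned here would be handed to the expensive tree differ; everything else is known to be identical from the fingerprints alone.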

Do your JSON objects have a spec?