
Augment revisions with edit distance
Closed, Duplicate | Public | Feature

Description

Revisions should be augmented with edit distance[1], and this should be shown in the history, recent changes and watchlist instead of, or in addition to, the length.

The increase or decrease in length is shown in the history and other places, but this isn't really useful, as a lot can be changed without the length changing very much. In particular, moving a phrase within the text itself by cut and paste leads to a zero change in length.
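To illustrate the point, here is a minimal Python sketch that compares the length delta with the Levenshtein distance for an edit that only moves a word. The function name `levenshtein` and the sample strings are made up for this example; nothing here is existing MediaWiki code.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance, keeping only two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical edit: two words swap places, so the length stays the same.
old = "the quick brown fox"
new = "the brown quick fox"
length_delta = len(new) - len(old)
distance = levenshtein(old, new)
print(length_delta, distance)
```

The length delta is zero for this edit, while the edit distance is not, which is the information the history view currently hides.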

There are also cases where some expensive processing should only be done for revisions that are more than minimally interesting. This is very often connected to edit distance, or edit distance can be used as a proxy for such calculations.

Edit distance is usually interpreted as the Levenshtein distance (LD), but other metrics can also be used. In particular, it is worth noting that some of the LD algorithms are computationally expensive. There could be other algorithms that are good enough (i.e. they can be used as a proxy) and are computationally less expensive (i.e. constant or linear order).
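One possible cheaper proxy, sketched here in Python purely as an assumption (it is not proposed anywhere in this report), is to compare character n-gram counts of the two texts. For a fixed n this takes roughly linear time in the text length. It is not the Levenshtein distance, and the names `ngram_profile` and `ngram_distance` are hypothetical.

```python
from collections import Counter


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count every overlapping character n-gram in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_distance(old: str, new: str, n: int = 3) -> int:
    """Number of n-grams that were removed plus the number that were added."""
    old_profile = ngram_profile(old, n)
    new_profile = ngram_profile(new, n)
    removed = old_profile - new_profile  # Counter subtraction keeps positive counts only
    added = new_profile - old_profile
    return sum(removed.values()) + sum(added.values())


# A moved word changes the n-grams around the places it was cut and pasted.
print(ngram_distance("the quick brown fox", "the brown quick fox"))
```

A known limitation of this sketch: texts shorter than n characters produce no n-grams, so changes to them are reported as zero.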

If the revision table is augmented with edit distance, the old entries should be updated too. That is, the edit distance between each revision and its parent revision should be calculated and stored.
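The backfill could look roughly like the following Python sketch. It works on an in-memory stand-in for the revision table; the `Revision` class, its field names, and the `backfill` function are hypothetical and do not reflect the real schema. The default `opcode_distance` is a difflib-based stand-in rather than a true Levenshtein distance, and treating a revision without a parent as an edit from the empty string is also an assumption.

```python
import difflib
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Revision:
    rev_id: int
    parent_id: Optional[int]
    text: str
    edit_distance: Optional[int] = None  # the proposed new column


def opcode_distance(old: str, new: str) -> int:
    """Stand-in distance: characters covered by non-equal difflib opcodes."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")


def backfill(revisions: Dict[int, Revision],
             distance: Callable[[str, str], int] = opcode_distance) -> int:
    """Fill in edit_distance for every revision that lacks it; return the count."""
    updated = 0
    for rev in revisions.values():
        if rev.edit_distance is not None:
            continue  # already computed, leave as-is
        parent = revisions.get(rev.parent_id) if rev.parent_id is not None else None
        old_text = parent.text if parent is not None else ""  # assumption: no parent means empty text
        rev.edit_distance = distance(old_text, rev.text)
        updated += 1
    return updated


revisions = {
    1: Revision(1, None, "abc"),
    2: Revision(2, 1, "abc"),
    3: Revision(3, 2, "abcd"),
}
print(backfill(revisions))
```

Passing the distance function as a parameter keeps the backfill independent of which metric is eventually chosen.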

[1] https://en.wikipedia.org/wiki/Edit_distance


Version: 1.22.0
Severity: enhancement

Details

Reference
bz51506