Page MenuHomePhabricator

Show changes in moved text chunks (C++)
Closed, ResolvedPublic

Description

Motivation:
Currently of diff pages, moved paragraphs are considered a deletion and a subsequent addition. This leads to the situation that people always need to read through all moved paragraphs to make sure, that nothing has changed apart from the position.

Task:
Using the currently existing display methods, show changes in moved text chunks. This means, that only the part that changed is highlighted.

Important:
The diff engine used on mediawiki production wikis is wikidiff2. This is a c++ library, so the code needs to be written in C++.

Background:
This is part of fulfilling wish #2 of the 2015 German-speaking community wishlist survey: T139603: Show text changes when moving text chunks (#2)

Related Objects

StatusSubtypeAssignedTask
Resolved DannyH
Resolvedthiemowmde
ResolvedNone
ResolvedWMDE-Fisch
ResolvedTobi_WMDE_SW
Resolved jkroll
ResolvedTobi_WMDE_SW
Resolved jkroll
ResolvedPaladox
ResolvedAddshore
ResolvedAddshore
ResolvedTobi_WMDE_SW
ResolvedWMDE-Fisch
Resolved jkroll
ResolvedLegoktm
Resolved jkroll
ResolvedNone
Resolved jkroll
ResolvedAndrew-WMDE
Resolved jkroll
Resolved Lea_WMDE
ResolvedNone
Resolved jkroll
ResolvedWMDE-Fisch
Resolved jkroll
Resolved jkroll
ResolvedWMDE-Fisch
Resolved jkroll
Resolvedovasileva
Resolved jkroll
ResolvedWMDE-Fisch
ResolvedMoritzMuehlenhoff
ResolvedWMDE-Fisch
ResolvedMoritzMuehlenhoff
Resolved Lea_WMDE

Event Timeline

Change 319866 had a related patch set uploaded (by WMDE-Fisch):
Extend WikiDiff2 to show changes inside moved paragraphes

https://gerrit.wikimedia.org/r/319866

The case where a paragraph appears to have been moved to several places, and words deleted from the inserted copies, can't be displayed correctly. The workaround which is currently implemented shows only the first inserted paragraph as "moved". Users will rarely, if ever, see this edge case, so it is probably sufficient. Other suggestions to solve this are welcome.

Example diff: https://mwdiffstuff.wmflabs.org/core/index.php?title=Red_imported_fire_ant&diff=26&oldid=25

It turns out that Tool Labs doesn't have the Wikidiff2 extension installed. Mediawiki falls back to some default diff implementation in that case (probably PHP) which produces slightly different output than Wikidiff2. I have set up two Labs instances with *unpatched* and with *patched* wikidiff2, so we can show a side-by-side diff of the diff code with and without our patch, yes that sentence makes sense.

Example: before | after

The wikis pull data from the same MariaDB server, so changes made in one wiki will show up in the other.

Change 356582 had a related patch set uploaded (by WMDE-Fisch; owner: MaxSem):
[mediawiki/php/wikidiff2@master] Better distinction of change vs add/delete actions

https://gerrit.wikimedia.org/r/356582

Change 319866 had a related patch set uploaded (by WMDE-Fisch; owner: WMDE-Fisch):
[mediawiki/php/wikidiff2@master] Extend WikiDiff2 to show changes inside moved paragraphs

https://gerrit.wikimedia.org/r/319866

@MaxSem Thanks for looking into https://gerrit.wikimedia.org/r/#/c/356582/. I updated it. Could you review this and https://gerrit.wikimedia.org/r/#/c/319866/ so we can get it merged?
@tstarling You could also review this, thanks!

@MaxSem updated again :) Anything else you want changed? Or can you give a +1? Thanks

Left my reviews. One thing I'd like is to have a fuzz test of some kind. DiffTests work nice to test diffs on a bunch of real-life changes (and I actually once discovered a segfault with it), however your changes are the biggest thing happening to the code base in a decade. I'll try hacking something up today.

I've created a basic fuzz test at https://gerrit.wikimedia.org/r/#/c/372870/ - if you could add some fuzzing targeted at your new code, I would be more confident about +2ing this project.

@MaxSem I added some code to your fuzz test which hits the other code paths in the patch. It just creates shuffled lines basically.

Tobi_WMDE_SW added a subscriber: jkroll.

Thanks for working on this. I reviewed some of the sample diffs.

This diff should not show either of the paragraphs as moved. It should be a side-by-side plus minus pair, then a minus blank line, then another side-by-side plus minus pair.

This diff should show the Ginger Rogers wurde durch... paragraph as not moved. (As a side-by-side plus minus pair.)

This diff swaps to paragraphs. One should show as moved, the other should be a side-by-side plus minus pair. It would be preferable if the longer paragraph was shown as not-moved.

Hi Alsee, thanks for reporting the diffs, we will look into it!

thiemowmde subscribed.

The example diffs in the comment T146781#3806765 are all because of the same edge case situation. While it would indeed be possible to improve the wikidiff2 code further to make such diffs even better, this is out of the scope of this particular ticket. It's also not a consequence of the code changes being done here. These diffs have been disconnected before. The code introduced here added the arrows and inline diffs to these paragraphs, which is an improvement already. Further improvements on top of that should be tracked in another ticket.