Show changes in moved text chunks (C++)
Open, Needs TriagePublic

Description

Motivation:
Currently of diff pages, moved paragraphs are considered a deletion and a subsequent addition. This leads to the situation that people always need to read through all moved paragraphs to make sure, that nothing has changed apart from the position.

Task:
Using the currently existing display methods, show changes in moved text chunks. This means, that only the part that changed is highlighted.

Important:
Diff engine changes need to be implemented both for php and c++. This ticket covers the C++ part

Background:
This is part of fulfilling wish #2 of the 2015 German-speaking community wishlist survey: T139603: Show text changes when moving text chunks (#2)

Restricted Application added a project: TCB-Team. · View Herald TranscriptSep 27 2016, 4:30 PM

Change 319866 had a related patch set uploaded (by WMDE-Fisch):
Extend WikiDiff2 to show changes inside moved paragraphes

https://gerrit.wikimedia.org/r/319866

The case where a paragraph appears to have been moved to several places, and words deleted from the inserted copies, can't be displayed correctly. The workaround which is currently implemented shows only the first inserted paragraph as "moved". Users will rarely, if ever, see this edge case, so it is probably sufficient. Other suggestions to solve this are welcome.

Example diff: https://mwdiffstuff.wmflabs.org/core/index.php?title=Red_imported_fire_ant&diff=26&oldid=25

It turns out that Tool Labs doesn't have the Wikidiff2 extension installed. Mediawiki falls back to some default diff implementation in that case (probably PHP) which produces slightly different output than Wikidiff2. I have set up two Labs instances with *unpatched* and with *patched* wikidiff2, so we can show a side-by-side diff of the diff code with and without our patch, yes that sentence makes sense.

Example: before | after

The wikis pull data from the same MariaDB server, so changes made in one wiki will show up in the other.

Change 356582 had a related patch set uploaded (by WMDE-Fisch; owner: MaxSem):
[mediawiki/php/wikidiff2@master] Better distinction of change vs add/delete actions

https://gerrit.wikimedia.org/r/356582

Change 319866 had a related patch set uploaded (by WMDE-Fisch; owner: WMDE-Fisch):
[mediawiki/php/wikidiff2@master] Extend WikiDiff2 to show changes inside moved paragraphs

https://gerrit.wikimedia.org/r/319866

@MaxSem Thanks for looking into https://gerrit.wikimedia.org/r/#/c/356582/. I updated it. Could you review this and https://gerrit.wikimedia.org/r/#/c/319866/ so we can get it merged?
@tstarling You could also review this, thanks!

@MaxSem updated again :) Anything else you want changed? Or can you give a +1? Thanks

MaxSem added a comment.Aug 9 2017, 5:59 PM

Left my reviews. One thing I'd like is to have a fuzz test of some kind. DiffTests work nice to test diffs on a bunch of real-life changes (and I actually once discovered a segfault with it), however your changes are the biggest thing happening to the code base in a decade. I'll try hacking something up today.

I've created a basic fuzz test at https://gerrit.wikimedia.org/r/#/c/372870/ - if you could add some fuzzing targeted at your new code, I would be more confident about +2ing this project.

jkroll added a comment.Sep 5 2017, 1:16 PM

@MaxSem I added some code to your fuzz test which hits the other code paths in the patch. It just creates shuffled lines basically.