Show changes in moved text chunks (C++)
Open, Needs TriagePublic

Description

Motivation:
Currently of diff pages, moved paragraphs are considered a deletion and a subsequent addition. This leads to the situation that people always need to read through all moved paragraphs to make sure, that nothing has changed apart from the position.

Task:
Using the currently existing display methods, show changes in moved text chunks. This means, that only the part that changed is highlighted.

Important:
The diff engine used on mediawiki production wikis is wikidiff2. This is a c++ library, so the code needs to be written in C++.

Background:
This is part of fulfilling wish #2 of the 2015 German-speaking community wishlist survey: T139603: Show text changes when moving text chunks (#2)

Related Objects

StatusAssignedTask
ResolvedDannyH
OpenNone
OpenNone
ResolvedWMDE-Fisch
ResolvedTobi_WMDE_SW
Resolvedjkroll
ResolvedTobi_WMDE_SW
Resolvedjkroll
OpenNone
ResolvedPaladox
ResolvedAddshore
ResolvedAddshore
ResolvedTobi_WMDE_SW
ResolvedWMDE-Fisch
Resolvedjkroll
ResolvedLegoktm
Resolvedjkroll
OpenNone
Resolvedjkroll
ResolvedAndrew-WMDE
Resolvedjkroll
ResolvedLea_WMDE
ResolvedNone
Resolvedjkroll
ResolvedWMDE-Fisch
Resolvedjkroll
Resolvedjkroll
ResolvedWMDE-Fisch
Resolvedjkroll
Resolvedovasileva
Resolvedjkroll
ResolvedWMDE-Fisch
ResolvedMoritzMuehlenhoff
OpenWMDE-Fisch
OpenNone
ResolvedLea_WMDE
Lea_WMDE created this task.Sep 27 2016, 4:30 PM
Restricted Application added a project: TCB-Team. · View Herald TranscriptSep 27 2016, 4:30 PM

Change 319866 had a related patch set uploaded (by WMDE-Fisch):
Extend WikiDiff2 to show changes inside moved paragraphes

https://gerrit.wikimedia.org/r/319866

The case where a paragraph appears to have been moved to several places, and words deleted from the inserted copies, can't be displayed correctly. The workaround which is currently implemented shows only the first inserted paragraph as "moved". Users will rarely, if ever, see this edge case, so it is probably sufficient. Other suggestions to solve this are welcome.

Example diff: https://mwdiffstuff.wmflabs.org/core/index.php?title=Red_imported_fire_ant&diff=26&oldid=25

It turns out that Tool Labs doesn't have the Wikidiff2 extension installed. Mediawiki falls back to some default diff implementation in that case (probably PHP) which produces slightly different output than Wikidiff2. I have set up two Labs instances with *unpatched* and with *patched* wikidiff2, so we can show a side-by-side diff of the diff code with and without our patch, yes that sentence makes sense.

Example: before | after

The wikis pull data from the same MariaDB server, so changes made in one wiki will show up in the other.

Change 356582 had a related patch set uploaded (by WMDE-Fisch; owner: MaxSem):
[mediawiki/php/wikidiff2@master] Better distinction of change vs add/delete actions

https://gerrit.wikimedia.org/r/356582

Change 319866 had a related patch set uploaded (by WMDE-Fisch; owner: WMDE-Fisch):
[mediawiki/php/wikidiff2@master] Extend WikiDiff2 to show changes inside moved paragraphs

https://gerrit.wikimedia.org/r/319866

@MaxSem Thanks for looking into https://gerrit.wikimedia.org/r/#/c/356582/. I updated it. Could you review this and https://gerrit.wikimedia.org/r/#/c/319866/ so we can get it merged?
@tstarling You could also review this, thanks!

@MaxSem updated again :) Anything else you want changed? Or can you give a +1? Thanks

MaxSem added a comment.Aug 9 2017, 5:59 PM

Left my reviews. One thing I'd like is to have a fuzz test of some kind. DiffTests work nice to test diffs on a bunch of real-life changes (and I actually once discovered a segfault with it), however your changes are the biggest thing happening to the code base in a decade. I'll try hacking something up today.

I've created a basic fuzz test at https://gerrit.wikimedia.org/r/#/c/372870/ - if you could add some fuzzing targeted at your new code, I would be more confident about +2ing this project.

jkroll added a comment.Sep 5 2017, 1:16 PM

@MaxSem I added some code to your fuzz test which hits the other code paths in the patch. It just creates shuffled lines basically.

Tobi_WMDE_SW moved this task from Done to Proposed on the WMDE-QWERTY-Team board.Oct 17 2017, 1:20 PM
Tobi_WMDE_SW removed jkroll as the assignee of this task.Nov 1 2017, 7:49 AM
Tobi_WMDE_SW added a subscriber: jkroll.
Lea_WMDE updated the task description. (Show Details)Nov 28 2017, 11:20 AM
Alsee added a subscriber: Alsee.Dec 3 2017, 11:11 AM

Thanks for working on this. I reviewed some of the sample diffs.

This diff should not show either of the paragraphs as moved. It should be a side-by-side plus minus pair, then a minus blank line, then another side-by-side plus minus pair.

This diff should show the Ginger Rogers wurde durch... paragraph as not moved. (As a side-by-side plus minus pair.)

This diff swaps to paragraphs. One should show as moved, the other should be a side-by-side plus minus pair. It would be preferable if the longer paragraph was shown as not-moved.

Hi Alsee, thanks for reporting the diffs, we will look into it!