
Diffs: Implement functionality according to designs
Closed, Resolved; Public

Description

Feature summary:

When I use the diff viewer, I expect to see the actual text differences, and I don't want the diff view to be destroyed by formatting symbols. This is one of the most requested features from the Wishlist survey 2022, which lists the communities' most pressing issues.

Task:

Implement functionality according to UX mockups provided by designer

Use case(s):

  • when using a line break, the diff viewer falsely interprets it as deleted text and readded text
  • lines aren’t shown while you’re editing, but they do appear on diffs as the counting unit

Benefits:

Make editing and formatting more intuitive and have a diff viewer that helps you understand your changes.

Acceptance criteria:

  • better diff feature looks and behaves as specified in newly designed mockups
  • it's clear what the blue and yellow highlights mean
  • it can be understood what the view before and after means
  • different platforms have same functionalities

Designs:

To be defined

Event Timeline

KSiebert updated the task description.

@HMonroy: Could you please answer the last comment? Thanks in advance!

This seems to be a complaint that the diff of wiki pages misinterprets some cases as a delete and re-add instead of noting that a new line was added in the middle. _In some cases_, since I think it does properly identify them sometimes. But with no actual example, it's difficult to determine what's happening. Not that it will necessarily be fixable for all instances; diffing is not a simple task.

There is a second point that "it's [not] clear what the blue and yellow highlights mean" which seems orthogonal to the previous point.

Plus, I suspect some misconception regarding visual editing. Perhaps the visual editor has a bug where it adds empty lines in some cases?

The pages are written in wikitext, and the diff MUST support wikitext diffing. There are features that are not viewable otherwise. Plus, some people don't use VisualEditor at all. (There could be a separate "visual diff", but that would be a completely different feature request.)

I don't want the diff view to be destroyed by formatting symbols

I suspect the user (no idea who, a link might be handy) refers to the + and - and line numbers in a diff.

when using a line break, the diff viewer falsely interprets it as deleted text and readded text

Technically... I'd say it is ignoring the line break. The diff algorithm we have can't handle 'splits' or 'merges' of paragraphs at the character level, as it has already considered them at the line diff level.
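To make the split case concrete, here is a minimal sketch using Python's standard `difflib`. It is not wikidiff2 and says nothing about its internals; the sample text and the helper names `line_opcodes` and `char_changes` are made up for illustration. It only shows why a line-level pass reports a split paragraph as removed and re-added text, while a character-level comparison of the same text sees a single changed character.

```python
import difflib

# Hypothetical sample: one paragraph is split into two lines by a line break.
old_lines = ["The fox jumps over the dog. It then runs away."]
new_lines = ["The fox jumps over the dog.", "It then runs away."]


def line_opcodes(old, new):
    """Line-level diff: each line is compared as one opaque unit."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


def char_changes(old_text, new_text):
    """Character-level diff: return only the spans that differ."""
    matcher = difflib.SequenceMatcher(a=old_text, b=new_text, autojunk=False)
    return [
        (tag, old_text[i1:i2], new_text[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# No line of the old text equals any line of the new text, so the line-level
# pass can only say "this line was replaced by these two lines".
line_result = line_opcodes(old_lines, new_lines)

# Comparing the joined text character by character shows the real edit:
# one space became a newline.
char_result = char_changes("\n".join(old_lines), "\n".join(new_lines))

print(line_result)
print(char_result)
```
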

lines aren’t shown while you’re editing, but they do appear on diffs as the counting unit

I think the user means line NUMBERS aren't shown while you are editing. But for many code users they are still valuable and do provide a hint of the offset etc. However, I agree that for the vast majority of 'normal' users these are probably pretty useless, and they'd be better off with widgets which add more lines of context (+5 lines, show all lines in between, etc.).

_In some cases_, since I think it does properly identifies them sometimes. But with no actual example, it's difficult to determine what's happening. Not that it will necessarily be fixable for all instances, diffing is not a simple task.

This is one of those impossible problems. For performance reasons, MediaWiki has a two-phase diff algorithm: line-based, followed by character-based. Depending on the amount of overlap between paragraphs, it can sometimes detect that a line moved, but there have to be limits there, so sometimes it guesses wrong. It's easy for humans to determine 'yes, this is the same paragraph, just with a lot of rewrites', but pretty hard for computers. Say you split a sentence, move roughly half of it, and rewrite half of the remaining half. Which of these two is now the 'original' paragraph?
So what you would have to do is first do a high-level line diff, then look for multiple 'adjacent' dirty blocks, back up their line details somewhere in memory, merge the blocks together, run your word diff, split them up again according to the previous line info, yet retain the character diff info...
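A rough sketch of that idea, again with Python's `difflib` rather than anything from wikidiff2: run a line diff first, then merge each run of adjacent changed lines and word-diff the merged text. The function names `tokenize` and `refine_dirty_blocks` and the sample lines are assumptions for illustration. Newlines are kept as tokens so that line boundaries survive the merge, which is one simple way to keep the "previous line info"; pure line insertions and deletions are left at the line level and skipped here, and the hard cases mentioned above (moved and heavily rewritten paragraphs) are not addressed at all.

```python
import difflib
import re


def tokenize(text):
    """Split text into words and whitespace runs so that joining restores it."""
    return re.findall(r"\s+|\S+", text)


def refine_dirty_blocks(old_lines, new_lines):
    """Phase 1: line diff. Phase 2: word diff inside each replaced block.

    Returns a list of (old_line_range, new_line_range, word_changes) tuples,
    one per block of adjacent changed lines.
    """
    blocks = []
    line_matcher = difflib.SequenceMatcher(
        a=old_lines, b=new_lines, autojunk=False
    )
    for tag, i1, i2, j1, j2 in line_matcher.get_opcodes():
        if tag != "replace":
            continue  # equal, insert and delete stay at the line level
        # Merge the adjacent dirty lines; "\n" tokens remember the boundaries.
        old_tokens = tokenize("\n".join(old_lines[i1:i2]))
        new_tokens = tokenize("\n".join(new_lines[j1:j2]))
        word_matcher = difflib.SequenceMatcher(
            a=old_tokens, b=new_tokens, autojunk=False
        )
        changes = [
            (wtag, "".join(old_tokens[a1:a2]), "".join(new_tokens[b1:b2]))
            for wtag, a1, a2, b1, b2 in word_matcher.get_opcodes()
            if wtag != "equal"
        ]
        blocks.append(((i1, i2), (j1, j2), changes))
    return blocks


# Hypothetical page: the middle paragraph is split into two lines.
old_page = ["Intro line.", "First sentence. Second sentence.", "Outro line."]
new_page = ["Intro line.", "First sentence.", "Second sentence.", "Outro line."]

refined = refine_dirty_blocks(old_page, new_page)
print(refined)
```

With this approach the split shows up as a single whitespace change inside the block instead of a deleted and re-added paragraph. The cost is the extra word diff over every merged block, which is the performance trade-off described above.
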

2017/2018 tech wishlist was the last time an attempt was made to improve this.
https://www.mediawiki.org/wiki/Wikidiff2/How_we_improved_Wikidiff2

There could be a separate "visual diff",

There already is: https://www.mediawiki.org/wiki/VisualEditor/Diffs

@HMonroy: Which "diff viewer" is this about? Standard MediaWiki-Page-diffs?

Hi @Aklapper! My apologies for the delay on this. Commtech is currently working on this project under the Better-Diffs-2023 tag, which covers improvements to the wikidiff2 PHP extension. This ticket was created as a placeholder to start thinking about the effort it would take to accomplish this wish. We are currently defining the work and starting implementation, so you'll notice more issues being submitted with the Better-Diffs-2023 tag.
Regarding the complexity of the algorithm, this ticket is tracking the current work being done around this complicated problem.

Please let me know if you have further questions or feedback.

HMonroy updated the task description.