
Paragraph splits and moves not identified in diffs
Open, Low, Public

Assigned To
None
Authored By
bzimport, Feb 24 2006

Description

Author: circeus

Description:
When you break a paragraph, the entire text is marked as deleted and added,
creating an annoying and needless big chunk of red text, while only the added line
breaks should really be indicated.
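
To illustrate the requested behaviour, here is a minimal Python sketch (not MediaWiki's actual diff code). It assumes the text is compared as a sequence of word tokens in which each line break is a token of its own; the helper names `tokenize` and `token_changes` are made up for this example.

```python
import difflib
import re


def tokenize(text):
    # Words and line breaks become separate tokens, so an inserted
    # line break is visible as a single-token change.
    return re.findall(r"\n|\S+", text)


def token_changes(old, new):
    # Return only the non-equal opcodes of a token-level comparison.
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old = "Alpha beta gamma. Delta epsilon zeta."
new = "Alpha beta gamma.\nDelta epsilon zeta."
print(token_changes(old, new))  # [('insert', [], ['\n'])]
```

With this tokenization the only reported change is the inserted line break, whereas a purely line-based comparison finds no line in common between the two versions.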


Version: unspecified
Severity: enhancement/bug
See also: T15462: Enhance line matching in diffs

Details

Reference
bz5072

Event Timeline

bzimport raised the priority of this task from to Low.
bzimport set Reference to bz5072.
bzimport added a subscriber: Unknown Object (MLST).
bzimport created this task. Feb 24 2006, 2:05 AM

scmcc wrote:

I'm adding to this bug, since it seems to be part of a larger problem: the article
history diffs fail to track paragraph moves as well as splits. Since diffs are
intended to help editors track changes, this failure represents a minor loss of
function, so I'm upgrading the severity from trivial to minor.

scmcc wrote:

Further clarification:

The essence of the problem is that history/diffs lose track of moved paragraphs, and hence
do not compare within the moved paragraph to identify changes that may have been made
in the same edit as the move.

When a new paragraph is inserted, history/diffs often compare the inserted paragraph
with an old following paragraph, rather than comparing the old paragraph with the
(often unchanged) version that is now in a new location.

What we seem to need is a robust difference detector that can track moves of paragraphs
or even lines, and then identify smaller changes in those moved segments. Such tools must
exist: they've been in word processors for years, and in Wiki's editing-intensive
environment they're long overdue.

Thanks, Steve
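
A rough Python sketch of the two-step idea described above (find moved paragraphs first, then look for smaller changes inside them). This is an illustration only, not how MediaWiki or wikEdDiff work; the function names and the 0.6 similarity threshold are assumptions chosen for the example.

```python
import difflib


def word_changes(old, new):
    # Word-level changes between two paragraphs (non-equal opcodes only).
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def find_moves(old_paras, new_paras, threshold=0.6):
    # Step 1: collect paragraphs that do not line up in place.
    matcher = difflib.SequenceMatcher(None, old_paras, new_paras, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(range(i1, i2))
            added.extend(range(j1, j2))
    # Step 2: pair each removed paragraph with the most similar added one,
    # provided the word-level similarity reaches the (assumed) threshold.
    moves = []
    for i in removed:
        best_j, best_ratio = None, threshold
        for j in added:
            ratio = difflib.SequenceMatcher(
                None, old_paras[i].split(), new_paras[j].split(), autojunk=False
            ).ratio()
            if ratio >= best_ratio:
                best_j, best_ratio = j, ratio
        if best_j is not None:
            added.remove(best_j)
            moves.append((i, best_j))
    return moves


old = ["Intro paragraph here.",
       "The quick brown fox jumps over the lazy dog.",
       "Closing paragraph text."]
new = ["The quick brown fox leaps over the lazy dog.",
       "Intro paragraph here.",
       "Closing paragraph text."]
for i, j in find_moves(old, new):
    print(i, j, word_changes(old[i], new[j]))
```

In this example the second old paragraph is reported as moved to the top of the new text, and the only change inside it is "jumps" becoming "leaps".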

scmcc wrote:

It looks like there's a solution to this problem by installing User:Cacycle/wikEdDiff. It tracks changes through paragraph breaks and (apparently) catches moved sections of text as well. I wouldn't quite call this bug "fixed" until this is made a standard part of the Wikipedia difference display, but having it available is a big step in the right direction.

Steve

wikEdDiff is available as a gadget now. [[User:Cacycle/wikEd_help#wikEd_control_buttons]]

Think this is closed.

(In reply to comment #3)

> I wouldn't quite call
> this bug "fixed" until this is made a standard part of the Wikipedia difference
> display, but having it available is a big step in the right direction.

I agree on this.

(In reply to comment #5)

(In reply to comment #3)
> I wouldn't quite call
> this bug "fixed" until this is made a standard part of the Wikipedia difference
> display, but having it available is a big step in the right direction.

I agree on this.

Should we make it possible to replace the built-in server-side wikidiff tool with this client-side one? Or maybe encourage the developer to have his JS output replace the server-side generated output?

The wikEdDiff author seems interested in getting a PHP-based solution to replace this JS-based one, but I'm not sure that would be better than the C++-based one in terms of server impact, which is what Wikimedia ops would be concerned with.

Reopening per comment #3 and comment #5 (and I also agree).

And replying to comment #6: From the user perspective, I think either option would be an improvement on the current situation. Should the ops decide that a server-side implementation is infeasible without performance degradation, then the client-side gadget is a reasonable (but not as good) replacement. In any case, IMO the client-side gadget would need to be bundled with MediaWiki for this bug to be considered fixed.

There's already some code which could be used: see for instance this tool (screenshot at p. 8): http://www.fst.umac.mo/en/staff/documents/robertb/WikiSym2010-PeterRobert-Final.pdf

The code is published at http://sourceforge.net/projects/weha/ . I was told in August 2010 that they were going to release it as an extension once ready, but perhaps someone could already work on it.

Agent007bond updated the task description. (Show Details)
Agent007bond set Security to None.
Nux awarded a token. Jan 27 2017, 10:49 PM
Gestrid added a subscriber: Gestrid.
Zazpot added a subscriber: Zazpot. May 29 2017, 6:20 AM

As discussed here, another example of this bug is at https://en.wikipedia.org/w/index.php?title=MalwareTech&diff=782456060&oldid=782449642 . In that diff, both the left and right columns have hunks that begin "Following his work on the WannaCry ransomware attack in 2017" and that are almost identical (edit distance: 3) but that have been aligned with other hunks instead of with each other, making it very hard to spot what has changed between them. (To spare you searching, it is "he's" to "he has".)

I expect the solution to this bug will involve matching paragraphs according to minimum edit distance, with a fallback algorithm in case two or more paragraphs are equal edit distances away.
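
A minimal Python sketch of that idea, for illustration only. It assumes character-level Levenshtein distance, greedy matching, and a hypothetical fallback that breaks ties by preferring the paragraph closest in position; none of this is taken from the existing server-side diff code.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def match_paragraphs(old_paras, new_paras):
    # Greedily pair each old paragraph with the unused new paragraph at
    # minimum edit distance. Assumed fallback for ties: the candidate
    # nearest in position wins, then the lower index.
    unused = set(range(len(new_paras)))
    pairs = []
    for i, old in enumerate(old_paras):
        if not unused:
            break
        j = min(unused,
                key=lambda k: (edit_distance(old, new_paras[k]), abs(k - i), k))
        unused.discard(j)
        pairs.append((i, j))
    return pairs


print(edit_distance("he's", "he has"))  # 3
old = ["Intro text.", "He said he's done."]
new = ["A brand new paragraph.", "Intro text.", "He said he has done."]
print(match_paragraphs(old, new))  # [(0, 1), (1, 2)]
```

This sketch has no cut-off, so an old paragraph is always paired with something while candidates remain, and it computes a quadratic number of distances. A real implementation would presumably need a maximum-distance threshold and a cheaper pre-filter.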

Zazpot added a comment. Edited May 29 2017, 6:23 AM

Several people above have suggested that a client-side solution would be acceptable. I disagree.

The diff tool is crucial for checking edits for vandalism, etc., and must be usable by all editors. Not all editors enable JavaScript. Therefore, a client-side-only solution would be inadequate.

This bug really needs fixing on the server side :)

Nemo_bis updated the task description. (Show Details) May 29 2017, 8:30 AM