Phabricator task description diffs inaccurate due to 80-character line wrapping
Closed, Resolved (Public)

Description

Upstream: https://secure.phabricator.com/T3353
Very vaguely related upstream report (focuses on source diffs, not prose diffs): https://secure.phabricator.com/T6791
Related: https://secure.phabricator.com/T7643

Example. The actual change was to replace the wikilink (understood by Bugzilla but not by Phabricator) with a URL, but the darker red/green change markers are all over the place.

For reference, here is the old and new text:

From the conversation at [[commons:User_talk:Faidon_Liambotis_(WMF)#GWT_throttling]] it seems that the files downloaded by GWToolset are only deleted when the whole batch upload sequence is finished. Given that the files it uploads tend to be fairly large and one sequence can include thousands or even hundreds of thousands of them, and the files are published on Wikipedia as soon as they are downloaded (so they are not needed anymore once the upload job finishes), this might be suboptimal.
From the conversation at [[https://commons.wikimedia.org/wiki/User_talk:Faidon_Liambotis_%28WMF%29#GWT_throttling|commons:User_talk:Faidon_Liambotis_(WMF)#GWT_throttling]] it seems that the files downloaded by GWToolset are only deleted when the whole batch upload sequence is finished. Given that the files it uploads tend to be fairly large and one sequence can include thousands or even hundreds of thousands of them, and the files are published on Wikipedia as soon as they are downloaded (so they are not needed anymore once the upload job finishes), this might be suboptimal.

A standard word-level diff tool such as wdiff does a much better job:

$ wdiff old.txt new.txt
From the conversation at [-[[commons:User_talk:Faidon_Liambotis_(WMF)#GWT_throttling]]-] {+[[https://commons.wikimedia.org/wiki/User_talk:Faidon_Liambotis_%28WMF%29#GWT_throttling|commons:User_talk:Faidon_Liambotis_(WMF)#GWT_throttling]]+} it seems that the files downloaded by GWToolset are only deleted when the whole batch upload sequence is finished. Given that the files it uploads tend to be fairly large and one sequence can include thousands or even hundreds of thousands of them, and the files are published on Wikipedia as soon as they are downloaded (so they are not needed anymore once the upload job finishes), this might be suboptimal.
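
For comparison, the same kind of word-level result can be approximated with Python's standard difflib. This is only a rough sketch of the idea (split on whitespace, diff the token lists, mark removed and added runs in wdiff's notation). It is not how wdiff or Phabricator is implemented, and the `word_diff` helper and the shortened sample strings are made up for illustration.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Render word-level changes from old to new using wdiff-style markers."""
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
            continue
        # A "replace" is shown as a deletion followed by an insertion.
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


# Shortened stand-ins for the old and new task descriptions.
old = "From the conversation at [[commons:Example]] it seems that the files are kept."
new = "From the conversation at [[https://example.org/|commons:Example]] it seems that the files are kept."
print(word_diff(old, new))
```

Because the comparison works on whole whitespace-separated tokens, the replaced link shows up as a single removed token and a single added token, and the rest of the sentence is left unmarked.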

Event Timeline

Tgr raised the priority of this task from to Needs Triage.
Tgr updated the task description. (Show Details)
Tgr added a project: Phabricator.
Tgr changed Security from none to None.
Tgr subscribed.

From this I can't really tell what is "weird" about them?

See the link at the beginning. The extent of change is completely misidentified. This is how it looks:

phabricator_diff_bad.png (272×1 px, 75 KB)

This is how it should look:

phabricator_diff_good.png (273×1 px, 71 KB)

This is how it looks with wdiff:

phabricator_diff_wdiff.png (94×1 px, 42 KB)

Qgil triaged this task as Lowest priority. (Dec 18 2014, 1:38 PM)
Qgil edited projects, added Phabricator (Upstream); removed Phabricator.

Can https://secure.phabricator.com/T6791 (just reported today) serve as related upstream task?

We use diff, git diff, etc. We do not implement our own diff algorithm.

In T78824#937047, @Qgil wrote:

Can https://secure.phabricator.com/T6791 (just reported today) serve as related upstream task?

Related but not the same. That ticket is unclear (could use an example diff and description of what the reporter would have expected) but seems to be about detecting which lines changed. This ticket is about inline markup - the diff display annotates correctly which lines changed (light green/red) but completely messes up which characters changed (dark green/red). Also...

We use diff, git diff, etc. We do not implement our own diff algorithm.

the standard command line tool for inline diffs is wdiff, which gives clean diffs. Either something else is used, or something is wrong with the way it is invoked.

Krinkle renamed this task from Phabricator inline diffs are weird to Phabricator inline inaccurate due to 80-character line wrapping. (Mar 27 2015, 12:37 PM)
Krinkle renamed this task from Phabricator inline inaccurate due to 80-character line wrapping to Phabricator task description diffs inaccurate due to 80-character line wrapping.
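
The effect named in the new title can be reproduced outside Phabricator. The sketch below assumes, purely for illustration, that both versions of a paragraph are hard-wrapped at 80 columns before being compared line by line. It does not reflect Phabricator's actual code, and the `wrapped_line_changes` helper and sample text are hypothetical. Replacing one short token with a long URL shifts every later line break, so every wrapped line ends up looking changed.

```python
import difflib
import textwrap


def wrapped_line_changes(old: str, new: str, width: int = 80) -> tuple[int, int]:
    """Hard-wrap both texts, then return (changed, total) for the wrapped new text.

    A wrapped line of `new` counts as changed when the line-level matcher finds
    no identical line for it in the wrapped `old` text.
    """
    def wrap(text: str) -> list[str]:
        # Keep long tokens such as URLs intact instead of splitting them.
        return textwrap.wrap(
            text, width=width, break_long_words=False, break_on_hyphens=False
        )

    old_lines, new_lines = wrap(old), wrap(new)
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return len(new_lines) - unchanged, len(new_lines)


# Sixty distinct placeholder words stand in for a paragraph of prose.
words = [f"w{i:03d}" for i in range(60)]
old = " ".join(words)

# Replace a single word near the start with a much longer URL.
edited = list(words)
edited[2] = "https://example.org/wiki/Some_long_page"
new = " ".join(edited)

changed, total = wrapped_line_changes(old, new)
print(f"{changed} of {total} wrapped lines differ after a one-token edit")
```

Only one token differs between the two texts, yet a comparison that runs after wrapping sees differences on every line, which is consistent with change markers being scattered across the whole paragraph.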

Josve05a on #wikimedia-tech:

andre__: this diff is borked.... https://phabricator.wikimedia.org/transactions/detail/PHID-XACT-TASK-qvlys6ukn6tma3i/ I only edited "Whn" > "When" and added "additional". THe diff breaks words... bug in phab?

Upstream has changed the way diffs are displayed, and it is definitely an improvement: the diffs are more readable.

Upstream has split https://secure.phabricator.com/T3353 into three more specific tasks and initial work has taken place. Quoting upstream comment:

mmodell claimed this task.