Upstream: <https://secure.phabricator.com/T3353>
Very vaguely related upstream report (focuses on source diffs, not prose diffs): <https://secure.phabricator.com/T6791>
[[ https://phabricator.wikimedia.org/transactions/detail/PHID-XACT-TASK-upflhdykdp6qdfc/ | Example. ]] The actual change was to replace the wikilink (understood by Bugzilla but not by Phabricator) with a URL, but the darker red/green change markers are scattered all over the paragraph instead of being confined to the link.
For reference, here are the old and new texts:
```
From the conversation at [[commons:User_talk:Faidon_Liambotis_(WMF)#GWT_throttling]] it seems that the files downloaded by GWToolset are only deleted when the whole batch upload sequence is finished. Given that the files it uploads tend to be fairly large and one sequence can include thousands or even hundreds of thousands of them, and the files are published on Wikipedia as soon as they are downloaded (so they are not needed anymore once the upload job finishes), this might be suboptimal.
```
```
From the conversation at [[https://commons.wikimedia.org/wiki/User_talk:Faidon_Liambotis_%28WMF%29#GWT_throttling|commons:User_talk:Faidon_Liambotis_(WMF)#GWT_throttling]] it seems that the files downloaded by GWToolset are only deleted when the whole batch upload sequence is finished. Given that the files it uploads tend to be fairly large and one sequence can include thousands or even hundreds of thousands of them, and the files are published on Wikipedia as soon as they are downloaded (so they are not needed anymore once the upload job finishes), this might be suboptimal.
```
A standard word-level diff tool such as wdiff does a much better job, confining the change markers to the edited link:
```
$ wdiff old.txt new.txt
From the conversation at [-[[commons:User_talk:Faidon_Liambotis_(WMF)#GWT_throttling]]-] {+[[https://commons.wikimedia.org/wiki/User_talk:Faidon_Liambotis_%28WMF%29#GWT_throttling|commons:User_talk:Faidon_Liambotis_(WMF)#GWT_throttling]]+} it seems that the files downloaded by GWToolset are only deleted when the whole batch upload sequence is finished. Given that the files it uploads tend to be fairly large and one sequence can include thousands or even hundreds of thousands of them, and the files are published on Wikipedia as soon as they are downloaded (so they are not needed anymore once the upload job finishes), this might be suboptimal.
```
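The wdiff behavior above can be approximated with a short sketch: tokenize both texts on whitespace, run a longest-common-subsequence match over the word lists, and wrap only the differing runs in `[-...-]` / `{+...+}` markers. This is a minimal illustration using Python's standard `difflib`, with shortened stand-in texts (the `old`/`new` strings below are hypothetical, not the exact task text):

```python
import difflib

# Shortened stand-in texts modeled on the example above.
old = "From the conversation at [[commons:Foo]] it seems that the files are deleted."
new = "From the conversation at [[https://example.org|commons:Foo]] it seems that the files are deleted."

def word_diff(a, b):
    """Word-level diff in wdiff-style notation: [-deleted-] {+inserted+}."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a_words[i1:i2])
        else:
            # A "replace" opcode emits both a deletion and an insertion.
            if tag in ("delete", "replace"):
                out.append("[-" + " ".join(a_words[i1:i2]) + "-]")
            if tag in ("insert", "replace"):
                out.append("{+" + " ".join(b_words[j1:j2]) + "+}")
    return " ".join(out)

print(word_diff(old, new))
```

Because the matching is done over words rather than characters or lines, the surrounding prose comes out untouched and only the replaced link token is marked, which is the behavior the example diff above lacks.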