
Diff shows unchanged parts as if they differ
Closed, Invalid (Public)

Description

it is difficult to explain, but please look at
[[w:he:Special:Diff/16519168]], or direct link.

the diff shows 3 hunks, but only the 3rd one is real - the first two hunks mark unchanged content. this can be verified in a few different ways, including "wikEdDiff", and by simply counting and comparing with the "bytes-diff" in page history (remember that a hebrew character occupies 2 bytes in UTF-8).
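The byte counting mentioned above can be sketched in Python. This is only an illustration: the helper name `utf8_byte_delta` and the sample page text are made up for the example, and it assumes the page history's "bytes-diff" is the change in UTF-8 encoded size.

```python
def utf8_byte_delta(old_text: str, new_text: str) -> int:
    """Return the change in UTF-8 encoded size between two texts."""
    return len(new_text.encode("utf-8")) - len(old_text.encode("utf-8"))


# Each Hebrew letter takes 2 bytes in UTF-8, so this 5-letter word is 10 bytes.
hebrew_word = "\u05e0\u05e7\u05d9\u05d9\u05d4"
word_bytes = len(hebrew_word.encode("utf-8"))

# Hypothetical edit (not the real page): append one category line.
old_page = "text.\n{{template}}\n"
new_page = old_page + "[[Category:" + hebrew_word + "]]\n"

# 11 bytes for "[[Category:", 10 for the word, 2 for "]]", 1 for the newline.
delta = utf8_byte_delta(old_page, new_page)
print(word_bytes, delta)
```

Note that a byte count only shows a change in size. Swapping one single-byte character for another leaves the count unchanged, so an equal count does not prove the text is identical.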

the change was made using hotcat, and touches the "category" only, as can be seen in the last hunk of the diff. the first two hunks mark parts of the page that were not changed.

i did not dig into the "diff" code, but it may help to mention that both spurious diff marks appear at:
.
{{

i.e., period, newline, open brace, open brace

peace.

Event Timeline

Kipod raised the priority of this task from to Needs Triage.
Kipod updated the task description.
Kipod added a project: MediaWiki-Page-diffs.
Kipod subscribed.
Kipod set Security to None.

The diff is correct; there are differences.

It's better seen from the API output, comparing both revision texts:

https://he.wikipedia.org/w/api.php?action=query&prop=revisions&revids=16387107|16519168&rvprop=timestamp|user|comment|content&uselang=en

Search for this sequence: \u05e0\u05e7\u05d9\u05d9\u05d4]]

You'll see that following that sequence, the first revision is followed by a "\r" character, while the second is followed by a "\n" character.
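A rough way to do that check in Python, once both revision texts have been fetched, is sketched below. The function name `char_after` and the two short sample strings are hypothetical stand-ins for the real revision contents; only the marker sequence is taken from the search string above.

```python
from typing import Optional


def char_after(text: str, marker: str) -> Optional[str]:
    """Return the character right after the first occurrence of marker."""
    idx = text.find(marker)
    if idx == -1:
        return None  # marker not present
    end = idx + len(marker)
    return text[end] if end < len(text) else None


# The sequence to search for, as given above.
marker = "\u05e0\u05e7\u05d9\u05d9\u05d4]]"

# Hypothetical stand-ins for the two revision texts.
rev_old = "[[" + marker + "\r{{x}}"
rev_new = "[[" + marker + "\n{{x}}"

# repr() makes the otherwise invisible characters visible.
print(repr(char_after(rev_old, marker)))
print(repr(char_after(rev_new, marker)))
```

Printing with `repr()` matters here, because a browser or terminal will usually render both characters as a plain line break.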

Even so, I thought MediaWiki already had some sort of newline sanitizer that converts all newline variants (\r, \n or \r\n) to the same line ending, to prevent such spurious diffs when editing from different systems (Windows, Unix, Mac).
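To illustrate what such a sanitizer would do, here is a minimal Python sketch. It is not MediaWiki's actual code (I don't know how or whether MediaWiki implements this); `normalize_newlines` and the sample strings are assumptions made for the example.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    # Handle "\r\n" first, so a Windows line ending becomes one "\n"
    # instead of two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical texts that differ only in their line endings.
rev_old = "sentence.\r{{template}}"
rev_new = "sentence.\n{{template}}"

same_raw = rev_old == rev_new
same_normalized = normalize_newlines(rev_old) == normalize_newlines(rev_new)
print(same_raw, same_normalized)
```

With normalization applied before saving or before diffing, the two texts compare equal and no hunk would be shown for those lines.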

thanks.

i tried to look at the raw characters (using the "unicode analyzer" extension of the browser), but i guess that by the time the result was presented, the characters were already converted. i did not think of getting the "raw" content using the API.

<s>the \r may be very old, from before the sanitization you mention, which may explain it. i thought \r hints at someone editing with old mac os (before OS X) - i don't know of any other system using <CR> only for newline.</s>

<addition>
strike the previous paragraph. it turns out this came from a stray bot.
</addition>

thanks again,
peace.

Note that both edits are from 2015, so they couldn't be from before that sanitization (I don't actually know whether it happens, but it should).

The bug as reported is invalid. However, someone who knows the internals of MediaWiki editing should check whether MediaWiki does some sort of line ending normalization. If it does, there's a bug somewhere, since it's not normalizing them in this case. If not, I'd like to open a feature request for MediaWiki to normalize them.

found the culprit - it was a stray bot. thanks for helping to explain the problem. i'll convey this to the bot's operator.

closing as "invalid".

peace.

From my understanding, the normalization should happen on the server, but the bot is being run from a client...

i guess that normalization happens for normal editing ("submit" from the web form) but not for API:Edit calls.

i do not know it for a fact, but if this is the case, i don't think it should be considered a bug.

peace.