
Null edit actually caused carriage returns to be removed from the page
Closed, Declined (Public)

Description

This diff was supposed to be a null edit, but it actually modified the page, which lost ~1400 bytes (see here). Also note that the diff says "no difference".

The number of bytes lost coincides with the number of lines in the module, so I am pretty sure it is a \r\n versus \n thing.
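The reasoning above can be checked with a small sketch: converting CRLF line endings to LF drops exactly one byte per line, so the size difference should match the line count. The sample text and the helper name `bytes_saved_by_lf` are made up for illustration and are not taken from the page in question.

```python
def bytes_saved_by_lf(data: bytes) -> int:
    """Return how many bytes disappear when CRLF line endings become LF."""
    normalized = data.replace(b"\r\n", b"\n")
    return len(data) - len(normalized)


# Hypothetical three-line module stored with Windows-style (CRLF) endings.
sample = b"local p = {}\r\nfunction p.f() end\r\nreturn p\r\n"

saved = bytes_saved_by_lf(sample)
line_count = sample.count(b"\n")
print(saved, line_count)
```

If both numbers agree for the real page text, stray carriage returns are the likely explanation for the size change.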

I would probably have let it slide if I had not thought it interfered with MW's normal, periodic page null edits (I hope that kind of thing exists at all). The page I linked above is one of many pages that should have moved from one category to another because of a module modification I made earlier. Other pages that had Unix-style newlines moved automatically, while this one did not (within a ~12-hour period), possibly because the automatic null edit was rejected when the system assumed it would actually change something(?).

Event Timeline

Interesting, that does indeed appear to be what happened (28931768 is the rev_text_id of the old revision and 41348275 of the new):

krenair@terbium:~$ echo 28931768 | mwscript fetchText.php enwiktionary > oldRev
krenair@terbium:~$ echo 41348275 | mwscript fetchText.php enwiktionary > newRev
krenair@terbium:~$ wc -c oldRev newRev
41995 oldRev
40612 newRev
82607 total
krenair@terbium:~$ cat oldRev | tr -d '\r' | wc -c
40612
krenair@terbium:~$ cat oldRev | tr -d '\r' > oldRevTransformed
krenair@terbium:~$ diff oldRevTransformed newRev
1,2c1,2
< 28931768
< 41980
---
> 41348275
> 40597
krenair@terbium:~$

(the difference at the end just shows that the two numbers fetchText.php prints before each revision - text ID and length - differ as you'd expect, but that the texts are otherwise the same after the removal of \r from the old revision)
I guess the diff code ignores CR characters somewhere.
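One plausible way for a diff to report "no difference" here: if the text is split into lines by something that treats both \r\n and \n as line terminators, the two revisions produce identical line lists. The sketch below uses Python's `str.splitlines` and `difflib` purely as an illustration. It is not MediaWiki's diff code, and the sample strings are invented.

```python
import difflib

old_rev = "line one\r\nline two\r\n"  # hypothetical text stored with CRLF
new_rev = "line one\nline two\n"      # same text with LF only

# The raw strings are different...
raw_equal = old_rev == new_rev

# ...but splitlines() strips the terminators, whichever kind they are,
# so a line-based diff has nothing to report.
old_lines = old_rev.splitlines()
new_lines = new_rev.splitlines()
diff = list(difflib.unified_diff(old_lines, new_lines, lineterm=""))

print(raw_equal, diff)
```

A byte-level comparison, like the `wc -c` check in the transcript, is what exposes the change.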

Krenair renamed this task from Null edit turned into a regular edit to Null edit actually caused carriage returns to be removed from the page. Nov 13 2016, 2:25 AM
Krenair added a project: Scribunto.

This page is in the Scribunto content model, so I'm adding that extension to the task.

I think this behavior makes sense, as newlines should always be normalized to '\n'. A better question is how the carriage returns made it into that page in the first place… 

Anomie subscribed.

I would probably have let it slide if I had not thought it interfered with MW's normal, periodic page null edits (I hope that kind of thing exists at all).

MediaWiki does not perform periodic null edits. It does sometimes reparse a page to update the links tables when a page it depends on changes; the effect is roughly the same as calling the API with action=purge&forcelinkupdate.
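For reference, a minimal sketch of how such a purge request could be assembled. It only builds the URL and form body and does not send anything. The endpoint URL and the page title `Module:Example` are assumptions for illustration, and `build_purge_request` is a hypothetical helper. As far as I know the purge action expects a POST request.

```python
from urllib.parse import urlencode

# Assumption: api.php endpoint of the target wiki; adjust as needed.
API_URL = "https://en.wiktionary.org/w/api.php"


def build_purge_request(titles):
    """Return (url, body) for a POST that purges pages and refreshes links tables."""
    params = {
        "action": "purge",
        "titles": "|".join(titles),  # multiple titles are pipe-separated
        "forcelinkupdate": "1",
        "format": "json",
    }
    return API_URL, urlencode(params)


url, body = build_purge_request(["Module:Example"])  # placeholder title
print(url)
print(body)
```

The resulting body can be sent with any HTTP client as a form-encoded POST.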

I think this behavior makes sense, as newlines should always be normalized to '\n'. A better question is how the carriage returns made it into that page in the first place… 

Prior to the fix for T142805: Edit API does not always normalize line endings, some code paths bypassed the normal newline normalization. That's now fixed, so any subsequent edit to a page that was left with non-normalized newlines by that issue will also normalize them.
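To illustrate why the first save after the fix changes the stored text while later null edits do not, here is a minimal sketch of newline normalization. It is an illustration in Python, not MediaWiki's actual (PHP) implementation; `normalize_newlines` is a hypothetical name, and treating a lone \r as a line ending is an assumption.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


stored = "a\r\nb\r\n"  # hypothetical text saved through a path that skipped normalization

first_save = normalize_newlines(stored)       # first edit after the fix
second_save = normalize_newlines(first_save)  # any later null edit

print(first_save != stored, second_save == first_save)
```

Normalization is idempotent, so only the first save after the fix produces a new revision with a smaller byte count.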