Diffs: Incorrect number of bytes added or removed
Open, Public

Description

The number of bytes added is wrong in several diffs. The bug affects page histories and user contributions, but does not seem to affect Recent changes.

Here is one example:
MerlIwBot edited Template:Link GA on wikidata-test-client, adding one interlanguage link. In the bot's contributions and in the page history, the edit is listed as +1,370 bytes. Those 1,370 bytes are actually the size of the whole template, not the size of the edit. RecentChanges lists the edit as +30 bytes, which is much more believable.

From /Special:Contributions/MerlIwBot:
18:51, 12 May 2012 (diff | hist) . . (+1,370) . . Nm Template:Link GA (Robot: Adding el:Πρότυπο:Link GA) (top)

From index.php?title=Template:Link GA&action=history:
(cur | prev) 18:51, 12 May 2012 MerlIwBot (Talk) m . . (1,370 bytes) (+1,370) . . (Robot: Adding el:Πρότυπο:Link GA) (undo)

From RecentChanges:
12 May 2012
(diff | hist) . . m Template:Link GA; 18:51 . . (+30) . . MerlIwBot (Talk) (Robot: Adding el:Πρότυπο:Link GA)

I can name several other examples as well, but I am going to stick to this one for now.
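
A minimal sketch of the arithmetic behind the example above. It assumes the history and contributions views compute the figure as the revision's length minus its parent revision's length, and treat a missing parent as an empty page. The function name size_diff and the previous length of 1,340 bytes (inferred as 1,370 minus the 30 reported by RecentChanges) are illustrative and not taken from MediaWiki itself.

```python
from typing import Optional


def size_diff(new_len: int, parent_len: Optional[int]) -> int:
    """Bytes added (positive) or removed (negative) relative to the parent."""
    # Assumption: a revision with no known parent is compared against an
    # empty page, so its whole length counts as "added".
    return new_len - (parent_len or 0)


new_len = 1370
inferred_old_len = new_len - 30  # inferred from the +30 shown in RecentChanges

with_parent = size_diff(new_len, inferred_old_len)   # parent length known
without_parent = size_diff(new_len, None)            # parent missing

print(with_parent, without_parent)
```

Under these assumptions, a lost or wrong parent reference is enough to turn a +30 edit into a +1,370 one, which matches the "N" (new page) flag shown in the contributions line.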


Version: unspecified
Severity: normal

bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz36976.
Snaevar created this task. Via Legacy, May 19 2012, 5:04 PM
Denny added a comment. Via Conduit, May 20 2012, 3:18 AM

Interesting error. My assumption is that this happened because we first imported only the most recent version of the articles, then changed our minds and imported their full histories, so basically all the diff sizes were calculated against the wrong basis. The rebuildall script gets stuck for some reason on the simple.wp import of the elements, but we didn't investigate that further.

If this theory is confirmed, this would be a bug in the import or rebuild scripts and would have nothing to do with the Wikidata extension, so it would need to be recategorized. Thanks for the catch!

MarkAHershberger added a comment. Via Conduit, May 24 2012, 5:56 PM

(In reply to comment #1)

Interesting error. My assumption is that this is due to the fact that we first
imported only the most recent version of the articles, and then we changed our
mind and imported their full histories, and so basically all the diff sizes use
the wrong basis when they were calculated. The rebuildall script gets stuck for
some reason on the simple.wp import of the elements, but we didn't investigate
that further.

If this theory is confirmed, this would be a bug for the import or rebuild
scripts,

re-categorizing

bzimport added a comment. Via Conduit, May 27 2012, 12:53 PM

Thehelpfulonewiki wrote:

Marking as new.

Snaevar added a comment. Via Conduit, May 27 2012, 1:51 PM

Adding a few links to the example I provided, for easy access.

User contributions of MerlIwBot (at the time of writing, the diff in question is the newest one): http://wikidata-test-client.wikimedia.de/wiki/Special:Contributions/MerlIwBot

Revision history of Template:Link GA (at the time of writing, the diff in question is the newest one): http://wikidata-test-client.wikimedia.de/w/index.php?title=Template:Link_GA&action=history

The diff itself: http://wikidata-test-client.wikimedia.de/w/index.php?title=Template%3ALink_GA&diff=2677&oldid=15118

Recent changes (the diff is under "12 May 2012" at 18:51): http://wikidata-test-client.wikimedia.de/w/index.php?title=Special:RecentChanges&days=30&from=&limit=250

daniel added a comment. Via Conduit, Nov 15 2012, 4:34 PM

Removing this from wikidata-bugs, since it's not related to Wikidata or Wikibase.

leucosticte added a comment. Via Conduit, Oct 25 2013, 9:00 AM

I am noticing an issue now in which if I go to, say, https://www.mediawiki.org/wiki/Special:Contributions/Leucosticte , it shows me in parentheses the total length of the page rather than the number of bytes added. E.g., if I increased the page length to 9,531, it says (+9,531). It's also doing this on my other wiki which is running MW 1.22alpha.

He7d3r added a comment. Via Conduit, Oct 25 2013, 9:16 AM

(In reply to comment #7)

Confirmed.

PleaseStand added a comment. Via Conduit, Oct 25 2013, 9:40 AM

(In reply to comment #7)

I am noticing an issue now in which if I go to, say,
https://www.mediawiki.org/wiki/Special:Contributions/Leucosticte , it shows
me in parentheses the total length of the page rather than the number of bytes
added. E.g., if I increased the page length to 9,531, it says (+9,531). It's
also doing this on my other wiki which is running MW 1.22alpha.

I think you have encountered a different bug (bug 56115).

TTO removed a subscriber: wikibugs-l-list. Via Web, Oct 11 2015, 6:41 AM
TTO set Security to None.
Restricted Application added a subscriber: Aklapper. Via Herald, Oct 11 2015, 6:41 AM
TTO added a subscriber: TTO. Via Web, Oct 11 2015, 6:41 AM
TTO added a comment. Via Web, Oct 11 2015, 6:45 AM

The incorrect bytes added or bytes removed shouldn't happen anymore now that T114806 is fixed, but we need some kind of maintenance script to clean up old instances of this bug.

It would need to be able to run on enwiki, so I have no idea how to do it in a sane fashion. Perhaps get the script to look through the import log and recalculate bytes added/removed on all revisions of imported pages - is that sane?
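
As a rough illustration of the recalculation step only, the following sketch recomputes bytes added or removed for one page from its revisions in chronological order. It assumes each revision is available as a (timestamp, length) pair; the function name recalc_size_diffs and the earlier revision's timestamp are hypothetical, and selecting pages from the import log is not shown.

```python
from typing import List, Tuple


def recalc_size_diffs(revisions: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (timestamp, bytes changed) for one page, oldest revision first.

    `revisions` holds (timestamp, length) pairs in any order. Timestamps are
    assumed to be fixed-width strings that sort chronologically.
    """
    diffs = []
    prev_len = 0  # the first revision is compared against an empty page
    for timestamp, length in sorted(revisions):
        diffs.append((timestamp, length - prev_len))
        prev_len = length
    return diffs


# The 1,370-byte revision from the report plus a hypothetical earlier one.
history = [("20120512185100", 1370), ("20120401000000", 1340)]
result = recalc_size_diffs(history)
print(result)
```

Ordering by timestamp rather than by the order of import is the design choice here; it is what lets the later revision come out as +30 instead of +1,370.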

Graham87 added a comment. Via Web, Oct 11 2015, 11:28 AM

Not all of these problems are caused by imports; some are caused by out-of-order revision IDs, like this example:
https://en.wikipedia.org/w/index.php?title=Talk:Netherlands&dir=prev&action=history

The most complete way to fix this problem would be to write a maintenance script that checks whether the revision referenced by each revision's rev_parent_id has a later timestamp than the revision itself, rather than an earlier one. If it does, the script would adjust rev_parent_id so that it refers to the most recent preceding revision by timestamp, rather than by revision ID number.
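
The check described above could be sketched roughly as follows. This is an in-memory model under stated assumptions, not a MediaWiki maintenance script: the Revision class and repair_parents function are hypothetical, only the rev_parent_id idea comes from the comment, and timestamps are assumed to be fixed-width strings that sort chronologically.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Revision:
    rev_id: int
    timestamp: str            # e.g. "20050601120000"
    parent_id: Optional[int]  # None means "no parent"
    length: int


def repair_parents(revisions: List[Revision]) -> List[int]:
    """Repoint parents that are newer than their child; return changed rev_ids."""
    by_id: Dict[int, Revision] = {r.rev_id: r for r in revisions}
    # Chronological order; rev_id is only used to break timestamp ties.
    ordered = sorted(revisions, key=lambda r: (r.timestamp, r.rev_id))
    changed = []
    for i, rev in enumerate(ordered):
        parent = by_id.get(rev.parent_id) if rev.parent_id is not None else None
        if parent is not None and parent.timestamp > rev.timestamp:
            # Use the closest earlier revision by timestamp, if there is one.
            rev.parent_id = ordered[i - 1].rev_id if i > 0 else None
            changed.append(rev.rev_id)
    return changed


revs = [
    Revision(100, "20030101000000", None, 100),
    Revision(300, "20040101000000", 200, 500),  # parent 200 is newer: wrong
    Revision(200, "20050101000000", 100, 800),
]
changed = repair_parents(revs)
print(changed, revs[1].parent_id)
```

The sketch only touches revisions whose parent is later than the revision itself, as the comment describes. A parent that is earlier but not the immediate predecessor (revision 200 above) is left alone.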

TTO added a comment. Via Web, Oct 11 2015, 11:35 AM

How did those revision IDs get out of order? Was it a history merge? In that case, a separate task should be filed.

I suppose undeletions can also create havoc with size diffs...

Graham87 added a comment. Via Web, Oct 11 2015, 11:47 AM

They became out-of-order because they were deleted before Wikipedia was upgraded to MediaWiki 1.5 in June 2005. As a result, they lost their original revision ID numbers, because those weren't saved when a revision was deleted back in those days. The bug summary isn't about imports ... it's about this entire situation in general, so I don't see the point of creating a new task for situations like the one I've just described (but you can if you want).
