text of revisions in the archive table that were deleted before Wikipedia started using MediaWiki 1.5 is corrupt
Closed, Declined
Public

Description

I was checking through deleted revisions in the main namespace by Conversion script on the English Wikipedia, to find old deleted edits to history merge:
http://en.wikipedia.org/w/index.php?limit=500&title=Special%3ADeletedContributions&target=Conversion+script&namespace=0

I found that in all pages deleted before Wikipedia was upgraded to MediaWiki 1.5 (late June 2005), all edits besides the latest one are corrupt. An undeleted example of these edits can be found above; the edits were previously at the title "Clearwater River, Idaho", and I history merged them to the existing article "Clearwater River (Idaho)". Another example involves the page about Michael Collins:
http://en.wikipedia.org/w/index.php?title=Michael_Collins&dir=prev&limit=6&action=history

The edits were previously at the title "Michael Collins (disambiguation)".

Even though 99.9% of the text in these old deleted archives is garbage, the other 0.1% is very important page history and it should not be corrupted.


Version: unspecified
Severity: major
URL: http://en.wikipedia.org/w/index.php?title=Clearwater_River_(Idaho)&dir=prev&limit=16&action=history

bzimport set Reference to bz19990.
Graham87 created this task. Via Legacy, Jul 29 2009, 3:12 PM
brion added a comment. Via Conduit, Jul 29 2009, 3:20 PM

Possible external storage issue? Looks like something not getting un-gzipped or losing its flags.

Graham87 added a comment. Via Conduit, Jul 29 2009, 3:46 PM

I'm not sure if this is related, but some revisions before June 2005 are completely blank when they shouldn't be, as reported at this discussion on the technical village pump:

http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_62#Revision_content_disappeared

I didn't think much of it at the time, but both problems seem to involve Wikipedia text added before the upgrade to MediaWiki 1.5.

Graham87 added a comment. Via Conduit, Aug 8 2009, 10:00 AM

These deleted revisions from before June 2005 are fine:
http://en.wikipedia.org/wiki/Special:Undelete/Braille_music

They should stay deleted, since they were obviously nuked to make way for a page move.

IAlex added a comment. Via Conduit, Aug 28 2009, 7:26 AM

This should be fixed in r55626.

Graham87 added a comment. Via Conduit, Aug 28 2009, 8:23 AM

It's fixed in the archive table where the MW 1.4 deleted revisions are.

However the undeleted edits to "Clearwater River (Idaho)" and "Michael Collins" that I mentioned above are still corrupt. I tried deleting and undeleting them, just in case, and that didn't fix the issue. I highly doubt there are many other revisions with this problem.

I'm not sure of proper protocol here: whether to re-open this bug, or start a new one ...

tstarling added a comment. Via Conduit, Aug 28 2009, 1:43 PM

(In reply to comment #6)

> It's fixed in the archive table where the MW 1.4 deleted revisions are.
>
> However the undeleted edits to "Clearwater River (Idaho)" and "Michael Collins"
> that I mentioned above are still corrupt. I tried deleting and undeleting them,
> just in case, and that didn't fix the issue. I highly doubt there are many
> other revisions with this problem.
>
> I'm not sure of proper protocol here: whether to re-open this bug, or start a
> new one ...

Anything that was undeleted while the bug was active will now be permanently corrupted and will need to be fixed manually.

Graham87 added a comment. Via Conduit, Aug 28 2009, 3:17 PM

Yikes, I thought as much. So ... what happens with this bug? The underlying issue is resolved but it's still caused damage that's seemingly hard to fix.

IAlex added a comment. Via Conduit, Aug 28 2009, 3:36 PM

The only way to fix it is to update each corrupted row in the database, e.g. by manually adding "gzip" to the old_flags field. The problem is that it would be very difficult to find the affected revisions automatically.
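
To illustrate why a missing flag makes the text look corrupt, here is a minimal Python sketch. It assumes that old_flags is a comma-separated list and that "gzip" means the text was stored as a raw DEFLATE stream (what PHP's gzdeflate() produces); both are assumptions about the storage format, not something stated in this report. The function names are hypothetical and this is not the MediaWiki code itself.

```python
import zlib


def deflate_raw(text: str) -> bytes:
    """Compress text as a raw DEFLATE stream (assumed storage format)."""
    compressor = zlib.compressobj(wbits=-15)  # negative wbits = no header
    return compressor.compress(text.encode("utf-8")) + compressor.flush()


def read_text(old_text: bytes, old_flags: str) -> str:
    """Decode a stored blob according to its flags (hypothetical helper)."""
    flags = [f for f in old_flags.split(",") if f]
    if "gzip" in flags:
        # Only inflate when the flag says the blob is compressed.
        old_text = zlib.decompress(old_text, -15)
    # Without the flag, the compressed bytes are shown as-is: garbage.
    return old_text.decode("utf-8", errors="replace")


def add_flag(old_flags: str, flag: str) -> str:
    """Return old_flags with `flag` appended, unless it is already present."""
    flags = [f for f in old_flags.split(",") if f]
    if flag not in flags:
        flags.append(flag)
    return ",".join(flags)
```

With a blob made by deflate_raw("some text"), read_text(blob, "") returns unreadable characters, while read_text(blob, add_flag("", "gzip")) returns the original text. That is the manual repair described above, applied to a single row.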

Graham87 added a comment. Via Conduit, Aug 29 2009, 6:35 AM

Then I'd like someone to fix the revisions I mentioned above:
http://en.wikipedia.org/w/index.php?title=Clearwater_River_(Idaho)&dir=prev&limit=16&action=history

and:
http://en.wikipedia.org/w/index.php?title=Michael_Collins&dir=prev&limit=6&action=history

As for finding other cases where it happened on the English Wikipedia, check whether the revision ID is greater than 296,365,718 and the revision date is before July 2005, i.e. when MW 1.4 was in use. I use a revision ID of 296365718 because it's the last uncorrupted revision I know of that was deleted and could have had this problem; see this diff:
http://en.wikipedia.org/w/index.php?title=User:Xaonon&diff=2406956&oldid=296365718

As far as I know, this would work because before MW 1.5 was used, a revision got a new rev_id when it was undeleted.
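
The check described above can be sketched in Python as a filter over (rev_id, rev_timestamp) pairs. This is only an illustration under stated assumptions: timestamps are taken to be 14-digit YYYYMMDDHHMMSS strings, the cutoff is set to 1 July 2005 to match "before July 2005", and the Revision type and function name are hypothetical. The result is a list of candidates to inspect by hand, not a list of revisions known to be corrupt.

```python
from typing import Iterable, List, NamedTuple

# Last deleted-and-restored revision known to be uncorrupted (from this report).
LAST_KNOWN_GOOD_REV_ID = 296365718
# Assumed cutoff: start of July 2005, in YYYYMMDDHHMMSS form.
MW15_CUTOFF = "20050701000000"


class Revision(NamedTuple):
    rev_id: int
    rev_timestamp: str  # assumed 14-digit timestamp string


def possibly_corrupted(revisions: Iterable[Revision]) -> List[Revision]:
    """Return revisions with a high (recently assigned) ID but a pre-1.5 date."""
    return [
        rev
        for rev in revisions
        if rev.rev_id > LAST_KNOWN_GOOD_REV_ID
        and rev.rev_timestamp < MW15_CUTOFF  # fixed-width strings sort by date
    ]
```

The idea is that an old edit that received a new rev_id on undeletion ends up with an ID far higher than its date would normally imply, so the two conditions together single out old revisions that were restored after the last known good one.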

drdee added a comment. Via Conduit, Nov 29 2011, 10:01 PM

Tim, do you think this is something that still can and should be recovered, or should we just close it as WONTFIX?

Aklapper added a comment. Via Conduit, Dec 19 2012, 2:17 PM

Realistically closing this as WONTFIX nowadays.
