Author: zhangxiaoquan
Description:
To study the incentives to contribute to Wikipedia, Feng Zhu (Harvard Business
School) and I (MIT Sloan School of Management) wanted to examine the
modification history of Wikipedia entries. We downloaded the following
data dump file:
http://download.wikipedia.com/enwiki/20060518/enwiki-20060518-pages-meta-history.xml.bz2
and found that it contains CRC errors. We then followed the links to
download a few earlier versions of the file, but they all had problems as well.
Here is the error message returned by bzip2 -t:
- error message ----
bzip2 -t enwiki-20060518-pages-meta-history.xml.bz2
bzip2: enwiki-20060518-pages-meta-history.xml.bz2: data integrity (CRC) error in data
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
- end of error message ----
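
For anyone who wants to reproduce the integrity check without the bzip2
command-line tool, here is a minimal sketch. It assumes Python 3 and its
standard bz2 module, neither of which we used for the original report. The
function name check_bz2 and the chunk size are hypothetical, chosen only for
illustration. It stream-decompresses the file in chunks, so the full history
dump never has to fit in memory, and it reports whether decoding finished
cleanly.

```python
import bz2


def check_bz2(path, chunk_size=1 << 20):
    """Stream-decompress `path` and return (ok, bytes_decoded, error_text).

    ok is False if the decompressor rejects the data (for example a CRC
    mismatch) or if the file ends before the end-of-stream marker.
    """
    total = 0
    try:
        with bz2.open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (OSError, EOFError) as exc:
        # OSError: invalid or corrupted compressed data.
        # EOFError: the compressed stream is truncated.
        return False, total, str(exc)
    return True, total, ""


if __name__ == "__main__":
    import sys

    ok, size, err = check_bz2(sys.argv[1])
    print("OK" if ok else "CORRUPT", size, err)
```

Run against the dump above, this should report the file as corrupt if the
download really is damaged. The byte count only shows how much data was
decoded before the failure and may not pinpoint the damaged block.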
Version: unspecified
Severity: major
URL: http://download.wikipedia.com/enwiki/20060518/