Author: risanecek
Description:
Dumps for many languages are corrupted.
While syntactically correct, titles do not correspond to content.
E.g. "A mír na Zemi!" in the Czech wiki has the text of "Singapore" in the dump. I have found this across the languages, though it does not seem to affect
all articles. (cswiki dump as of 20100411)
If you need more examples, I can provide them.
<page> <title>A mír na Zemi!</title> <id>70749</id> <revision> <id>5178497</id> <timestamp>2010-04-03T22:56:32Z</timestamp> <contributor> <username>Chalupa</username> <id>3656</id> </contributor> <comment>obrázek z commons</comment> <text xml:space="preserve">{{Infobox stát| genitiv = Singapuru | úřední název = Republic of Singapore<br />新加坡共和国<br />Republik Singapura<br />சிங்கப்பூர் குடியரசு | vlajka = Flag of Singapore.svg | článek o vlajce = Singapurská vlajka | znak = | mapa umístění = LocationSingapore.png
...
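For anyone who wants to look for further cases, here is a minimal sketch of one way to list each page title next to the start of its text, so that mismatches like the one above stand out on manual inspection. It is not the dump tooling itself: the function name `iter_title_and_text`, the `preview_chars` parameter, and the inline sample are hypothetical, and the sample is a shortened, closed-off version of the excerpt above rather than real dump output.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    # Strip any XML namespace, e.g. "{http://...}page" -> "page".
    return tag.rsplit("}", 1)[-1]


def iter_title_and_text(source, preview_chars=80):
    """Yield (title, start of text) for each <page> in a dump-like XML stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text[:preview_chars]
        elem.clear()  # release the finished page to keep memory use low


# Hypothetical sample: a shortened, well-formed version of the excerpt above.
sample = """<mediawiki>
<page><title>A mír na Zemi!</title><id>70749</id>
<revision><id>5178497</id>
<text xml:space="preserve">{{Infobox stát| genitiv = Singapuru</text>
</revision></page>
</mediawiki>""".encode("utf-8")

pages = list(iter_title_and_text(io.BytesIO(sample)))
for title, preview in pages:
    print(f"{title!r}: {preview!r}")
```

On a real dump, the same function could be given an open file object (for example from `bz2.open(path, "rb")`, assuming a bzip2-compressed dump) in place of the `io.BytesIO` sample. It only prints pairs for review; deciding whether a title matches its text is left to the reader.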
Version: unspecified
Severity: critical
URL: upload.wikimedia.org/wikipedia/commons/9/95/Image-Dadd|