The latest XML dump of the English-language Wikipedia,
enwiki-latest-pages-articles.xml.bz2 (05-Jun-2015 23:45, 11984805689 bytes),
contains several duplicated pages.
This causes errors when reading the dump to populate a MySQL database.
For example, I get:
Exception in thread "main" java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry '614219339' for key 'PRIMARY'
at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:92)
The complete list of duplicated IDs is:
- 614219339
- 663854862
- 359952698
- 301899471
- 559375953
- 603392565
- 544004224
- 624437388
- 37733084
I checked some of these duplicated entries and found that the XML content from <page> to </page> is identical in both copies.
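
For anyone who wants to reproduce the check, here is a minimal sketch of how the duplicates could be located by streaming the dump. It is not what mwdumper does internally. The function name `find_duplicate_pages` and the helper `local` are made up for illustration, and I am assuming the usual export layout in which each <page> has a direct <id> child and a <revision> with its own <id>. Since the error message alone does not say whether the key is a page ID or a revision ID, the sketch records both.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def find_duplicate_pages(stream):
    """Return {(page_id, revision_id): count} for pages seen more than once.

    Hypothetical helper: assumes <page> has a direct <id> child and a
    <revision> child that has its own direct <id> child.
    """
    seen = Counter()
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end" or local(elem.tag) != "page":
            continue
        page_id = rev_id = None
        for child in elem:
            name = local(child.tag)
            if name == "id":
                page_id = child.text
            elif name == "revision":
                for sub in child:
                    if local(sub.tag) == "id":
                        rev_id = sub.text
                        break
        seen[(page_id, rev_id)] += 1
        root.clear()  # drop already-processed pages to keep memory bounded
    return {key: n for key, n in seen.items() if n > 1}


# Usage (streams the compressed dump without unpacking it to disk):
#
#     with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as f:
#         for (page_id, rev_id), n in find_duplicate_pages(f).items():
#             print(page_id, rev_id, n)
```

The counter still holds one key per page, so on the full dump this needs a fair amount of memory. If that is a problem, the IDs could be written to a file and sorted externally instead.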