
dumps - enwiki XML dump violates UNIQUE KEY constraint
Closed, Resolved · Public

Description

0) Problem

The dump enwiki-20150602-stub-meta-current.xml.gz violates the UNIQUE KEY constraint name_title on the page table.
As a result, INSERT commands fail with a duplicate-key error when the dump is imported.

1) Database schema

CREATE TABLE page (
page_id int(10) unsigned NOT NULL AUTO_INCREMENT,
page_namespace int(11) NOT NULL,
page_title varbinary(255) NOT NULL,
...
PRIMARY KEY (page_id),
UNIQUE KEY name_title (page_namespace,page_title), <-- this constraint
KEY page_random (page_random),
KEY page_len (page_len),
KEY page_redirect_namespace_len (page_is_redirect,page_namespace,page_len),
KEY page_latest (page_latest)
) ENGINE=InnoDB CHARSET=binary;
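
The effect of this constraint on INSERT can be reproduced in miniature. The sketch below is only an illustration: it uses an in-memory SQLite database as a stand-in for MySQL and keeps just the three columns involved in the constraint. The helper try_insert is a hypothetical name; the two page ids are the ones reported in this task.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL/InnoDB here.
# Only the columns that take part in the constraint are kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE page (
        page_id        INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL,
        page_title     BLOB NOT NULL,
        UNIQUE (page_namespace, page_title)
    )
    """
)


def try_insert(page_id, namespace, title):
    """Return True if the row was inserted, False on a constraint violation."""
    try:
        conn.execute(
            "INSERT INTO page (page_id, page_namespace, page_title) "
            "VALUES (?, ?, ?)",
            (page_id, namespace, title.encode("utf-8")),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised when (page_namespace, page_title) is already present.
        return False


first = try_insert(41813333, 0, "Lauren Price")
second = try_insert(44184271, 0, "Lauren Price")
print(first, second)  # True False
```

The second row is rejected even though its page_id is different, because the pair (page_namespace, page_title) is already taken.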

2) Dump contents

There are two entries for <title>Lauren Price</title><ns>0</ns>.

<page>
<title>Lauren Price</title>
<ns>0</ns>
<id>41813333</id>
<revision>
<id>665226532</id>
<parentid>665226528</parentid>
<timestamp>2015-06-02T20:20:52Z</timestamp>
<contributor>
<username>ClueBot NG</username>
<id>13286072</id>
</contributor>
<minor/>
<comment>Reverting possible vandalism by [[Special:Contributions/82.9.60.232|82.9.60.232]] to version by Tymon.r. False positive? [[User:ClueBot NG/FalsePositives|Report it]]. Thanks, [[User:ClueBot NG|ClueBot NG]]. (2261137) (Bot)</comment>
<model>wikitext</model>
<format>text/x-wiki</format>
<text id="670477152" bytes="273" />
<sha1>hlatpj1bvsq2zzodwl9iyp6xv7g8k8s</sha1>
</revision>
</page>
...
<page>
<title>Lauren Price</title>
<ns>0</ns>
<id>44184271</id>
<revision>
<id>665284753</id>
<parentid>665284424</parentid>
<timestamp>2015-06-03T06:04:16Z</timestamp>
<contributor>
<username>Boleyn</username>
<id>6127189</id>
</contributor>
<comment>Disambiguation page should not have just been overwritten like this</comment>
<model>wikitext</model>
<format>text/x-wiki</format>
<text id="670537500" bytes="2450" />
<sha1>rean3vigkkfhzwlxvyqp9tov3de8bjh</sha1>
</revision>
</page>
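
A duplicate like this can be found before import by scanning the dump for repeated (ns, title) pairs. The following is a minimal sketch, not the tool used by the reporter: find_duplicate_titles and local are hypothetical helper names, and the sample is a trimmed stand-in for the real stub file (the "Example page" entry is invented for illustration).

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag):
    """Strip an XML namespace prefix: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def find_duplicate_titles(stream):
    """Return {(ns, title): [page_id, ...]} for keys seen more than once."""
    seen = defaultdict(list)
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        # Only direct children of <page> are read, so the <id> nested
        # inside <revision> is not confused with the page id.
        fields = {local(child.tag): child.text for child in elem}
        key = (int(fields["ns"]), fields["title"])
        seen[key].append(int(fields["id"]))
        elem.clear()  # release the subtree; real dumps are large
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


# Trimmed sample; "Example page" with id 1 is a made-up filler entry.
sample = b"""<mediawiki>
<page><title>Lauren Price</title><ns>0</ns><id>41813333</id></page>
<page><title>Example page</title><ns>0</ns><id>1</id></page>
<page><title>Lauren Price</title><ns>0</ns><id>44184271</id></page>
</mediawiki>"""

duplicates = find_duplicate_titles(io.BytesIO(sample))
print(duplicates)  # {(0, 'Lauren Price'): [41813333, 44184271]}
```

For the real file, the stream could be opened with gzip.open(path, "rb") from the standard library instead of io.BytesIO.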

3) Recommendation

Action: Please delete the first of these (page_id=41813333).
Reason: The text (old_id=670477152) looks like a disambiguation page. Yet there already exists a similar disambiguation page <title>Lauren Price (disambiguation)</title><ns>0</ns>.

Event Timeline

wpmirrordev assigned this task to ArielGlenn.
wpmirrordev raised the priority of this task from to Normal.
wpmirrordev updated the task description.
wpmirrordev added a subscriber: wpmirrordev.
Restricted Application added a subscriber: Aklapper. · Aug 12 2015, 9:57 AM
Umherirrender added a subscriber: Umherirrender. · Edited · Aug 12 2015, 3:31 PM

https://en.wikipedia.org/?curid=41813333 no longer exists on en.wp

https://en.wikipedia.org/?curid=44184271 exists

Page 41813333 was deleted while the dump was being created. The dump takes some time to run, so newly created pages are added at the end, while deleted pages cannot be removed from the dump once they have already been written.

The dump is not a snapshot of a single point in time (that would need extra handling or a stopped slave).
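
Given that explanation, one possible importer-side workaround is to keep only the last entry seen for each (ns, title) pair, on the assumption that the later entry in dump order reflects the more recent state. This is a heuristic sketch and not something proposed in this task; keep_last_per_title is a hypothetical helper, and the "Example page" row is invented for illustration.

```python
def keep_last_per_title(pages):
    """Keep the last (page_id, ns, title) tuple per (ns, title), in dump order.

    Assumption: a later entry in the dump supersedes an earlier one
    that has the same namespace and title.
    """
    latest = {}
    for page_id, ns, title in pages:
        key = (ns, title)
        # Drop any earlier entry so the newer one takes its position
        # at the end of the (insertion-ordered) dict.
        latest.pop(key, None)
        latest[key] = (page_id, ns, title)
    return list(latest.values())


pages = [
    (41813333, 0, "Lauren Price"),
    (1, 0, "Example page"),  # made-up filler entry
    (44184271, 0, "Lauren Price"),
]
survivors = keep_last_per_title(pages)
print(survivors)
# [(1, 0, 'Example page'), (44184271, 0, 'Lauren Price')]
```

With the two entries from this dump, the surviving row is page_id 44184271, which matches the page that still exists on en.wp.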

wpmirrordev closed this task as Resolved. · Aug 12 2015, 10:36 PM