
Clean up archive rows with duplicate revision IDs
Closed, Resolved · Public

Description

There are, broadly, four classes of "duplicates" that may exist and need cleaning up:

  • Cases where both a revision and an archive row exist for the same change.
  • Cases where an archive row exists with the same revision ID as some other change in revision.
  • Cases where multiple archive rows exist for the same change.
  • Cases where multiple archive rows use the same revision ID for different changes.

We can probably define "same change" based on the title, sha1, timestamp, and user.

Fixing this should be reasonably straightforward: find the duplicates, classify them, delete the archive rows for the two "same change" cases, and assign new revision IDs (as in T182678) for the "different change" cases.
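The find/classify/plan steps above can be sketched roughly as follows. This is only an illustration in Python, not the actual deduplicateArchiveRevId.php script: the `Row` type, its field names, and `plan_cleanup` are hypothetical, and the sketch assumes duplicates are detected by grouping on the revision ID and that "same change" means an identical (title, sha1, timestamp, user) tuple.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    """Hypothetical, simplified view of a revision or archive row."""
    table: str      # "revision" or "archive"
    row_id: int     # hypothetical per-row key, used only to report results
    rev_id: int     # the revision ID (rev_id or ar_rev_id)
    title: str
    sha1: str
    timestamp: str
    user: str


def change_key(row: Row) -> tuple:
    # Assumption from the description: "same change" is defined by
    # title, sha1, timestamp, and user.
    return (row.title, row.sha1, row.timestamp, row.user)


def plan_cleanup(rows: Iterable[Row]) -> tuple[list[int], list[int]]:
    """Return (archive row_ids to delete, archive row_ids needing a new rev ID)."""
    by_rev_id = defaultdict(list)
    for row in rows:
        by_rev_id[row.rev_id].append(row)

    to_delete, to_reassign = [], []
    for group in by_rev_id.values():
        if len(group) < 2:
            continue  # this revision ID is not duplicated

        revision_rows = [r for r in group if r.table == "revision"]
        archive_rows = sorted(
            (r for r in group if r.table == "archive"), key=lambda r: r.row_id
        )

        # The row that keeps the revision ID: the revision row if there is
        # one, otherwise the first archive row.
        if revision_rows:
            kept, candidates = revision_rows[0], archive_rows
        else:
            kept, candidates = archive_rows[0], archive_rows[1:]

        seen = {change_key(kept)}
        for row in candidates:
            key = change_key(row)
            if key in seen:
                # "Same change" case: the archive row is redundant.
                to_delete.append(row.row_id)
            else:
                # "Different change" case: keep the row, give it a new ID.
                to_reassign.append(row.row_id)
                seen.add(key)

    return sorted(to_delete), sorted(to_reassign)


example_rows = [
    Row("revision", 1, 1, "A", "s1", "20180101000000", "u1"),
    Row("archive", 10, 1, "A", "s1", "20180101000000", "u1"),  # same change as revision
    Row("revision", 2, 2, "B", "s2", "20180102000000", "u1"),
    Row("archive", 11, 2, "C", "s3", "20180103000000", "u2"),  # different change
    Row("archive", 12, 3, "D", "s4", "20180104000000", "u1"),
    Row("archive", 13, 3, "D", "s4", "20180104000000", "u1"),  # duplicate archive row
    Row("archive", 14, 4, "E", "s5", "20180105000000", "u1"),
    Row("archive", 15, 4, "F", "s6", "20180106000000", "u3"),  # different change
    Row("archive", 16, 5, "G", "s7", "20180107000000", "u1"),  # not duplicated
]
deletions, reassignments = plan_cleanup(example_rows)
```

The sketch only produces a plan; actually deleting rows and allocating new revision IDs (as in T182678) is left out, since that depends on the real schema and maintenance tooling.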

Details

Related Gerrit Patches:
[mediawiki/core@master] Avoid recreating ar_revid index after it's replaced by ar_revid_uniq
[mediawiki/core@master] Make archive.ar_rev_id unique
[mediawiki/core@master] Deduplicate archive.ar_rev_id
[mediawiki/core@wmf/1.32.0-wmf.4] Deduplicate archive.ar_rev_id
[mediawiki/core@wmf/1.32.0-wmf.3] Deduplicate archive.ar_rev_id

Event Timeline

Anomie triaged this task as Medium priority. (Apr 26 2018, 3:56 PM)
Anomie created this task.
Restricted Application added a project: Wikidata. (Apr 26 2018, 3:56 PM)
daniel rescinded a token.
daniel added a subscriber: CCicalese_WMF.

Change 429345 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Deduplicate archive.ar_rev_id

https://gerrit.wikimedia.org/r/429345

Change 429455 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Make archive.ar_rev_id unique

https://gerrit.wikimedia.org/r/429455

Change 429345 merged by jenkins-bot:
[mediawiki/core@master] Deduplicate archive.ar_rev_id

https://gerrit.wikimedia.org/r/429345

Change 433367 had a related patch set uploaded (by Gergő Tisza; owner: Anomie):
[mediawiki/core@wmf/1.32.0-wmf.3] Deduplicate archive.ar_rev_id

https://gerrit.wikimedia.org/r/433367

Change 433368 had a related patch set uploaded (by Gergő Tisza; owner: Anomie):
[mediawiki/core@wmf/1.32.0-wmf.4] Deduplicate archive.ar_rev_id

https://gerrit.wikimedia.org/r/433368

Change 433367 merged by jenkins-bot:
[mediawiki/core@wmf/1.32.0-wmf.3] Deduplicate archive.ar_rev_id

https://gerrit.wikimedia.org/r/433367

Change 433368 merged by jenkins-bot:
[mediawiki/core@wmf/1.32.0-wmf.4] Deduplicate archive.ar_rev_id

https://gerrit.wikimedia.org/r/433368

Mentioned in SAL (#wikimedia-operations) [2018-05-23T16:10:21Z] <anomie> Running deduplicateArchiveRevId.php on group 0 for T193180

Mentioned in SAL (#wikimedia-operations) [2018-05-24T13:27:03Z] <anomie> Running deduplicateArchiveRevId.php on group 1 for T193180

Mentioned in SAL (#wikimedia-operations) [2018-05-24T16:31:15Z] <anomie> Running deduplicateArchiveRevId.php on group 2 for T193180

Anomie closed this task as Resolved. (May 24 2018, 8:31 PM)

Should be fixed now.

Change 429455 merged by jenkins-bot:
[mediawiki/core@master] Make archive.ar_rev_id unique

https://gerrit.wikimedia.org/r/429455

Change 437275 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Avoid recreating ar_revid index after it's replaced by ar_revid_uniq

https://gerrit.wikimedia.org/r/437275

greg added a subscriber: greg.

Change 429455 merged by jenkins-bot:
[mediawiki/core@master] Make archive.ar_rev_id unique
https://gerrit.wikimedia.org/r/429455

Looks like this caused T196401: beta-update-databases failing

Change 437275 merged by jenkins-bot:
[mediawiki/core@master] Avoid recreating ar_revid index after it's replaced by ar_revid_uniq

https://gerrit.wikimedia.org/r/437275

Vvjjkkii renamed this task from Clean up archive rows with duplicate revision IDs to 74daaaaaaa. (Jul 1 2018, 1:13 AM)
Vvjjkkii reopened this task as Open.
Vvjjkkii removed Anomie as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
AntiCompositeNumber renamed this task from 74daaaaaaa to Clean up archive rows with duplicate revision IDs. (Jul 1 2018, 12:36 PM)
AntiCompositeNumber closed this task as Resolved.
AntiCompositeNumber assigned this task to Anomie.
AntiCompositeNumber lowered the priority of this task from High to Medium.
AntiCompositeNumber updated the task description.
daniel reopened this task as Open. (Aug 16 2018, 12:21 PM)

Re-opening, since we again have multiple archive rows with the same ar_rev_id on several wikis; see T202032.

daniel raised the priority of this task from Medium to High. (Aug 16 2018, 12:21 PM)

Bumping to high, since this now blocks the completion of T183488.

I think all the action is going to happen on T202032 rather than here. Once the underlying issue there is fixed, all that remains for this task is to re-run the maintenance script.

Anomie closed this task as Resolved. (Aug 27 2018, 3:45 PM)

See T202032 for all the action.