
Prevent broken parent revisions
Open, Needs Triage, Public

Description

Sometimes, undeletions can cause parent revisions to become broken, as a consequence of actually preserving the existing value (ar_parent_id) in the archive table (T183375). For example, the rev_parent_id for revision 136808840 on the English Wikipedia is 87419000, which is a deleted revision ID. Until T193690 is approved, we need to find out how we can prevent the tragedy of broken parent revisions.

There are three possible ways to solve this, listed from best to worst:

  • When a revision is being inserted with the "insertRevisionOn" function in the RevisionStore class, check whether the parent ID provided for the RevisionRecord object refers to a non-existent or deleted revision. If it does, treat it as if it were not provided at all, and use the result of the "getPreviousRevisionId" function instead. Also, when a page B is being deleted, check whether another page A has any revision whose rev_parent_id field points to a revision of page B, and update the field for each such revision. Finally, make the deleteOldRevisions.php maintenance script set the rev_parent_id field of each (selected) page's latest revision (which will become the only one) to zero.
  • When a revision is being undeleted, apply the "domino effect" by forcing the parent revision to be undeleted as well. Also, when the parent revision for a revision from page A is a revision from another page B and page B is being deleted, force page A to be deleted as well. Finally, get rid of the deleteOldRevisions.php maintenance script entirely.
  • When an undeletion would result in a broken parent revision, forgo the undeletion and just display an error instead. Also, when the parent revision for a revision from page A is a revision from another page B, don't allow page B to be deleted until page A is deleted first. Finally, as with the second solution above, get rid of the deleteOldRevisions.php maintenance script entirely.
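
The insertion-time check from the first solution can be sketched as follows. This is a minimal in-memory model in Python, not MediaWiki code: the `live` mapping and the function names `previous_revision_id` and `resolve_parent_id` are hypothetical, and it assumes that the newest live revision of a page is the one with the highest revision ID.

```python
from typing import Dict, Optional


def previous_revision_id(live: Dict[int, int], page_id: int) -> int:
    """Hypothetical stand-in for getPreviousRevisionId.

    `live` maps each non-deleted revision ID to its page ID. Assumes the
    highest revision ID on a page is its newest revision; returns 0 if the
    page has no live revisions.
    """
    ids = [rev_id for rev_id, pid in live.items() if pid == page_id]
    return max(ids, default=0)


def resolve_parent_id(
    live: Dict[int, int], page_id: int, provided_parent_id: Optional[int]
) -> int:
    """Pick the parent ID to store for a revision being inserted.

    A provided parent ID is kept only if it refers to a live revision.
    A missing, non-existent, or deleted parent is treated as not provided,
    and the page's previous revision is used instead.
    """
    if provided_parent_id is not None and provided_parent_id in live:
        return provided_parent_id
    return previous_revision_id(live, page_id)
```

For instance, if 87419000 is not present in `live` because that revision is deleted, `resolve_parent_id(live, page_id, 87419000)` falls back to the page's newest live revision rather than storing the deleted ID.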

Regardless of how this is solved, we will need to fix the existing revisions with broken parent revisions with a maintenance script (T186280).
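
As a rough illustration of what such a cleanup pass might do, the sketch below finds revisions whose parent is not a live revision and repoints them. It is an assumption-laden model, not the script proposed in T186280: the `Rev` record and both function names are hypothetical, and it assumes revision ID order matches chronological order within a page, which may not hold for imported histories.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rev:
    rev_id: int
    page_id: int
    parent_id: int  # 0 means "no parent"


def find_broken(revs: List[Rev]) -> List[Rev]:
    """Return live revisions whose non-zero parent ID is not a live revision."""
    live_ids = {r.rev_id for r in revs}
    return [r for r in revs if r.parent_id != 0 and r.parent_id not in live_ids]


def repair(revs: List[Rev]) -> int:
    """Repoint each broken parent to the nearest earlier live revision of the
    same page, or to 0 if there is none. Returns the number of rows changed."""
    by_page: Dict[int, List[int]] = {}
    for r in revs:
        by_page.setdefault(r.page_id, []).append(r.rev_id)
    for ids in by_page.values():
        ids.sort()

    fixed = 0
    for r in find_broken(revs):
        earlier = [i for i in by_page[r.page_id] if i < r.rev_id]
        r.parent_id = earlier[-1] if earlier else 0
        fixed += 1
    return fixed
```

Running `find_broken` again after `repair` should return an empty list, which gives a simple check that the pass is complete.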

Event Timeline

Wouldn't solutions 2/3 make it effectively impossible to do selective deletion/undeletion, since most revisions that aren't the first in the page history have a non-0 rev_parent_id? A major reason to do that these days is for history splits ... another reason to make a Special:HistSplit page yesterday/redo the rev_parent_id system altogether.

Change 433313 had a related patch set uploaded (by GeoffreyT2000; owner: GeoffreyT2000):
[mediawiki/core@master] Check the parent ID for existence when inserting a revision

https://gerrit.wikimedia.org/r/433313

GTrang triaged this task as Medium priority. May 16 2018, 5:04 AM
GTrang updated the task description.
Vvjjkkii renamed this task from Prevent broken parent revisions to c4daaaaaaa. Jul 1 2018, 1:13 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
Yann renamed this task from c4daaaaaaa to Prevent broken parent revisions. Jul 1 2018, 1:21 PM
Yann lowered the priority of this task from High to Medium.
Yann updated the task description.
Yann removed a subscriber: GTrang.
Yann added subscribers: GTrang, gerritbot, Aklapper.

The description states:

Until T193690 is approved, we need to find out how we can prevent the tragedy of broken parent revisions.

Since parent IDs are not used for much, I wonder why this is such a big deal.

Change 433313 abandoned by GeoffreyT2000:
Check the parent ID for existence when inserting a revision

Reason:
Abandoning for now pending consensus.

https://gerrit.wikimedia.org/r/433313

What is the "tragedy of broken parent revisions"? Are they causing errors? Conceptually, there is nothing broken in the parent_id pointing to the parent revision, even if it happens to be deleted.

Aklapper raised the priority of this task from Medium to Needs Triage. Aug 9 2019, 11:05 PM

@GeoffreyT2000: Hi, could you answer the last question, please?