Page MenuHomePhabricator

RFC: How should we fix the undeletion system?
Open, Stalled, MediumPublic


The following supercedes T185167, which I had closed because it was determined to be too destructive.

With T183375, undeletion no longer sets the parent IDs for the undeleted revisions to the "expected" one given by the previous revision by ID (usually, but not always, also the previous revision by timestamp), but instead to the ar_parent_id values. For example, look at the history of Ernest Madu on Wikipedia, and you'll see that revision 820864133 has parent ID 0 (with an "N" mark in the user's contribs) instead of 820863993. But even this alone can still cause problems.

Flaws with the current interface

Some flaws include:

  • Failure to distinguish page IDs
  • Broken parent revisions
  • "Ancestor" revisions being scattered randomly across different titles and mixed in with unrelated histories, with no easy way to rejoin all the revisions into the history of one single title
    • This is the same problem that would occur when manually changing rev_page fields (with the current behavior, deleting A, moving B to A without redirect, undeleting a single revision for A, moving A back to B without redirect, and finally undeleting the rest of the revisions for A is effectively equivalent to simply changing the rev_page field for the single revision from A's page ID to B's page ID).
  • Takes forever to load for pages with thousands of deleted revisions
Proposals 1 and 2

Moved to T206587 to keep the main task's description to a reasonable size.

Proposal 3 (to be attempted for consensus if Proposals 1 and 2 both fail)

When undeleting, always set each undeleted revision's parent ID to the previous undeleted revision among those revisions that had the same ar_page_id field (which will usually be the same as the ar_parent_id field), or zero if there is no such revision, with the following exception: if selective undeletion had been used to effectively revert a page to an older revision and the page had not been edited since the selective undeletion, undeleting later revisions (which would imply a necessary update to the page_latest field in the page table) will leave the parent ID for the earliest undeleted revision as the page's current revision instead of changing the parent ID to zero. This can be done by making a variable that is initialized as an empty array and then changes each time a revision is being inserted. The key is the ar_page_id field and the value is the ID to be used as the parent ID. The exceptional case will not be applied if the page had been edited since the selective undeletion, in order to prevent "siblings" (two revisions with the same parent revision). Each distinct ar_page_id field would then be treated as a separate history (but nonetheless would still appear altogether in a single page's history).

Advantages of Proposal 3
  • Makes revisions that were originally from different page IDs distinguishable using rev_parent_id.
  • Works better for deleted revisions that have null ar_parent_id fields.
Disadvantage of Proposal 3
  • Not idempotent—when a page is undeleted, then re-deleted, and finally undeleted again, the parent IDs might be different from what they were after the first undeletion, because Special:Undelete will no longer know that two revisions had belonged to separate page IDs.
See also

The following is a different but related RFC.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 441045 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Change PageUpdater::hasEditConflict to take a parameter.

Beyond the use cases already considered, we also need to look at how Revision[Record]::getParentId() is used in hooks at such. It seems at least FlaggedRevs would break if that changed to be the logical parent.

Change 441045 was initially targeted incorrectly to this task, but it has since been fixed to target T197685 instead. So I have removed the "Patch for Review" project and reviewer gerritbot, for now. Also, we actually already have a script named populateParentId for populating null rev_parent_id fields. We just needed that script to allow fixing existing rev_parent_id fields, as well as populating or fixing ar_parent_id fields.

Vvjjkkii renamed this task from RFC: Use ar_page_id to determine the parent IDs for undeleted revisions to 1qdaaaaaaa.Jul 1 2018, 1:12 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot raised the priority of this task from High to Needs Triage.Jul 3 2018, 2:02 AM
GeoffreyT2000 renamed this task from RFC: Use ar_page_id to determine the parent IDs for undeleted revisions to RFC: Overhaul undelete feature with a "pagearchive" table.Jul 8 2018, 7:26 PM
GeoffreyT2000 updated the task description. (Show Details)
GeoffreyT2000 renamed this task from RFC: Overhaul undelete feature with a "pagearchive" table to RFC: Overhaul the undelete feature with a "pagearchive" table.Jul 8 2018, 9:14 PM
GeoffreyT2000 updated the task description. (Show Details)

I have renamed the task and rearranged the description to make it more focused on the alternative proposal. The new interface for undeletion will distinguish page IDs more clearly than the current interface, which lists all deleted revisions regardless of the value of their ar_page_id fields. It would also be simpler and easier to use. Instead of selecting revisions, all one has to do is to select a page ID to be undeleted and then click the "Restore" button.

@GeoffreyT2000 You are proposing a change to how the deletion/undeletion system works conceptually. This is not something that can be decided by a technical RFC alone. Your proposal sounds sensible enough to me to be seriously considered, but for it to get resourcing for development, you'd have to show considerable demand from the admin community (or rather, communities). My point is: you are proposing a change in behavior, introducing and changing user-side concepts and processes. That's a feature request. It needs to be decided by product owners and UX people, not by software engineers. So it'S not a good fir for a TechCOm RFC.

Once there is community demand, and a developer team has agreed to work on an implementation, then it is time to consider a technical RFC. But that's only necessary if the solution is strategic, cross-cutting, or hard to undo. Which may or may not be the case, depending on how exactly this is implemented.

GeoffreyT2000 renamed this task from RFC: Overhaul the undelete feature with a "pagearchive" table to RFC: How should we fix the undeletion system?.Oct 10 2018, 2:16 AM
daniel added a project: Platform Engineering.
daniel added a subscriber: CCicalese_WMF.

Lacks product level direction. Putting this into the RFC backlog for now. Tagging Platform Engineering for input from the product perspective. @CCicalese_WMF, do you have any thoughts on this?

(Apparently upstream Phabricator decided h3 headings from markdown should be twice as big as the page title. Lowering heading level as workaround.)

This will need more analysis before a solution is chosen. It may be superseded by T20493. Putting this task in the CPT backlog for now for consideration in a future initiative.

Krinkle changed the task status from Open to Stalled.Sep 16 2020, 8:12 PM
Krinkle moved this task from Old to P1: Define on the TechCom-RFC board.

Marking as stalled until T20493 is resourced and further along.

Change 854012 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/WikibaseMediaInfo@master] Reduce logspam from "Revision X belongs to Y instead of expected Z"

Change 854012 merged by jenkins-bot:

[mediawiki/extensions/WikibaseMediaInfo@master] Reduce logspam from "Revision X belongs to Y instead of expected Z"