Page MenuHomePhabricator

RFC: How should we fix the undeletion system?
Open, Stalled, MediumPublic

Description

The following supercedes T185167, which I had closed because it was determined to be too destructive.

With T183375, undeletion no longer sets the parent IDs for the undeleted revisions to the "expected" one given by the previous revision by ID (usually, but not always, also the previous revision by timestamp), but instead to the ar_parent_id values. For example, look at the history of Ernest Madu on Wikipedia, and you'll see that revision 820864133 has parent ID 0 (with an "N" mark in the user's contribs) instead of 820863993. But even this alone can still cause problems.

Flaws with the current interface

Some flaws include:

  • Failure to distinguish page IDs
  • Broken parent revisions
  • "Ancestor" revisions being scattered randomly across different titles and mixed in with unrelated histories, with no easy way to rejoin all the revisions into the history of one single title
    • This is the same problem that would occur when manually changing rev_page fields (with the current behavior, deleting A, moving B to A without redirect, undeleting a single revision for A, moving A back to B without redirect, and finally undeleting the rest of the revisions for A is effectively equivalent to simply changing the rev_page field for the single revision from A's page ID to B's page ID).
  • Takes forever to load for pages with thousands of deleted revisions
Proposals 1 and 2

Moved to T206587 to keep the main task's description to a reasonable size.

Proposal 3 (to be attempted for consensus if Proposals 1 and 2 both fail)

When undeleting, always set each undeleted revision's parent ID to the previous undeleted revision among those revisions that had the same ar_page_id field (which will usually be the same as the ar_parent_id field), or zero if there is no such revision, with the following exception: if selective undeletion had been used to effectively revert a page to an older revision and the page had not been edited since the selective undeletion, undeleting later revisions (which would imply a necessary update to the page_latest field in the page table) will leave the parent ID for the earliest undeleted revision as the page's current revision instead of changing the parent ID to zero. This can be done by making a variable that is initialized as an empty array and then changes each time a revision is being inserted. The key is the ar_page_id field and the value is the ID to be used as the parent ID. The exceptional case will not be applied if the page had been edited since the selective undeletion, in order to prevent "siblings" (two revisions with the same parent revision). Each distinct ar_page_id field would then be treated as a separate history (but nonetheless would still appear altogether in a single page's history).

Advantages of Proposal 3
  • Makes revisions that were originally from different page IDs distinguishable using rev_parent_id.
  • Works better for deleted revisions that have null ar_parent_id fields.
Disadvantage of Proposal 3
  • Not idempotent—when a page is undeleted, then re-deleted, and finally undeleted again, the parent IDs might be different from what they were after the first undeletion, because Special:Undelete will no longer know that two revisions had belonged to separate page IDs.
See also

The following is a different but related RFC.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Anomie added a comment.Jun 7 2018, 2:51 PM

Use page creation log for the N marks.

Another option would be to apply an appropriate change tag to revisions that (re-)create the page.

Doing a selective undeletion on a Wikidata item is pointless, as Wikidata items cannot be moved nor can deleted Wikidata items be "recreated". The only way to make a deleted Wikidata item become blue-linked again is to undelete it. Existing Wikidata items should never have any deleted revisions. If there are any existing Wikidata items that have deleted revisions remaining due to a previous selective undeletion, then a bot should undelete them all as soon as possible (ASAP) before the alternative proposal gets approved.

daniel added a subscriber: Tgr.Jun 14 2018, 1:01 PM

@Tgr The meeting minutes say you are going to write up a proposal for fixing the parent id: tgr proposes: add a new field to store diff size, recompute it when the (timewise) previous revision changes. Use page creation log for the N marks. Use rev_parent_id for logical parents (and do not erase the information we have from the time period when it was actually used as a logical parent.).

Are you planning to do that?

Tgr added a comment.Jun 14 2018, 7:13 PM

I intend to, yeah.

Change 441045 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Change PageUpdater::hasEditConflict to take a parameter.

https://gerrit.wikimedia.org/r/441045

Tgr added a comment.Jun 21 2018, 12:04 PM

Beyond the use cases already considered, we also need to look at how Revision[Record]::getParentId() is used in hooks at such. It seems at least FlaggedRevs would break if that changed to be the logical parent.

Change 441045 was initially targeted incorrectly to this task, but it has since been fixed to target T197685 instead. So I have removed the "Patch for Review" project and reviewer gerritbot, for now. Also, we actually already have a script named populateParentId for populating null rev_parent_id fields. We just needed that script to allow fixing existing rev_parent_id fields, as well as populating or fixing ar_parent_id fields.

Vvjjkkii renamed this task from RFC: Use ar_page_id to determine the parent IDs for undeleted revisions to 1qdaaaaaaa.Jul 1 2018, 1:12 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from 1qdaaaaaaa to RFC: Use ar_page_id to determine the parent IDs for undeleted revisions.Jul 1 2018, 8:02 AM
CommunityTechBot updated the task description. (Show Details)
CommunityTechBot added a subscriber: Aklapper.
CommunityTechBot raised the priority of this task from High to Needs Triage.Jul 3 2018, 2:02 AM
GeoffreyT2000 updated the task description. (Show Details)Jul 6 2018, 2:59 PM
GeoffreyT2000 renamed this task from RFC: Use ar_page_id to determine the parent IDs for undeleted revisions to RFC: Overhaul undelete feature with a "pagearchive" table.Jul 8 2018, 7:26 PM
GeoffreyT2000 updated the task description. (Show Details)
GeoffreyT2000 updated the task description. (Show Details)Jul 8 2018, 8:52 PM
GeoffreyT2000 renamed this task from RFC: Overhaul undelete feature with a "pagearchive" table to RFC: Overhaul the undelete feature with a "pagearchive" table.Jul 8 2018, 9:14 PM
GeoffreyT2000 updated the task description. (Show Details)
GeoffreyT2000 updated the task description. (Show Details)Jul 8 2018, 9:20 PM
GeoffreyT2000 updated the task description. (Show Details)Jul 8 2018, 9:37 PM

I have renamed the task and rearranged the description to make it more focused on the alternative proposal. The new interface for undeletion will distinguish page IDs more clearly than the current interface, which lists all deleted revisions regardless of the value of their ar_page_id fields. It would also be simpler and easier to use. Instead of selecting revisions, all one has to do is to select a page ID to be undeleted and then click the "Restore" button.

daniel added a comment.EditedJul 10 2018, 10:51 AM

@GeoffreyT2000 You are proposing a change to how the deletion/undeletion system works conceptually. This is not something that can be decided by a technical RFC alone. Your proposal sounds sensible enough to me to be seriously considered, but for it to get resourcing for development, you'd have to show considerable demand from the admin community (or rather, communities). My point is: you are proposing a change in behavior, introducing and changing user-side concepts and processes. That's a feature request. It needs to be decided by product owners and UX people, not by software engineers. So it'S not a good fir for a TechCOm RFC.

Once there is community demand, and a developer team has agreed to work on an implementation, then it is time to consider a technical RFC. But that's only necessary if the solution is strategic, cross-cutting, or hard to undo. Which may or may not be the case, depending on how exactly this is implemented.

GeoffreyT2000 updated the task description. (Show Details)
GeoffreyT2000 updated the task description. (Show Details)Aug 5 2018, 3:23 PM
GeoffreyT2000 renamed this task from RFC: Overhaul the undelete feature with a "pagearchive" table to RFC: How should we fix the undeletion system?.Oct 10 2018, 2:16 AM
Scott added a subscriber: Scott.Mar 21 2019, 12:12 PM
daniel moved this task from Under discussion to Old on the TechCom-RFC board.Jul 22 2019, 4:46 PM
daniel added a project: Platform Engineering.
daniel added a subscriber: CCicalese_WMF.

Lacks product level direction. Putting this into the RFC backlog for now. Tagging Platform Engineering for input from the product perspective. @CCicalese_WMF, do you have any thoughts on this?

Krinkle updated the task description. (Show Details)Jul 22 2019, 4:48 PM

(Apparently upstream Phabricator decided h3 headings from markdown should be twice as big as the page title. Lowering heading level as workaround.)

CCicalese_WMF triaged this task as Medium priority.Jul 23 2019, 7:47 PM

This will need more analysis before a solution is chosen. It may be superseded by T20493. Putting this task in the CPT backlog for now for consideration in a future initiative.

daniel added a subscriber: kostajh.Aug 28 2019, 8:32 AM
Izno added a subscriber: Izno.Sep 1 2019, 3:55 PM
Krinkle changed the task status from Open to Stalled.Wed, Sep 16, 8:12 PM
Krinkle moved this task from Old to P1: Define on the TechCom-RFC board.

Marking as stalled until T20493 is resourced and further along.