
Overhaul the undelete feature with a "pagearchive" table
Open, Needs Triage, Public

Description

The following sections have been moved from T193690 to avoid making that task's description too long.

Proposal 1

Overhaul the undelete feature to make it completely flawless by making pages more like files on your computer with the following steps. This would also make selective undeletion and undeleting revisions under existing pages' histories things of the past. RevisionDelete should be used in lieu of selective undeletion.

  1. Create a pagearchive table with a migration script as I have suggested at T161671#4199220.
  2. Create a script that fills in null pa_page_id (formerly ar_page_id) fields, similar to what T182678 did for ar_rev_id.
  3. Create a script that fixes duplicate pa_page_id fields, as well as those that duplicate an existing page's ID, similar to what T193180 did for ar_rev_id and rev_id.
  4. Create a script that cleans up redundant duplicate revisions from both existing and deleted page histories by keeping only the revision with the smallest revision ID. Two revisions are considered duplicates of each other if they have the same timestamp, same comment, same minor-edit status, and same SHA1. The usernames, parent IDs, and rev_deleted or ar_deleted do not matter. If the text, comment, and username for the smallest revision ID are all hidden with RevisionDelete, the script will unhide them. The script will also update the page_latest field when necessary (i.e. when the latest revision is itself a duplicate of some other revision in the same page's history).
  5. Add a temporary option to the populateParentId script named "--fix-existing" or something similar that, if used, will also update the rev_parent_id field for all existing revisions that already have a parent ID. Eventually, this option will be removed and the new behavior will become permanent.
  6. Make the populateParentId script also fill in null ar_parent_id fields in the archive table (thereby completing the modernization of legacy rows), and if the "--fix-existing" option is used, update the ar_parent_id field for all deleted revisions that already have a parent ID.
  7. Make the rebuildrecentchanges script use rev_parent_id to calculate rc_last_oldid, as well as to determine rc_new, rc_type, and rc_source.
  8. Make the deleteOldRevisions script change the rev_parent_id field for the latest revision to zero for all the pages whose old revisions are being deleted.
  9. Change the undelete feature (the PageArchive class) to accept a single title and a single page ID rather than an array (or list) of timestamps, in order to force "everything" to be preserved, including rev_page and rev_parent_id.
  10. Remove the $overrides parameter from the newRevisionFromArchiveRow function in the RevisionStore class, which should no longer be needed.
  11. Add an $unsuppress parameter to the newRevisionFromArchiveRow function that will be used when a suppressor restores a suppressed page with the "Remove restrictions on restored revisions" checkbox.
  12. Make page histories viewable for deleted page IDs by using the "curid" parameter.
  13. Make deleted revisions and their diffs viewable by using the "oldid" parameter (T20104).
  14. Make Special:DeletedContributions share the features of Special:Contributions (e.g. displaying size differences using ar_parent_id and ar_len, as well as "N" for deleted revisions with zero ar_parent_id).
  15. Change the Special:Undelete interface to display radio buttons with "View history" links for each deleted page ID rather than checkboxes for each deleted revision. The radio button corresponding to the page ID for the ultimate latest deleted revision will be selected by default.
  16. When there is no existing page having the same title as the one you are trying to undelete, make choosing another title for the undeleted page optional. In this case, there will also be a checkbox (checked by default) for leaving a redirect at the original title. For existing pages, the "View or restore # deleted edits" link will still appear when viewing the history, but choosing another title will become mandatory. In this case, the existing page will be temporarily deleted so that the other page can be undeleted and moved to the chosen title without redirect. After that, the temporarily deleted page will immediately be undeleted.
  17. Restrict the import feature by only allowing imports to existing page titles if the revisions being imported are either all later than the page's current revision, all earlier than the page's first revision, or all fit between 2 consecutive revisions in the page's history. In the latter 2 cases, the first revision following the imported revisions will automatically have the rev_parent_id field changed to the ID of the latest imported revision. In particular, single-revision imports will always continue to be allowed. For any other import, the importer must choose another page title, and manually redirect that title to the original page title if the page already exists.
  18. Add a special page named "Special:SplitHistory" that allows an administrator to easily split the history of an existing page at a certain point. When splitting out the first n revisions in the history of page A, the new title B will take A's original page ID and page A will get a new page ID. The first revision that stays at page A will automatically have the rev_parent_id field changed to zero. When splitting out the last n revisions in the history of page A, page A will keep its original page ID and the new title B will get a new page ID. The page_latest field will automatically be updated for page A, and the first revision that gets moved to page B will automatically have the rev_parent_id field changed to zero.
  19. For deleted pages, Special:SplitHistory will instead split the history of a deleted page ID. First, it will list deleted page IDs as radio buttons. Once a page ID is selected and one clicks the "Show history" button, all of the deleted revisions belonging to that page ID will be shown as radio buttons. When one selects a revision, that revision and all later revisions will be split out to a new deleted page ID while also remaining at the original title.
  20. Add a special page named "Special:MergeAndMove" that allows an administrator to simultaneously merge the history of a page B into an older page A and move A to B (with or without redirect). This means that the rev_page field for each revision in the history of B will be changed to A's page ID and the page_latest field for page A updated before moving A to B, and will only be allowed when page A has not been edited since the creation of page B. The first revision originally in the history of B will automatically have the rev_parent_id field changed to the same value as the original page_latest field for page A.
  21. The "MergeHistory" feature will continue to exist. However, merging A into B will only be allowed if at least one existing revision will remain in the history of A after the merge; otherwise, "Special:MergeAndMove" must be used instead. Also, the first remaining revision in the history of A will automatically have the rev_parent_id field changed to zero, while the first revision in the history of B following the merged revisions will automatically have the rev_parent_id field changed to the ID of the latest merged revision.
  22. Finally, in both Special:SplitHistory and Special:MergeHistory, revision IDs will be used to distinguish revisions having the same timestamp, so this would also solve T39465 and T183501.
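
The duplicate test in step 4 can be illustrated with a minimal Python sketch. It is not MediaWiki code: the Revision class and the find_redundant_duplicates function are hypothetical names, and the sketch only covers one page's history held in memory. Unhiding RevisionDeleted fields and updating page_latest are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """Hypothetical in-memory stand-in for a revision or archive row."""
    rev_id: int
    timestamp: str
    comment: str
    minor: bool
    sha1: str


def find_redundant_duplicates(revisions):
    """Return the sorted IDs of revisions that step 4 would remove.

    Revisions are grouped by (timestamp, comment, minor flag, SHA1).
    Within each group only the smallest revision ID is kept.
    """
    groups = defaultdict(list)
    for rev in revisions:
        key = (rev.timestamp, rev.comment, rev.minor, rev.sha1)
        groups[key].append(rev.rev_id)

    redundant = []
    for ids in groups.values():
        keep = min(ids)
        redundant.extend(i for i in ids if i != keep)
    return sorted(redundant)
```

Usernames, parent IDs, and deletion flags are deliberately absent from the grouping key, matching the statement that they do not matter.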

With this proposal, page histories would be kept as simple as possible, while also limiting the recalculation of size differences to one or two revisions at a time. Neither rollback nor undo links will appear in deleted page histories or deleted revision diffs. Many messages that are currently displayed on Special:Undelete (such as "The following consists of deleted revisions of Foo.") will also need to be either rewritten or removed entirely.
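
To illustrate why only one revision needs its parent (and therefore its size difference) recalculated, here is a rough Python sketch of the "split out the first n revisions" case from step 18. The function name split_first_n and the dictionary layout are assumptions made for this example and do not correspond to any existing MediaWiki code.

```python
def split_first_n(history, n, new_page_id):
    """Split the first n revisions of page A out to a new title B.

    history: list of dicts with rev_id, rev_page, rev_parent_id,
             sorted oldest first, all belonging to page A.
    Returns (revisions_for_B, revisions_for_A) as new lists.
    Title B keeps A's original page ID; title A gets new_page_id.
    """
    if not 0 < n < len(history):
        raise ValueError("n must leave at least one revision on each side")

    # The split-out revisions keep their rev_page (A's original page ID).
    moved = [dict(r) for r in history[:n]]

    # The revisions staying at title A are re-pointed to the new page ID.
    kept = [dict(r, rev_page=new_page_id) for r in history[n:]]

    # Only the first remaining revision has its parent changed.
    kept[0]["rev_parent_id"] = 0
    return moved, kept
```

Every other revision keeps its existing rev_parent_id, so at most one size difference has to be recomputed per split.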

Proposal 2

Same as Proposal 1, but instead of adding a "SplitHistory" special page, undeletion would become a two-step process. First, one selects a page ID to be undeleted. Then, one would see radio buttons for each revision belonging to the selected page ID. When a revision is selected, that revision and all earlier revisions will be restored. The ar_parent_id field for the revision following the selected revision (if there is one) will automatically be changed to zero, and that revision and all later revisions will be assigned a new page ID. The radio button for the latest revision will be selected by default. If the latest revision is selected, then the corresponding row will be completely deleted from the pagearchive table; otherwise, the pa_rev_count field will be reduced by the number of revisions being restored.
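
The second step of this process might look roughly like the following Python sketch. The function restore_up_to and the dictionary layout are hypothetical, and a return value of None for the count is used here to mean "delete the pagearchive row".

```python
def restore_up_to(archive_rows, pa_rev_count, selected_rev_id, new_page_id):
    """Restore the selected deleted revision and all earlier ones.

    archive_rows: deleted revisions of one page ID, sorted oldest first,
                  as dicts with ar_rev_id, ar_page_id, ar_parent_id.
    Returns (restored, remaining, new_pa_rev_count).
    """
    ids = [row["ar_rev_id"] for row in archive_rows]
    cut = ids.index(selected_rev_id) + 1  # ValueError if the ID is absent

    restored = [dict(row) for row in archive_rows[:cut]]
    remaining = [dict(row, ar_page_id=new_page_id) for row in archive_rows[cut:]]

    if not remaining:
        # Latest revision selected: the pagearchive row goes away entirely.
        return restored, [], None

    # The revision following the selected one becomes a "first" revision.
    remaining[0]["ar_parent_id"] = 0
    return restored, remaining, pa_rev_count - len(restored)
```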

Problems to be solved

The above proposals will solve all of the following problems:

  • Size differences being incorrect or outdated (e.g. Template talk:Db-g1/Archive 1 or Gema Switzerland on Wikipedia; usually caused by imports done in 2015 or earlier or undeletions of revisions deleted in pre-1.5 versions of MediaWiki; see also T38976)
  • Revisions with the same timestamp being inseparable
  • Broken parent revisions (T186280 and T193211)
  • "Contaminated" histories caused by mixing the histories of multiple pages together
  • The page move revision needing to be undone after a history merge (solved with Special:MergeAndMove)
  • The "newer" page having a smaller page ID than the "older" one after a history split (solved by making Special:SplitHistory choose page IDs carefully for Proposal 1, or by automatically replacing the relevant pa_page_id field with a new page ID after restoring revisions up to a certain point for Proposal 2)
  • "Pages created" tools (XTools and Sigma) not listing creations deleted in 2011 or earlier (solved by retroactively filling in the missing ar_parent_id fields)
  • Revisions with visible text by registered non-bot users other than the currently logged-in user not being thankable from the history page (T186470, which has been fixed so that this is no longer an issue)
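
As a small illustration of the first item in the list above, the following Python sketch derives size differences from parent IDs and lengths, in the way step 14 of Proposal 1 describes for ar_parent_id and ar_len. The function name and dictionary layout are assumptions, and the sketch only looks up parents within the revisions passed in.

```python
def size_differences(revisions):
    """Map each revision ID to its size difference in bytes.

    revisions: dicts with rev_id, rev_parent_id, rev_len.
    A parent ID of zero marks a page creation ("N"), whose difference
    is the full length. A parent that cannot be found yields None,
    which is the broken-parent case the proposals aim to eliminate.
    """
    lengths = {r["rev_id"]: r["rev_len"] for r in revisions}
    diffs = {}
    for r in revisions:
        parent = r["rev_parent_id"]
        if parent == 0:
            diffs[r["rev_id"]] = r["rev_len"]
        elif parent in lengths:
            diffs[r["rev_id"]] = r["rev_len"] - lengths[parent]
        else:
            diffs[r["rev_id"]] = None
    return diffs
```

A wrong or stale parent ID therefore produces a wrong difference directly, which is why the proposals insist on keeping rev_parent_id and ar_parent_id correct after every split, merge, and import.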

Unchanged behaviors

The following behaviors will not be changed:

  • Undeletion will still preserve ar_parent_id as rev_parent_id and display the number of restored revisions (always the same as the pa_rev_count field for Proposal 1 only) in the log entry.
  • Undeleting files will still be done by selecting checkboxes.
  • Importing will still insert new revisions into the page's history.
  • Special:MergeHistory will still update the rev_page field for some revisions in the source page's history.
  • Special:DeletedContributions will still not display rollback links or the "(current)" mark.

Extensions to be updated

The following extensions will need to be updated:

  • The DeletePagesForGood extension will need to delete rows from the pagearchive table.
  • The RevisionSlider extension will need to also work properly for diffs between deleted revisions.
  • The Thanks extension will need to know not to display "thank" links when viewing deleted page histories or deleted revision diffs.

Actions that will require using other tools

The following actions will require using tools other than deletion or undeletion:

  • Reverting a page to an older revision: Split off the later revisions to another title using Special:SplitHistory, and then delete the target page. Or alternatively, delete the page, then split off the later revisions without moving them, and finally undelete the original page ID. Selective undeletion will not make sense anymore. (Proposal 1 only)
  • Merging deleted revisions to an existing page's history: Move the page to a temporary title (e.g. A (temp) for A) without redirect, then undelete the deleted history of A, use Special:MergeAndMove to merge A's history with A (temp)'s history without redirect, and finally move A (temp) back to A without redirect. If there are several deleted page IDs, start with the latest one and then repeat steps 2 and 3 for each earlier page ID before doing step 4.

New classes to be added

The following 2 classes will be added to the MediaWiki core if either Proposal 1 or Proposal 2 passes:

  • SplitHistory: Contains functions for dealing with history splits (will be used in Special:SplitHistory). Details on what it will do are available at T20493#4446127. (Proposal 1 only)
  • MergeAndMove: Contains functions that will be used in Special:MergeAndMove. One of the functions, the main one, will do the following for two given pages A and B:
    1. Check to see if the latest revision for page A is before the oldest revision for page B. If not, an error will be generated. Otherwise, the function will continue with the following steps.
    2. Change the rev_parent_id field for B's oldest revision from zero to the value of the page_latest field for page A.
    3. Change the rev_page field for each revision in the history of B from B's page ID to A's page ID.
    4. Delete page B from the page table without generating a deletion log entry.
    5. Apply WikiPage::doDeleteUpdates to page B with its original page ID.
    6. Generate an "automerge" log entry in the merge log.
    7. Update the page_latest field and links for page A.
    8. Move page A to B as usual. After moving, there is no need to revert the page to the last pre-move revision, since page A was already updated before it was moved.
    9. Finally, restore the previous protection settings for page B if it was protected before the history merge, but page A was not. If only A was protected, its protection settings will carry over to B as usual. If both A and B were protected, the protection settings for B will be overwritten with those for A, and the administrator must manually change the protection level or expiry if needed.
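
The data changes in steps 1 to 3, 7, and 8 above can be sketched in Python as follows. This is an in-memory illustration under assumed names (Page, merge_and_move, and the dictionary layout are hypothetical). Steps 4 to 6 and step 9, which involve the page table, deletion updates, logging, and protection, are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a row of the page table."""
    page_id: int
    title: str
    page_latest: int


def _history(revisions, page_id):
    """Revisions of one page, oldest first, ties broken by revision ID."""
    rows = [r for r in revisions if r["rev_page"] == page_id]
    return sorted(rows, key=lambda r: (r["rev_timestamp"], r["rev_id"]))


def merge_and_move(page_a, page_b, revisions):
    """Merge B's history into the older page A, then move A to B's title.

    revisions: dicts with rev_id, rev_page, rev_parent_id, rev_timestamp
    (14-digit strings). The dicts and page_a are modified in place.
    """
    hist_a = _history(revisions, page_a.page_id)
    hist_b = _history(revisions, page_b.page_id)
    if not hist_a or not hist_b:
        raise ValueError("both pages need at least one revision")

    # Step 1: A's latest revision must precede B's oldest revision.
    if hist_a[-1]["rev_timestamp"] >= hist_b[0]["rev_timestamp"]:
        raise ValueError("page A was edited after page B was created")

    # Step 2: B's oldest revision now follows A's latest revision.
    hist_b[0]["rev_parent_id"] = page_a.page_latest

    # Step 3: every revision of B now belongs to A's page ID.
    for rev in hist_b:
        rev["rev_page"] = page_a.page_id

    # Step 7: A's latest revision is now B's former latest revision.
    page_a.page_latest = hist_b[-1]["rev_id"]

    # Step 8: move A to B's title.
    page_a.title = page_b.title
    return page_a
```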

New log actions to be added

An "automerge" log action will be added for logging automatic merges done using the "MergeAndMove" feature. These entries will appear in the merge log, so a new case would need to be added to the "MergeLogFormatter" file, and "merge" would need to be added to $wgActionFilteredLogs.

Also, a "split" log action will be added for logging history splits in the "split" log (Proposal 1 only). A "SplitLogFormatter" file would then need to be created.

Wikidata and other Wikibase wikis

For Wikidata and other wikis that use the Wikibase extension, hooks should be created to disallow doing history splits (Proposal 1 only), history merges, and imports to and from Wikibase items. With Proposal 2, the first hook would instead disable all radio buttons for deleted revisions on Special:Undelete other than the latest one after selecting the deleted page ID (there should always be just one deleted page ID for a deleted Wikibase item). Also, Wikibase items that have previously been selectively restored should have all of the remaining deleted revisions restored under the old schema before switching over to the new schema.

Other notes

I have written Proposal 1 at the Community Wishlist Survey 2019, but that one turned out to be the lowest proposal in the results list. If Proposal 2 (with no need for a "SplitHistory" special page) turns out to be better than Proposal 1, then the former should be implemented rather than the latter.

If an administrator needs to restore only the last n revisions, then with Proposal 2, the administrator must first restore only the earlier revisions, then re-delete the page, and finally restore the remaining revisions.

One administrator on Wikipedia who frequently does history merges is Anthony Appleyard. If either proposal passes, then that user would have to start using "Special:MergeAndMove" to do history merges, and would no longer need to "revert histmerge junk" after doing so (except for the removal of a "histmerge" template).

With Proposal 1, the "undelete" API function would have to replace the "timestamps" parameter with a "pageid" parameter. With Proposal 2, the same replacement applies, and a "revid" parameter must also be added.
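
As a purely illustrative Python sketch of how request parameters might look under each proposal: the "pageid" and "revid" parameters are the ones proposed in this task and do not exist in the current API, the helper name build_undelete_params is hypothetical, and whether "revid" would be optional is an assumption. The CSRF token and reason are omitted.

```python
def build_undelete_params(title, proposal, pageid, revid=None):
    """Build a parameter dict for a hypothetical future action=undelete call.

    proposal: 1 or 2, selecting which proposal's parameter set to use.
    pageid:   the deleted page ID to restore (proposed parameter).
    revid:    Proposal 2 only; restore up to and including this revision.
              Assumed optional here, defaulting to the latest revision.
    """
    if proposal not in (1, 2):
        raise ValueError("proposal must be 1 or 2")
    if proposal == 1 and revid is not None:
        raise ValueError("revid is only meaningful under Proposal 2")

    params = {"action": "undelete", "title": title, "pageid": str(pageid)}
    if revid is not None:
        params["revid"] = str(revid)
    return params
```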

Because the "duplicate revisions cleanup" script might delete some revision IDs that are currently the rev_parent_id for some later revision, that script must be run before running the "populateParentId" script.

Administrators should try to unmerge (or "cure") as many mixed-up histories as they can find before this task gets implemented. (See also T71047.) Prevention is better than cure, and this task will prevent parallel histories from being able to be easily mixed together.

Event Timeline

Restricted Application added a subscriber: Aklapper. Oct 10 2018, 2:09 AM
GeoffreyT2000 updated the task description. Dec 5 2018, 8:16 PM
GeoffreyT2000 updated the task description. Dec 6 2018, 1:07 AM
GeoffreyT2000 updated the task description. Dec 6 2018, 1:09 AM

Other notes
I have written Proposal 1 at the Community Wishlist Survey 2019, ...
One administrator on Wikipedia who frequently does history merges is Anthony Appleyard. If either proposal passes, then that user would have to start using "Special:MergeAndMove" to do history merges, and would no longer need to "revert histmerge junk" after doing so.

Under Special:MergeHistory, I still often have to "revert histmerge junk", to remove the "please histmerge this page" template call from the latest edit of the destination page.

Scott awarded a token. Feb 26 2019, 1:54 PM
Scott added a subscriber: Scott.
GeoffreyT2000 updated the task description. Mar 2 2019, 3:07 PM
Anthony_Appleyard added a comment. Edited Mar 2 2019, 5:58 PM

Many of the history-merge requests that I get are not suitable for Special:MergeHistory, but are done quicker (or only) by the old method. Please keep selective undelete of edits, and please add selective delete of edits. Removing the ability to selectively delete a page's non-deleted edits, and also the ability to selectively undelete a page's deleted edits, would handcuff me badly. As well as requests for clean classical history-merges, I also get requests for various sorts of tidying-up after many sorts of unwise actions by users.

If page X has been cut-and-paste copied to page Y, often I must remove edits made to Y before the cut-and-paste, and edits made to X after the cut-and-paste, and miscellaneous irrelevant edits from both. Many sorts of complications arise.

Sometimes someone (say User:W) copies-and-pastes from page X to page Y (where Y is often his sandbox and may contain old edits that he worked on before), and he edits Y, and after he finishes that, he copies-and-pastes back from Y to X. Then after all that he may do more work on page Y. Then I get a request to textmerge Y to X, interpolating the relevant edits from Y into the resulting gap in the history of X. I can obey this if nobody text-edited X in place while User:W was text-editing his copy of it in Y; else I must plead "can't be done :: WP:Parallel_histories", and all that I can do is to enter a history-information section into Talk:X.