
Undelete of page with same title leads to unexpected results
Open, Needs Triage · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  1. Create a page "Delete Me" with content "Step 1"
  2. Delete the page "Delete Me", keep the tab open or open the delete log
  3. Create another page with the same title "Delete Me" and content "Step 3"
  4. Go back to the delete log and undelete the latest revision

Related logs from testwiki (select * from logging where log_title like "PFischer-WMF/Delete_Me"):

| log_id | log_type | log_action | log_timestamp | log_actor | log_namespace | log_title | log_comment_id | log_params | log_deleted | log_page |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 367396 | create | create | 20231116130659 | 122252.0 | 2 | PFischer-WMF/Delete_Me | 246686.0 | a:1:{s:17:"associated_rev_id";i:582959;} | 0 | 153559 |
| 367398 | delete | delete | 20231116130740 | 122252.0 | 2 | PFischer-WMF/Delete_Me | 246687.0 | a:0:{} | 0 | 153559 |
| 367399 | create | create | 20231116131014 | 122252.0 | 2 | PFischer-WMF/Delete_Me | 246688.0 | a:1:{s:17:"associated_rev_id";i:582960;} | 0 | 153560 |
| 367400 | delete | restore | 20231116131032 | 122252.0 | 2 | PFischer-WMF/Delete_Me | 246689.0 | a:1:{s:12:":assoc:count";a:2:{s:9:"revisions";i:1;s:5:"files";i:0;}} | 0 | 153560 |
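
To make the relevant detail in these rows easier to see, here is a minimal Python sketch that checks them for a restore logged against a different page ID than the preceding delete. The rows are transcribed by hand from the log above; the tuple layout and the function name are illustrative only and not part of MediaWiki.

```python
# Rows transcribed from the logging output above:
# (log_id, log_type, log_action, log_page)
LOG_ROWS = [
    (367396, "create", "create", 153559),
    (367398, "delete", "delete", 153559),
    (367399, "create", "create", 153560),
    (367400, "delete", "restore", 153560),
]


def restores_on_other_page(rows):
    """Return (log_id, deleted_page, restore_page) for each restore entry
    whose log_page differs from the most recently deleted page."""
    last_deleted_page = None
    mismatches = []
    for log_id, log_type, log_action, log_page in rows:
        if log_type != "delete":
            continue
        if log_action == "delete":
            last_deleted_page = log_page
        elif log_action == "restore":
            if last_deleted_page is not None and log_page != last_deleted_page:
                mismatches.append((log_id, last_deleted_page, log_page))
    return mismatches


print(restores_on_other_page(LOG_ROWS))
```

For the four rows above this reports log entry 367400: the delete was logged for page 153559, while the restore is logged for page 153560.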

What happens?:

What should have happened instead?:

I would consider that case an exception. If the undelete has no effect, no event should be published. If undeleting, and thereby overriding an existing page, should be supported, I would expect the following behaviour:

  1. delete the current page (sharing the same title)
  2. restore the old page (under its original page_id!)

Software version (skip for WMF-hosted wikis like Wikipedia): https://test.wikipedia.org/

Other information (browser name/version, screenshots, etc.): This came to my attention while testing the search update pipeline. For revision-based changes, we fetch the content of the changed page by rev_id (via the CirrusSearch API extension) and fail if the event's page_id does not match the one returned by the API.
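
The consistency check described above can be sketched roughly as follows. This is not the pipeline's actual code: `ChangeEvent`, `PageIdMismatch`, `verify_event` and the `fetch_page_id_by_rev` callable are hypothetical names, and the IDs in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical shape of a revision-based change event."""
    page_id: int
    rev_id: int


class PageIdMismatch(Exception):
    """Raised when the event and the fetched revision disagree on the page ID."""


def verify_event(event, fetch_page_id_by_rev):
    """Look up the revision's page ID and fail if it differs from the event's.

    fetch_page_id_by_rev is a hypothetical callable standing in for the API request.
    """
    fetched_page_id = fetch_page_id_by_rev(event.rev_id)
    if fetched_page_id != event.page_id:
        raise PageIdMismatch(
            f"rev {event.rev_id}: event says page {event.page_id}, "
            f"API says page {fetched_page_id}"
        )
    return fetched_page_id


# Made-up IDs: the stub "API" says revision 10 currently belongs to page 2.
fake_api = {10: 2}.get

verify_event(ChangeEvent(page_id=2, rev_id=10), fake_api)  # IDs agree
try:
    verify_event(ChangeEvent(page_id=1, rev_id=10), fake_api)
except PageIdMismatch as error:
    print(error)
```

A check like this fails whenever a revision ends up under a different page ID than the one carried by the event.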

Event Timeline

Restricted Application added a subscriber: Aklapper.

The "Delete page" and "Undelete" functionality is first and foremost a system to archive/restore revisions that additionally ensures the presence or absence of a Page record to access revisions.

I'll walk through three scenarios.

Partial restore

Here is what the user interface for Undelete page looks like:

Screenshot 2023-11-16 at 16.40.38.png (1×1 px, 220 KB)

I can re-create this page with just those 3 edits in the history. I could later open the "Undelete page" interface again and restore one or two more of the older edits, making them public in the page history once more. That second action is an example where "Undelete" is performed and has a very real effect, yet does not change which revision is the current/latest one.

In this example, the page ID is stable throughout. I created a page with 6 edits. Deleted it. Restored it with 3, and then restored a few more edits afterwards.

The most common reason, to my knowledge, to delete and partially restore a page in this way is to hide a subset of revisions (i.e. the ones you choose not to restore). We have since created an easier way to do this, namely the "Revision delete" functionality in MediaWiki, which lets admins select checkboxes in a similar way on the page history page itself, and then "change visibility" of those revisions. This does not move their records between the revision and archive tables; it merely changes the rev_deleted bitflag. This feature cannot be used on the latest revision, and it cannot be used to delete the page title itself. Other than that, it's quite similar, and I would generally recommend thinking of delete/undelete as similar to changing revision visibility, given that both are reversible, and both can be done partially for a subset of edits.
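
As an illustration of what "changing the rev_deleted bitflag" means, here is a small Python sketch. The flag values are meant to mirror MediaWiki's `RevisionRecord::DELETED_*` constants as I understand them (text, comment, user, restricted); treat them, and the helper names, as assumptions rather than a reference.

```python
from enum import IntFlag


class RevDeleted(IntFlag):
    """Assumed to mirror MediaWiki's RevisionRecord::DELETED_* constants."""
    TEXT = 1        # revision content hidden
    COMMENT = 2     # edit summary hidden
    USER = 4        # author name hidden
    RESTRICTED = 8  # hidden from admins as well (suppression)


def hide(rev_deleted, flags):
    """Set the given visibility bits; the revision row itself stays where it is."""
    return int(rev_deleted) | int(flags)


def unhide(rev_deleted, flags):
    """Clear the given visibility bits again, which is what makes this reversible."""
    return int(rev_deleted) & ~int(flags)


def is_hidden(rev_deleted, flag):
    """Check whether a particular aspect of the revision is currently hidden."""
    return bool(int(rev_deleted) & int(flag))


value = hide(0, RevDeleted.TEXT | RevDeleted.USER)
print(value, is_hidden(value, RevDeleted.TEXT), is_hidden(value, RevDeleted.COMMENT))
print(unhide(value, RevDeleted.TEXT))
```

The point of the sketch is that hiding and unhiding only flips bits on an existing row, whereas delete/undelete moves rows between the revision and archive tables.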

Screenshot 2023-11-16 at 16.40.01.png (514×1 px, 199 KB)

The RFC to unify these two similar systems has been awaiting resourcing since 2009: T20493: RFC: Unify the various deletion systems

Late restore

Consider an article that was written and then deleted for non-abuse reasons. For example, the subject was deemed lacking in notability, or the content remained of low quality after a set time for improvement had passed. Then, after it gets deleted, someone shows interest and picks it up again. They re-create the article in a better way, and might ask for the old page to be restored to preserve the history.

The re-creation will allocate a new page ID. This is important because pages can get renamed, and we cannot re-purpose the deleted page's ID at this point, since the new page may end up with a life of its own and undergo renames etc. We need to keep the ability to undelete the old page one day.

Then, someone decides to restore the page. At that point this acts the same as the second "undelete" action in the first scenario above, and the interface simply presents it as restoring edits to an otherwise existing page. During the restore process, it is ensured that all revisions belong to the same page ID, so when these deleted edits are inserted as new rows in the revision table, they get assigned the page ID of the existing page. The restored revisions generally keep their original revision ID whenever possible. This ensures that permalinks like index.php?title=…&oldid=… keep working once a delete is eventually undone.
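
The behaviour described in the first two scenarios can be modelled with a toy sketch. This is not MediaWiki code: `ToyWiki` and its three dictionaries are a deliberately simplified, hypothetical stand-in for the page, revision and archive tables.

```python
class ToyWiki:
    """Toy model of the page, revision and archive tables (not MediaWiki code)."""

    def __init__(self):
        self.next_page_id = 1
        self.next_rev_id = 1
        self.pages = {}      # title -> page_id
        self.revisions = {}  # rev_id -> page_id, i.e. live revisions
        self.archive = {}    # rev_id -> (title, page_id at deletion time)

    def edit(self, title):
        """Add a revision, creating the page (with a fresh page ID) if needed."""
        if title not in self.pages:
            self.pages[title] = self.next_page_id
            self.next_page_id += 1
        rev_id = self.next_rev_id
        self.next_rev_id += 1
        self.revisions[rev_id] = self.pages[title]
        return rev_id

    def delete(self, title):
        """Remove the page record and move all of its revisions to the archive."""
        page_id = self.pages.pop(title)
        for rev_id in [r for r, p in self.revisions.items() if p == page_id]:
            del self.revisions[rev_id]
            self.archive[rev_id] = (title, page_id)

    def undelete(self, title, rev_ids):
        """Restore the chosen archived revisions under the title's page record."""
        for rev_id in rev_ids:
            archived_title, old_page_id = self.archive.pop(rev_id)
            assert archived_title == title
            # An existing page wins; otherwise the page comes back under its old ID.
            page_id = self.pages.setdefault(title, old_page_id)
            self.revisions[rev_id] = page_id  # the revision ID is preserved


wiki = ToyWiki()
old_rev = wiki.edit("Delete Me")   # first page, first revision
wiki.delete("Delete Me")
new_rev = wiki.edit("Delete Me")   # re-creation allocates a new page ID
wiki.undelete("Delete Me", [old_rev])
print(wiki.pages, wiki.revisions)
```

In this model the old revision keeps its revision ID but ends up attached to the page ID of the re-created page. If the title does not exist at restore time, the page comes back under its old ID, which matches the stable page ID in the partial-restore scenario.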

Merge

Consider an article that was written and has existed for a while, and someone wants to rename it. Except they don't know how to do this (either due to not having the "move" right, or because they don't know about this feature). They might copy the article content to a new page, and then edit the old page to become a redirect.

In this case, the community needs a way to rectify this, especially if it is only noticed after the article has continued to be regularly edited for some time. For this case, MergeHistory exists to allow you to relocate the edit history of an older page (i.e. the one that became a redirect) into the newer page.

This feature was developed fairly late, so this used to be done (and sometimes still is) in one of two ways:

  • Temporarily delete the old article, rename the current page to the old title, restore the old page's edits, re-rename the whole party to the new title.
  • Temporarily delete the current article, rename the old page to the new title (as should've been done originally), then restore the new edits on top.

This is technically the same as "late restore", but without the social aspect of a page being intentionally deleted first. The delete is here merely a very brief implementation detail.

Thank you @Krinkle for the detailed answer. 🙇 I was not aware of those procedures.
TIL: page records are merely a gateway to a list of revisions and neither title nor ID of that page/list are stable over time.
I'll link this ticket in our code base for reference.

For additional context, here is some related IRC discussion.

@Krinkle we should add your awesome comment to mw docs somewhere. I'm happy to do it, but I'm not sure where the best place would be. Got any suggestions?