New script for cleaning out deleted pages
Open, Needs Triage, Public

Description

The DeletePagesForGood extension should have a maintenance script that cleans out deleted pages more thoroughly than the "deleteArchivedRevisions.php" script in MediaWiki core.

The script should not only empty out the archive table but also do the following:

  • Delete all tags for deleted revisions from the "change_tag" table, and update the counts accordingly in the "change_tag_def" table. (This should actually be done in "deleteArchivedRevisions.php".)
  • Delete the rows for all deleted revisions from the slots table. (Again, this should be part of "deleteArchivedRevisions.php".)
  • Delete all log entries where log_page is nonzero and does not match the page ID of any existing page, except for the entries that T191159 would keep.
  • For the log entries that T191159 would keep, set log_page to zero.
  • Delete all patrol log entries where the corresponding revision ID has been deleted.
  • Delete all recentchanges entries where rc_cur_id is nonzero and does not match the page ID of any existing page, except for the entries that T191159 would keep.
  • For the recentchanges entries that T191159 would keep, set rc_cur_id to zero.
  • For all deleted log and recentchanges entries, delete the tags from the "change_tag" table, and the corresponding rows from the "log_search" table.
  • Delete all rows from the watchlist table where neither the subject page nor the corresponding talk page exists, neither title has a move log entry, and the page being unwatched is not the watcher's user page or user talk page.
  • Delete rows from various extension tables that refer to a deleted page ID or revision ID, as in T225846.
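
The log-related steps above can be sketched as a handful of SQL statements. The following is only an illustration under several assumptions, not the proposed script: it uses Python's sqlite3 module rather than MediaWiki's database layer, the schema is cut down to the columns involved, and the rule for which entries T191159 would keep is replaced by a hypothetical `keep_types` list of log types, since the actual criteria are defined in that task and not here.

```python
import sqlite3

# Simplified stand-in for the MediaWiki tables involved (assumption:
# only the columns needed for this sketch are included).
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY);
CREATE TABLE logging (log_id INTEGER PRIMARY KEY, log_type TEXT, log_page INTEGER);
CREATE TABLE change_tag (ct_id INTEGER PRIMARY KEY, ct_log_id INTEGER, ct_tag_id INTEGER);
CREATE TABLE change_tag_def (ctd_id INTEGER PRIMARY KEY, ctd_count INTEGER);
CREATE TABLE log_search (ls_field TEXT, ls_value TEXT, ls_log_id INTEGER);
"""

# A log row is "orphaned" when log_page is nonzero but matches no existing page.
ORPHAN = "log_page <> 0 AND log_page NOT IN (SELECT page_id FROM page)"


def clean_logging(conn, keep_types):
    """Delete orphaned log rows, except those whose type is in keep_types.

    keep_types is a hypothetical placeholder for "entries that T191159
    would keep". Kept orphaned rows get log_page reset to zero.
    Returns (rows_deleted, rows_fixed).
    """
    keep_types = tuple(keep_types)
    marks = ",".join("?" * len(keep_types))
    doomed = (
        f"SELECT log_id FROM logging WHERE {ORPHAN} "
        f"AND log_type NOT IN ({marks})"
    )
    with conn:  # one transaction for the whole cleanup
        # Lower each tag's usage count by the number of tag rows about to go.
        conn.execute(
            f"""UPDATE change_tag_def SET ctd_count = ctd_count - (
                    SELECT COUNT(*) FROM change_tag
                    WHERE ct_tag_id = ctd_id AND ct_log_id IN ({doomed}))""",
            keep_types,
        )
        # Remove dependent rows first, then the log rows themselves.
        conn.execute(f"DELETE FROM change_tag WHERE ct_log_id IN ({doomed})", keep_types)
        conn.execute(f"DELETE FROM log_search WHERE ls_log_id IN ({doomed})", keep_types)
        deleted = conn.execute(
            f"DELETE FROM logging WHERE log_id IN ({doomed})", keep_types
        ).rowcount
        # Whatever is still orphaned was deliberately kept: detach it.
        fixed = conn.execute(
            f"UPDATE logging SET log_page = 0 WHERE {ORPHAN}"
        ).rowcount
    return deleted, fixed
```

The same pattern would presumably apply to recentchanges, with rc_cur_id in place of log_page and ct_rc_id in place of ct_log_id. Dependent rows are removed before the log rows because the subquery that identifies them reads the logging table.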

Note that after running this script, moves over redirects would no longer have corresponding "deleted redirect ... by overwriting" log entries. Also, if a redirect left behind by a move had later been deleted normally, then the "Create" tab for the redlinked title would show only the move log entry, which is not a "without leaving a redirect" move, and no deletion log entry. Normally, we would expect the latest move log entry for a redlinked title to be either a "without leaving a redirect" move or to be followed by a deletion log entry.
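
The expectation in the last sentence can be written as a small check. This is a sketch only: the `LogEntry` class and its `leaves_redirect` field are hypothetical simplifications of how move log entries record whether a redirect was left behind, and the function assumes it receives the log entries for a single redlinked title, oldest first.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    log_type: str                  # "move", "delete", ...
    leaves_redirect: bool = False  # hypothetical field; only meaningful for moves


def redlink_log_is_consistent(entries):
    """Return True if the latest move entry is either a move without a
    redirect or is followed by a deletion entry (entries are oldest first)."""
    last_move = max(
        (i for i, e in enumerate(entries) if e.log_type == "move"),
        default=None,
    )
    if last_move is None:
        return True  # no move entries, nothing to check
    if not entries[last_move].leaves_redirect:
        return True  # moved without leaving a redirect
    # A redirect was left behind, so a later deletion entry is expected.
    return any(e.log_type == "delete" for e in entries[last_move + 1:])
```

For a title whose move redirect was deleted normally, the history before cleanup passes this check, while the history after cleanup (the move entry alone) would fail it, which is the inconsistency described above.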