- Affected components:
  - The Page deletion (archiving) feature of MediaWiki core.
  - The Revision delete (RevDel) feature of MediaWiki core.
- Engineer for initial implementation: TBD.
- Code steward: TBD.
Motivation
We currently have two systems that provide a way to make content no longer publicly accessible:
- Page deletion.
- Revision delete.
The "Revision delete" system seems to scale fairly well currently. It has a natural way to limit or divide its internal database interactions. If it were to show scaling problems, we would have a clear path for how to make it scale further.
The "Page deletion" system, on the other hand, has severe limitations. Even if we ignore the edge case of pages with 5000+ revisions, the underlying concept is still problematic. Database operations that move rows between tables, even for smaller pages, are something DBAs would prefer never happen at any scale, and should be migrated away from as soon as possible.
The objective is to unify these two systems and end up with something that is as good as the best of both.
Issues:
- T13402: Deleting of pages with high number of revisions makes server cry
- T45911: Special:Undelete fails when too many checkboxes are checked
- T198176: Mediawiki page deletions should happen in batches of revisions
- T196950: Pages do not have stable identifiers
Requirements
- Administrators must still be able to delete entire pages in a way that is as easy as "Page deletion" is today.
- Administrators must still be able to selectively hide revisions in a way that is as easy as "Revision deletion" offers today.
- The technical implementation of these actions must not move rows between tables.
- The viewing of "Page history" and "User contributions" (and related APIs) must not display revisions of deleted pages (by default), the same as today.
Exploration
Status quo: Page deletion
This is MediaWiki's original deletion system. It is exposed through the interface as "Delete page" (action=delete) and "Restore page" (Special:Undelete).
Database process:
Moves a page and its revisions to the "archive" database table.
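As a rough illustration of what "moving rows between tables" involves, here is a minimal sketch using SQLite. The two-table schema and the column names (`rev_text`, `ar_page_id`, and so on) are simplified assumptions for the example, not the actual MediaWiki schema, and `archive_page` is a hypothetical helper.

```python
import sqlite3


def archive_page(conn, page_id):
    """Copy a page's revisions into `archive`, then delete them from `revision`.

    Both statements run in one transaction, so the cost grows with the
    number of revisions on the page. Returns the number of rows moved.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO archive (ar_rev_id, ar_page_id, ar_text) "
            "SELECT rev_id, rev_page, rev_text FROM revision WHERE rev_page = ?",
            (page_id,),
        )
        cur = conn.execute("DELETE FROM revision WHERE rev_page = ?", (page_id,))
        return cur.rowcount


def make_demo_db():
    """Build an in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_text TEXT);
        CREATE TABLE archive (ar_rev_id INTEGER PRIMARY KEY, ar_page_id INTEGER, ar_text TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?)",
        [(1, 10, "a"), (2, 10, "b"), (3, 20, "c")],
    )
    conn.commit()
    return conn
```

Every revision row is written once and deleted once, which is why the cost of a single deletion is tied directly to the size of the page history.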
Visibility:
Revisions from deleted (or "archived") pages are not shown in page history or user contributions. Administrators may view them via Special:Undelete/<title> or Special:DeletedContributions/<user>.
Limitations:
The database process for page deletion is inefficient. This cannot be improved because the problem is not how we do it but what we do (moving rows between tables), which is considered bad practice for database operations. To reduce its negative impact on database stability, replication lag, and performance, "Page deletion" can be limited via the $wgDeleteRevisionsLimit configuration. When limited, only users with the bigdelete right may access the feature on pages with more than this number of revisions.
On Wikimedia wikis, the limit has been set at 5,000 revisions, and the bigdelete right has mostly been reserved for Stewards and Developers. When used with caution, these users are sometimes able to perform the deletion through a simple request procedure. However, even with this user right, the underlying process is highly inefficient and can have a lasting impact on database performance in the minutes or hours that follow. As such, all database transactions on Wikimedia wikis are subject to additional limits that abort them when this is about to happen.
As a result, pages with far more than 5,000 revisions cannot be deleted through this process. The only way to do so without disrupting database performance would be to batch the deletion. However, it is unknown whether this can be done safely, given the possible database failure and rollback scenarios it would have to account for.
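The permission rule described above can be sketched as follows. This is an illustrative model only: `can_delete_page` is a hypothetical function, and treating a limit of 0 (or None) as "no limit" is an assumption made for the example.

```python
def can_delete_page(revision_count, user_rights, delete_revisions_limit):
    """Return True if the user may delete a page with `revision_count` revisions.

    Assumption: a limit of 0 or None means the feature is not limited.
    """
    if not delete_revisions_limit:
        return True
    if revision_count <= delete_revisions_limit:
        return True
    # Above the limit, only holders of the bigdelete right may proceed.
    return "bigdelete" in user_rights
```

For example, with a limit of 5,000, an administrator without bigdelete can delete a page with exactly 5,000 revisions but not one with 5,001.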
See also:
- https://www.mediawiki.org/wiki/Help:Deletion_and_undeletion
- T13402: Deleting of pages with high number of revisions makes server cry
- T198176: Mediawiki page deletions should happen in batches of revisions
- T57398: Move page deletion to a RevDelete mechanism; kill archive table (fire optional)
Status quo: Revision delete
This is a newer mechanism, introduced in 2009. It is exposed on the "View history" and "User contributions" views as "Change visibility of selected revisions", and works by first ticking the relevant checkboxes.
Database process:
Changes the numerical value in the rev_deleted field for the relevant revisions in the database. This can be done in batches.
Visibility:
Revisions that have been "deleted" (or "hidden") still have a placeholder row shown in the interface on "Page history" and "User contributions".
The "Revision delete" feature allows admins to decide which aspect(s) of a revision to hide, and from whom. In particular, it can separately control the visibility of the textual content, the edit summary, and the user's name/IP. Each can be hidden either from non-admins only, or from everyone (suppression, aka "oversight").
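The per-aspect visibility described above maps naturally onto a bit field. The sketch below models it in Python. The four flag values follow the constants MediaWiki core uses for revision visibility, but `can_view` and its two boolean parameters are hypothetical simplifications. In particular, it assumes that a separate suppressor role can still see suppressed fields.

```python
from enum import IntFlag


class Deleted(IntFlag):
    TEXT = 1        # textual content is hidden
    COMMENT = 2     # edit summary is hidden
    USER = 4        # user name/IP is hidden
    RESTRICTED = 8  # hidden fields are suppressed, i.e. hidden from admins too


def can_view(bitfield, field, is_admin, can_see_suppressed):
    """Return True if a user may see `field` of a revision with this bitfield."""
    bits = Deleted(bitfield)
    if not bits & field:
        return True  # this aspect is not hidden at all
    if bits & Deleted.RESTRICTED:
        return can_see_suppressed  # assumption: only suppressors see these
    return is_admin
```

A revision with bit field 1 hides only its text from non-admins, while 9 (text plus restricted) also hides it from ordinary admins.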
Limitations:
I couldn't find any limitation in the code (which is concerning), but the interfaces (History page, Contributions page) do limit how many revisions they offer at once. In any event, there are general transaction limits that will still apply. Regardless of whether this needs a limit, it could be batched internally if needed (either in-request or using the JobQueue). As a last fallback, users have the option to "batch" manually as well (e.g. increase history to show 500 rows at once, and shift-select them as one chunk), which could work in extreme cases when stewards/developers need to intervene.
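To illustrate the internal batching idea, here is a minimal in-request sketch using SQLite. The batch size of 500, the simplified `revision` table, and the helper names are assumptions for the example. The visibility column is named rev_deleted, as in the MediaWiki core schema. A JobQueue-based variant would enqueue each batch instead of running it inline.

```python
import sqlite3


def chunked(ids, batch_size):
    """Yield consecutive slices of `ids`, each with at most `batch_size` items."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def set_visibility_in_batches(conn, rev_ids, bitfield, batch_size=500):
    """Set rev_deleted for the given revisions, one short transaction per batch.

    Returns the total number of rows updated.
    """
    updated = 0
    for batch in chunked(rev_ids, batch_size):
        placeholders = ",".join("?" * len(batch))
        with conn:  # each batch commits on its own
            cur = conn.execute(
                f"UPDATE revision SET rev_deleted = ? WHERE rev_id IN ({placeholders})",
                [bitfield, *batch],
            )
            updated += cur.rowcount
    return updated
```

Because each batch commits separately, a failure midway leaves earlier batches applied. Updates of this kind are idempotent, so re-running the same request brings the remaining revisions to the same state.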
See also https://www.mediawiki.org/wiki/Help:RevisionDelete.
Proposal
Nothing specific yet, but I (@Krinkle) and others find it worth exploring whether we can re-implement the logic behind "Page deletion" using the same code and database logic that is used by "Revision delete". This would involve the following:
- Add a bit-field value for revision.rev_deleted to represent "archived".
- Update page/user revision views (Page history, User contributions) to make sure revisions with this flag are not shown by default.
- Add a way to see them (e.g. re-using Special:DeletedContributions, or through a switch on Special:Contribs itself; same for history).
- TODO: Decide what to do with the page entity itself (metadata). E.g. a page_deleted flag (possibly including a state for "deletion in progress", to be batch-friendly).
- TODO: Decide how/if to migrate archive into revision.rev_deleted=archived.
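The first three steps could look roughly like the sketch below, again using SQLite. Nothing here is decided: the flag value 16, the name `DELETED_ARCHIVED`, the simplified schema, and both helper functions are hypothetical. The visibility column is named rev_deleted, as in the MediaWiki core schema.

```python
import sqlite3

DELETED_ARCHIVED = 16  # hypothetical new bit; the real value is undecided


def archive_page(conn, page_id):
    """Mark all revisions of a page as archived, in place; no rows are moved."""
    with conn:
        cur = conn.execute(
            "UPDATE revision SET rev_deleted = rev_deleted | ? WHERE rev_page = ?",
            (DELETED_ARCHIVED, page_id),
        )
        return cur.rowcount


def page_history(conn, page_id, include_archived=False):
    """List revision IDs for a page, hiding archived revisions by default."""
    sql = "SELECT rev_id FROM revision WHERE rev_page = ?"
    params = [page_id]
    if not include_archived:
        sql += " AND (rev_deleted & ?) = 0"
        params.append(DELETED_ARCHIVED)
    sql += " ORDER BY rev_id"
    return [row[0] for row in conn.execute(sql, params)]
```

Using a bitwise OR keeps any existing visibility bits on a revision intact, so restoring the page would only need to clear the archived bit again.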