
Unify the various deletion systems
Open, Normal, Public

Description

Background

We currently have two systems for the deletion of revisions:

  1. Page deletion
  2. Revision delete

Page deletion

This is MediaWiki's original deletion system. Exposed through the interface as "Delete page" (action=delete) and "Restore page" (Special:Undelete).

Database process:
Moves a page and its revisions to the "archive" database table.

Visibility:
Revisions from deleted (or "archived") pages are not shown in page history, or user contributions. Administrators may view them via Special:Undelete/<title> or Special:DeletedContributions/<user>.

Limitations:
The database process for page deletion is inefficient. This cannot be improved, because the problem is not how we do it but what we do (moving rows between tables), which is considered bad practice for database operations. To reduce its negative impact on database stability, replication lag, and performance, "Page deletion" can be limited via the $wgDeleteRevisionsLimit configuration. When limited, only users with the bigdelete right may use the feature on pages with more than this number of revisions.
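The limit rule described above can be written as a small predicate. This is a minimal sketch, not MediaWiki's actual code: the function name is hypothetical, and it assumes that a limit of 0 means "no limit".

```python
def may_delete_page(revision_count: int, delete_revisions_limit: int, has_bigdelete: bool) -> bool:
    """Return True if a user may delete a page with this many revisions."""
    # Assumption: a limit of 0 (or less) disables the restriction entirely.
    if delete_revisions_limit <= 0:
        return True
    # Pages at or below the limit need no extra right.
    if revision_count <= delete_revisions_limit:
        return True
    # Above the limit, the bigdelete right is required.
    return has_bigdelete
```

With the Wikimedia setting of 5,000, a page with 6,000 revisions would only be deletable by a user holding bigdelete.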

On Wikimedia wikis, the limit has been set at 5,000 revisions, and the right has mostly been reserved for Stewards and Developers. When used with caution, these users are sometimes able to perform the deletion through a simple request procedure. However, even with this user right, the underlying process is highly inefficient and can have a lasting impact on database performance in the minutes or hours that follow. As such, all database transactions on Wikimedia wikis have additional limits that abort them when this is about to happen.

Pages with far more than 5,000 revisions therefore cannot be deleted through this process. The only way to do so without disrupting database performance would be to batch the deletion. However, it is unknown whether this can be done safely, given the possible database failure and rollback scenarios it would have to account for.

See also:

Revision delete

This is a newer mechanism introduced in 2009. It is exposed on the "View history" and "User contributions" views as "Change visibility of selected revisions", and works by first ticking the relevant check boxes.

Database process:
Changes the numerical value in the rev_deleted field for the relevant revisions in the database. This can be done in batches.

Visibility:
Revisions that have been "deleted" (or "hidden") still have a placeholder row shown in the interface on "Page history" and "User contributions".

The "Revision delete" feature allows admins to decide which aspect(s) of a revision to hide, and from whom. In particular, it is capable of separately controlling the visibility of the textual content, the edit summary, or the user's name/IP. And it can hide it from either non-admins only, or from everyone (suppression, aka "oversight").

Limitations:
I couldn't find any limitation in the code (which is concerning), but the interfaces (History page, Contributions page) do limit how many revisions they offer at once, and in any event the general transaction limits still apply. Regardless of whether this needs a limit, though, it could be batched internally if needed (either in-request or using the JobQueue). As a last fallback, users have the option to "batch" manually as well (e.g. increase history to show 500 rows at once, and shift-select them as one chunk), which could work in extreme cases when stewards/developers need to intervene.
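Internal batching of this kind could look roughly like the sketch below. The function names and the batch size of 500 are assumptions for illustration. `apply_batch` stands in for whatever would actually update the revision rows (for example one transaction or one queued job per chunk).

```python
from typing import Callable, Iterator, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` revision IDs."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def set_visibility_in_batches(
    rev_ids: Sequence[int],
    apply_batch: Callable[[List[int]], None],
    batch_size: int = 500,
) -> int:
    """Apply a visibility change chunk by chunk; return the number of IDs processed."""
    processed = 0
    for batch in chunked(rev_ids, batch_size):
        # Each call is meant to be one short unit of work (hypothetical callback).
        apply_batch(batch)
        processed += len(batch)
    return processed
```

Because each chunk is independent, a failure part-way through leaves earlier chunks applied, which is the "deletion in progress" state a batch-friendly design would need to represent.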

See also https://www.mediawiki.org/wiki/Help:RevisionDelete.

Problem

The "Revision delete" system seems to scale fairly well, and if/when it shows problems, there's a clear path for how to make it work for larger pages.

The "Page delete" system on the other hand has severe limitations. Even if we ignore the edge case of pages with 5000+ revisions, the underlying concept is still problematic. Database operations that move rows between tables, even for smaller pages, are something DBAs would prefer never happen, and should be migrated away from.

Issues:

Solution

Requirements
  • Administrators must still be able to delete entire pages in a way that is as easy as "Page deletion" is today.
  • Administrators must still be able to selectively hide revisions in a way that is as easy as "Revision deletion" offers today.
  • The technical implementation of that action must not move rows between tables.
  • The viewing of "Page history" and "User contributions" (and related APIs) must not display revisions of deleted pages (by default), the same as today.
Proposal 1:

Nothing specific yet, but it seems I (@Krinkle) and others find it worth exploring to see if we can re-implement the logic behind "Page deletion" by using the same code and database logic that is used by "Revision delete". This would involve the following:

  • Add a bit-field value for revision.rev_deleted to represent "archived".
  • Update page/user revision views (Page history, User contributions) to make sure revisions with this flag are not shown by default.
  • Add a way to see them (e.g. re-using Special:DeletedContributions, or through a switch on Special:Contribs itself; same for history).
  • TODO: Decide what to do with the page entity itself (metadata). E.g. a page_deleted flag (possibly including a state for "deletion in progress", to be batch-friendly).
  • TODO: Decide how/if to migrate archive into revision.rev_deleted=archived.
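
The bit-field idea in Proposal 1 can be sketched as follows. The first four flag values mirror MediaWiki's existing revision-deletion constants (text = 1, comment = 2, user = 4, restricted = 8). The ARCHIVED value of 16 and the helper function are assumptions made only for this illustration.

```python
from enum import IntFlag


class RevDeleted(IntFlag):
    TEXT = 1         # revision text hidden
    COMMENT = 2      # edit summary hidden
    USER = 4         # user name/IP hidden
    RESTRICTED = 8   # hidden from admins as well (suppression)
    ARCHIVED = 16    # hypothetical new flag: revision belongs to a "deleted" page


def visible_in_history(rev_deleted: int, include_archived: bool = False) -> bool:
    """Decide whether a revision row is listed in history or contributions."""
    if rev_deleted & RevDeleted.ARCHIVED:
        # Archived revisions are omitted unless the viewer explicitly asks for them.
        return include_archived
    # Other flags keep the placeholder row visible, as Revision delete does today.
    return True
```

Under this sketch, a revision with only TEXT or USER set still shows its placeholder row, while any revision carrying ARCHIVED is left out of the default views.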
Original task description from bugzilla.wikimedia.org user FT2.wiki:

At present we now have 4 means of deleting material from either the public or from administrators. Material can either be

  • Deleted from the public with traditional deletion
  • Deleted from the public (part or full) with RevisionDeleted
  • Deleted from admin view with Oversight
  • Suppressed from admin view with RevisionDeleted

This collection means that any review of editor actions or conduct, or article matters on the wiki, now faces two big problems in evaluating the existence or seriousness of any issue:

  • It's incredibly easy to overlook some edits or actions in the review, which should be taken account of.
  • It's more complex and takes examination of several screens, to review a matter.
  • Each of these has different mechanisms for viewing edits they affect; there is no consistency of links, formats, access methods, etc.
  • A third issue at a technical level - it's a lot to maintain, and allows for inconsistent software behavior (or bugs fixed in one of these but not spotted in the other), and requires more developer time etc.

I would like to suggest that in fact, all we now need is RevisionDeleted, with the following options:

  • What to hide - revision text, edit summary, user name/IP
  • Whether admins can or can't access the hidden data
  • Whether admins or users who cannot access the hidden data, should nonetheless be able to see it exists even if they can't read it (there are cases when this is safe, and cases when it isn't).

This proposal is that RevisionDeleted is amended slightly to show the above options, and then both traditional deleted revisions and oversighted revisions are converted to RevisionDeleted entries as a background task (ie a script written that achieves this in the job queue over time). Following this:

  • Delete and oversight both redirect to RevDel for their actions
  • Delete/undelete and oversight URLs both redirect to the appropriate lookup link for any historical URL used to view an old deleted/oversighted edit.

The issue here is not so much one of software development, as of a once-off conversion task of old data stored in one system to be moved to another.

Details

Reference
bz18493

Event Timeline

Krinkle raised the priority of this task from Low to Needs Triage. (Jun 27 2018, 8:06 PM)
Krinkle added a project: TechCom-RFC.
Krinkle updated the task description.
Krinkle added a subscriber: Krinkle.
Krinkle triaged this task as Normal priority. (Jun 27 2018, 8:38 PM)
Krinkle moved this task from Inbox to Under discussion on the TechCom-RFC board.
Krinkle moved this task from Under discussion to Request IRC meeting on the TechCom-RFC board.
Krenair updated the task description. (Jun 27 2018, 9:24 PM)
Tgr added a subscriber: Tgr. (Jul 8 2018, 5:19 PM)
Krinkle updated the task description. (Jul 8 2018, 9:51 PM)

TechCom is hosting a Public IRC Discussion of this RFC on 2018-07-11 in the #wikimedia-office channel at 2pm PST (22:00 UTC, 23:00 CET)

There is one small nitpick. It is said that "Database operation for smaller page that move rows between tables is something DBAs would prefer never happens, and should be migrated away from." Actually, from a pure DBA point of view, moving deleted rows to a separate table is good, because it is basically a bad way of implementing partitioning and requires less optimization to avoid virtually deleted rows. It is when I put on the database engineer hat that I hate it: it is prone to cause data loss and inconsistencies, and generates more traffic and writes than needed. This doesn't change the overall sentiment, but it at least highlights one of the few good things about moving rows around (instead of virtually deleting them with SET deleted = 1, or INSERTing the latest version with a deleted status, which is the standard model in most scenarios).
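
The two models contrasted in this comment can be put side by side. The sketch below uses an in-memory SQLite database purely for illustration. The table and column names are simplified stand-ins, not the real MediaWiki schema.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a toy schema with three revisions on two pages."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE revision (
            rev_id   INTEGER PRIMARY KEY,
            rev_page INTEGER NOT NULL,
            deleted  INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE archive (
            ar_rev_id INTEGER PRIMARY KEY,
            ar_page   INTEGER NOT NULL
        );
        INSERT INTO revision (rev_id, rev_page) VALUES (1, 10), (2, 10), (3, 11);
        """
    )
    return db


def delete_by_moving_rows(db: sqlite3.Connection, page_id: int) -> None:
    """Current model: copy rows to the archive table, then delete the originals."""
    with db:  # one transaction covering both writes
        db.execute(
            "INSERT INTO archive (ar_rev_id, ar_page) "
            "SELECT rev_id, rev_page FROM revision WHERE rev_page = ?",
            (page_id,),
        )
        db.execute("DELETE FROM revision WHERE rev_page = ?", (page_id,))


def delete_by_flag(db: sqlite3.Connection, page_id: int) -> None:
    """Virtual deletion: rows stay in place and only a flag changes."""
    with db:
        db.execute("UPDATE revision SET deleted = 1 WHERE rev_page = ?", (page_id,))
```

The row-moving variant writes to two tables per revision, while the flag variant touches each row once, at the cost of every reader having to filter on the flag.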

daniel added a subscriber: daniel. (Edited, Jul 11 2018, 9:14 PM)

A few thoughts:

We can't just do rev_deleted = archived and remove the page entry, since we would lose the page title that way. So I see two options:

  1. Have a page_archive table, and move rows between page and page_archive. Note that page_archive may have several entries for the same title. Also, there are two cases for undeletion (this is already the case now): the title exists, or does not exist in the page table. If the title does exist, this is effectively a history merge, and should perhaps be handled as such. In any case, in this case, rev_page_id of all the revisions being restored needs to be updated to the id of the existing page.
  2. Have a page_deleted field that can be set to archived. Then the question is what should happen when a page with the same name is created. Perhaps page_deleted can just be cleared, but the archived revisions remain archived? That would be close to current behavior. And there would never be two different page IDs associated with a given title, deleted or not. Which may or may not be a good thing.

EDIT:

  • Conceptually, option (1) means that deleted revisions stay bound to a page ID, and when a page with the same title is created (or a page is renamed to that title), the old revisions are not assigned to that page. They stay separate. New functionality would have to be added to allow users to view or undelete revisions of "deleted pages with the same title". Deleted revisions of an existing page will behave the same as "oversighted" revisions of an existing page, and follow renames.
  • Option (2), on the other hand, means deleted revisions are bound to a page ID, and stay bound to it across renames. Renaming a page will no longer "leave behind" its deleted revisions. Creating and deleting a page with the same title multiple times would result in one big history of deleted revisions (all bound to the same page ID), as opposed to multiple such histories (each with its own page ID).

> Have a page_deleted field that can be set to archived. Then the question is what should happen when a page with the same name is created

Introducing a "title" table, referenced from page by a foreign key (which would also solve numerous issues with the *links tables). Title is the equivalent of the normalization of comment. This needs more thought, though.

> Title is the equivalent of the normalization of comment

That's certainly possible, but I don't see the point. The archive table is the only one that repeats the same title over and over. If we get rid of that, a title can only exist once per namespace in the page table. If we have a page_archive table, it can also exist another time per deletion of a page. If we go for page_deleted, the title can only exist once per namespace, and would typically only exist twice (for the page and corresponding talk page).

> The archive table is the only one that repeats the same title over and over.

Please, please have a look at the pagelinks, templatelinks and categorylinks tables (we could reduce their size 10x)

> If we get rid of that, a title can only exist once per namespace in the page table.

Why? We can have the same comment for several revisions (millions of times). We can have the same title for several pages. Title is a combination of namespace + text. You just have page (page_id, namespace, title, deleted) VALUES (1, 0, 36, 1), (2, 0, 36, 0), (3, 1, 37, 0) while title being (title_id, namespace, title) VALUES (36, 0, 'The adventures of Tom Sawyer' -- this would be the url /wiki/The_adventures_of_Tom_Sawyer),(37, 1, 'The adventures of Tom Sawyer' -- this would be the url /wiki/Talk:The_adventures_of_Tom_Sawyer). The details are not that important. Instead of "deleted" you could have a "page_version", a monotonically increasing value of pages, so you don't need to update the ones that are deleted, only get the latest one. Those are only options, we need to see the performance impact and which operations we want to favor over others.

Basically, the idea is that title and page are 2 entities that happen to be related, but one is a set of coherent text with revisions and history, and the other is an alias, which is shared by several pages as they are renamed, deleted and recreated.
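
The title/page split described in this comment can be made concrete with a small schema. This is only a sketch in SQLite: the column names page_title_id, page_deleted and title_text are assumptions, and the sample rows reuse the IDs from the comment above.

```python
import sqlite3
from typing import List, Optional, Tuple


def make_db() -> sqlite3.Connection:
    """Toy schema: title is an alias entity, page is the thing with a history."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE title (
            title_id   INTEGER PRIMARY KEY,
            namespace  INTEGER NOT NULL,
            title_text TEXT NOT NULL,
            UNIQUE (namespace, title_text)
        );
        CREATE TABLE page (
            page_id       INTEGER PRIMARY KEY,
            page_title_id INTEGER NOT NULL REFERENCES title (title_id),
            page_deleted  INTEGER NOT NULL DEFAULT 0
        );
        INSERT INTO title VALUES
            (36, 0, 'The adventures of Tom Sawyer'),
            (37, 1, 'The adventures of Tom Sawyer');
        INSERT INTO page VALUES (1, 36, 1), (2, 36, 0), (3, 37, 0);
        """
    )
    return db


def pages_for_title(db: sqlite3.Connection, namespace: int, title_text: str) -> List[Tuple[int, int]]:
    """Return (page_id, page_deleted) for every page that has used this title."""
    return db.execute(
        "SELECT page_id, page_deleted FROM page "
        "JOIN title ON title_id = page_title_id "
        "WHERE namespace = ? AND title_text = ? ORDER BY page_id",
        (namespace, title_text),
    ).fetchall()


def live_page_id(db: sqlite3.Connection, namespace: int, title_text: str) -> Optional[int]:
    """Return the non-deleted page currently holding the title, if any."""
    for page_id, deleted in pages_for_title(db, namespace, title_text):
        if not deleted:
            return page_id
    return None
```

Here title 36 is shared by a deleted page (1) and its live successor (2), which is the "alias shared by several pages" relationship the comment describes.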

>> The archive table is the only one that repeats the same title over and over.

> Please, please have a look at the pagelinks, templatelinks and categorylinks tables (we could reduce their size 10x)

Yes, for that having normalized titles would make a lot of sense. And I'm all for doing that, but it doesn't seem to be relevant here. Except perhaps in that we could add page_title_id at the same time as adding page_deleted, if we go for that option.

>> If we get rid of that, a title can only exist once per namespace in the page table.

> Why? We can have the same comment for several revisions (millions of times). We can have the same title for several pages. Title is a combination of namespace + text. You just have page (page_id, namespace, title, deleted) VALUES (1, 0, 36, 1), (2, 0, 36, 0), (3, 1, 37, 0) while title being (title_id, namespace, title) VALUES (36, 0, 'The adventures of Tom Sawyer' -- this would be the url /wiki/The_adventures_of_Tom_Sawyer),(37, 1, 'The adventures of Tom Sawyer' -- this would be the url /wiki/Talk:The_adventures_of_Tom_Sawyer).

Yes, the same title-text can occur once per namespace, as I said. The same title (namespace+text) can occur only once in the page table, it's a unique key.

> The details are not that important. Instead of "deleted" you could have a "page_version", a monotonically increasing value of pages, so you don't need to update the ones that are deleted, only get the latest one. Those are only options, we need to see the performance impact and which operations we want to favor over others.

With page_version we'd always have to find the "newest" entry for a title in the page table, which is nasty in joins. And the page table would become much larger. And listing pages would become much more expensive. I don't think that's a good idea.

We are updating the page row for every edit anyway, to write the new value of page_latest. Updating page_deleted at the same time seems unproblematic.

> Basically, the idea is that title and page are 2 entities that happen to be related, but one is a set of coherent text with revisions and history, and the other is an alias, which is shared by several pages as they are renamed, deleted and recreated.

Treating the title as an alias that can be re-assigned follows from both options I presented. For that, it does not matter whether the title is normalized or not. The key here is that deleted revisions stay bound to the page ID, while presently, they stay bound to the page title. This is a change in behavior that will break some existing workflows, and would need alternatives to be implemented.

The idea that a title can refer to multiple pages at once (one non-deleted, and multiple deleted) is what the page_archive option achieves.

Reminder: TechCom is hosting a Public IRC Discussion of this RFC on 2018-07-18 in the #wikimedia-office channel at 2pm PST (22:00 UTC, 23:00 CET)

Meeting minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-07-18-21.00.html

There is the question of how Special:Contributions and Special:DeletedContributions will work. Krinkle believes the community will require feature parity, i.e. the ability to view only deleted contributions, and the ability to view only non-deleted contributions. It should be feasible to add a new merged mode which shows both types of contributions sorted by timestamp. One possible query plan is to store the proposed "archived" flag in a separate boolean field rev_archived, then have only a (rev_user,rev_archived,rev_timestamp) index, and then implement the merged mode using "rev_archived IN (0,1)".

We could have two contributions indexes, (rev_user, rev_timestamp) and (rev_user, rev_archived, rev_timestamp), this is CPU efficient but requires more memory and disk space. Or we could have only (rev_user,rev_timestamp), this is memory efficient but the Special:DeletedContributions replacement would require a lot of table scanning.
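
The three viewing modes discussed in the minutes map onto three WHERE clauses over the same composite index. A minimal SQLite sketch follows. The rev_archived column comes from the minutes, while the index name, the sample data and the function are assumptions, and nothing here measures the performance trade-off.

```python
import sqlite3
from typing import List

# Fixed whitelist of conditions; the mode name is never interpolated from user input.
MODES = {
    "live": "rev_archived = 0",
    "archived": "rev_archived = 1",
    "merged": "rev_archived IN (0, 1)",
}


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE revision (
            rev_id        INTEGER PRIMARY KEY,
            rev_user      INTEGER NOT NULL,
            rev_archived  INTEGER NOT NULL DEFAULT 0,
            rev_timestamp TEXT NOT NULL
        );
        CREATE INDEX rev_user_archived_timestamp
            ON revision (rev_user, rev_archived, rev_timestamp);
        INSERT INTO revision VALUES
            (1, 7, 0, '20180701000000'),
            (2, 7, 1, '20180702000000'),
            (3, 7, 0, '20180703000000'),
            (4, 8, 0, '20180704000000');
        """
    )
    return db


def contributions(db: sqlite3.Connection, user_id: int, mode: str = "live") -> List[int]:
    """Return a user's revision IDs, newest first, for the chosen mode."""
    condition = MODES[mode]
    sql = (
        "SELECT rev_id FROM revision "
        f"WHERE rev_user = ? AND {condition} "
        "ORDER BY rev_timestamp DESC"
    )
    return [row[0] for row in db.execute(sql, (user_id,))]
```

"live" corresponds to Special:Contributions as it works today, "archived" to Special:DeletedContributions, and "merged" to the proposed combined view sorted by timestamp.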

There is the question of what happens to the page table. A page_deleted field would require a non-unique index (page_deleted,page_title_id).

An improvement on the current "delete/selective undelete" workflow would be to provide a selective deletion feature as a kind of history splitting. The user would select the revisions to be archived, and then a new page row would be created for those revisions, and the revisions selected for archiving would be moved into the new page. That way, there would be no need to include rev_archived in the action=history index, it would implicitly be in rev_page. The new page could be moved and then undeleted under some other title, providing a full history splitting workflow.

For feature parity, a history merge feature needs to be provided. There is the question of whether to allow undeletion of an archived page when a non-deleted page has the same title, should this cause an implicit history merge? Or should this use case be handled entirely with Special:MergeHistory?

JJMC89 added a subscriber: JJMC89. (Jul 19 2018, 12:43 AM)

> There is the question of whether to allow undeletion of an archived page when a non-deleted page has the same title, should this cause an implicit history merge? Or should this use case be handled entirely with Special:MergeHistory?

Not allowing undeletion of deleted revisions (an archived page) would just add extra steps.

  1. Delete the undeleted revisions (the existing page)
  2. Undelete everything

Currently, does it cause an implicit history merge?

Special:MergeHistory (mergehistory right) is not guaranteed to be available on all wikis.

> Not allowing undeletion of deleted revisions (an archived page) would just add extra steps.
>
>   1. Delete the undeleted revisions (the existing page)
>   2. Undelete everything

I'm imagining that undeletion would be done on a page granularity. So if you delete a page, and then it is recreated with the same title, then deleted again, that would make two deleted pages, and you would not be able to undelete both in the same operation, you would have to merge their histories first.

> Currently, does it cause an implicit history merge?

Yes, undelete currently causes an implicit history merge, this implementation accident has been used as a poor-man's history merge tool since its inception.

> Special:MergeHistory (mergehistory right) is not guaranteed to be available on all wikis.

I can fix that in like one minute.

> I'm imagining that undeletion would be done on a page granularity. So if you delete a page, and then it is recreated with the same title, then deleted again, that would make two deleted pages, and you would not be able to undelete both in the same operation, you would have to merge their histories first.
> Yes, undelete currently causes an implicit history merge, this implementation accident has been used as a poor-man's history merge tool since its inception.

Given that enwiki's most prolific history merger, @Anthony_Appleyard, has hardly used Special:MergeHistory (1), I question whether or not the tool is sufficient for editors' needs.

>> Special:MergeHistory (mergehistory right) is not guaranteed to be available on all wikis.

> I can fix that in like one minute.

The Wikimedia cluster might be fine in this regard. (I haven't checked.) I was referring to other installations. I am a sysop on one where no groups have mergehistory, so I have to use the poor-man's version.

@JJMC89 It is part of the sysop grant by default in the MediaWiki software.

DefaultSettings.php
$wgGroupPermissions['sysop']['mergehistory'] = true;

I'm not sure, but do you mean that a third-party wiki has given you sysop but explicitly taken out the mergehistory right from said group? If so, that seems odd, given that, as you say, you can still do it via delete/undelete. Perhaps they disabled it by accident?

JJMC89 added a comment. (Edited, Jul 20 2018, 4:53 AM)

> @JJMC89 It is part of the sysop grant by default in the MediaWiki software.

I know.

> I'm not sure, but do you mean that a third-party wiki has given you sysop but explicitly taken out the mergehistory right from said group? If so, that seems odd, given that, as you say, you can still do it via delete/undelete. Perhaps they disabled it by accident?

I don't have access to the configs, only what I can see on Special:ListGroupRights (no mergehistory). The wiki has been around since before mergehistory existed. Could that be it? (I don't manage any installs, so I'm ignorant on the impact of updates here.)

Given what @tstarling wrote in T20493#4436799, would I be able to do it with delete/undelete?

> I'm imagining that undeletion would be done on a page granularity. So if you delete a page, and then it is recreated with the same title, then deleted again, that would make two deleted pages, and you would not be able to undelete both in the same operation, you would have to merge their histories first.
> Yes, undelete currently causes an implicit history merge, this implementation accident has been used as a poor-man's history merge tool since its inception.

> Given that enwiki's most prolific history merger, @Anthony_Appleyard, has hardly used Special:MergeHistory (1), I question whether or not the tool is sufficient for editors' needs.

I can't speak to anyone else, but at least for me, I have been unable to use Special:MergeHistory because its documentation is completely useless. In the face of this, undeletion is the only realistic method for performing history merges.

(1) One long-term fault is that an admin can't delete some edits of a page, but he must delete all the edits and then undelete the edits that he wants to stay undeleted. That wastes his time and Wikipedia's system time. This need arises if he is history-merging X (older page) to Y (newer page), and first he must lose from the end of X any stray late edits (e.g. redirects and BattyBot edits) made after the cut-and-paste event. This process is liable to accidents if the page already has deleted edits at the start of this process.
(2) Currently, moving a page only moves the undeleted (visible) edits. It would be useful if it were also possible to move only the deleted edits, when fishing deleted edits out from under visible edits, to prevent the sort of accident described at the end of (1).

I think there is still not 100% clear agreement on the abstractions: "what is a page", "what is a revision", "what does a deleted revision belong to", "move a page with deleted revisions". While those questions cannot be answered without taking into account the limitations imposed by reality, they should be answered first, with very detailed "use cases", before proposing a specific implementation. Let's document the non-trivial workflows that should be possible first, and only later the storage model. Let's take into account readers, wiki admins and researchers (among many others) reconstructing the history of a page, who are also impacted by the inconsistencies of the current model.

The problems with the current undeletion interface will be solved with T193690, which also deals with fixing problems with rev_parent_id and ar_parent_id fields. The same page ID will never be used for more than one deleted page title, nor for both a deleted and an existing page. Also, the ar_namespace, ar_title, and ar_page_id fields will all be moved to a new pagearchive table as pa_namespace, pa_title, and pa_page_id; and the archive table will get a new ar_pa_id column.

Also, in the context of T193690, we could add a "SplitHistory" class containing a function that does the following for a given title A, a given deleted page ID n (with pa_id p), and a given cut-off revision ID r:

  1. Add a new row to the page table with title A and page_id m, and immediately delete it.
  2. Add a new row to the pagearchive table with q for the pa_id field and m for the pa_page_id field.
  3. Change the ar_parent_id field for the row in the archive table with ar_rev_id r to zero (this must be done because parent IDs are now preserved on undeletions as of MW 1.31).
  4. Change the ar_pa_id field for revision ID r and all later deleted revisions with ar_pa_id p to q.
  5. Update the pa_rev_count fields (used to display the number of restored revisions in the log entry) for rows p and q in the pagearchive table.
  6. Generate a log entry for the history split.
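
The steps above can be sketched with plain dictionaries standing in for the archive and pagearchive tables. This is an illustration only: the function name and the shape of the returned log entry are hypothetical, the caller is assumed to have already allocated the new page ID m and the new pa_id q (steps 1 and 2), and "later revisions" is interpreted as "revision ID greater than or equal to r".

```python
from typing import Dict, List


def split_deleted_history(
    archive: Dict[int, dict],      # ar_rev_id -> {"ar_pa_id": ..., "ar_parent_id": ...}
    pagearchive: Dict[int, dict],  # pa_id -> {"pa_page_id": ..., "pa_title": ..., "pa_rev_count": ...}
    p: int,                        # pa_id of the deleted page being split
    r: int,                        # cut-off revision ID
    m: int,                        # page_id of the freshly created-and-deleted page
    q: int,                        # pa_id for the new pagearchive row
) -> dict:
    if archive[r]["ar_pa_id"] != p:
        raise ValueError("cut-off revision does not belong to the given deleted page")

    # Step 2: new pagearchive row pointing at the new page ID.
    pagearchive[q] = {"pa_page_id": m, "pa_title": pagearchive[p]["pa_title"], "pa_rev_count": 0}

    # Step 3: the cut-off revision becomes the first revision of its new history.
    archive[r]["ar_parent_id"] = 0

    # Step 4: move r and all later deleted revisions of p over to q.
    moved: List[int] = sorted(
        rev_id for rev_id, row in archive.items() if row["ar_pa_id"] == p and rev_id >= r
    )
    for rev_id in moved:
        archive[rev_id]["ar_pa_id"] = q

    # Step 5: refresh the revision counts on both pagearchive rows.
    pagearchive[p]["pa_rev_count"] = sum(1 for row in archive.values() if row["ar_pa_id"] == p)
    pagearchive[q]["pa_rev_count"] = len(moved)

    # Step 6: return a log entry describing the split (format is made up).
    return {"action": "splithistory", "from_pa_id": p, "to_pa_id": q, "cutoff": r, "moved": moved}
```

Deleted revisions that belong to other pagearchive rows are left untouched, which is why each revision needs an unambiguous ar_pa_id before a split like this can be safe.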

Then, Special:SplitHistory would do the following for a given page A with ID n and a given cut-off revision ID r:

  1. Delete page A.
  2. Apply the steps above for page ID n and revision ID r.
  3. Undelete page A with either the original page ID n, or the new page ID m, depending on whether the user chooses to move earlier or later revisions.
  4. Move A to another title B without redirect.
  5. Undelete page A with the other page ID (n if page B has ID m; m if page B has ID n).

Finally, history merges would be done by using either Special:MergeHistory or Special:MergeAndMove.

Sub tasks are for tasks that represent required parts of a larger task. The RFC for unification of rev-delete and page-archive is expected to come to its own conclusion, and not blocked on T193690. If you think T193690 represents a subset of the problem and that it would be obsoleted by the unification, then we could merge the task instead. Or, if you mean that the unification should be done first, set it as parent instead of sub task, or if related only, use a textual reference in the task's description.

dbarratt updated the task description. (Aug 30 2018, 5:37 PM)

As this is touching many questions about history merging and splitting, I think there is some connection to T113004 here, especially concerning restructuring the database for pages and revisions.

Rxy added a subscriber: Rxy. (Oct 17 2018, 2:25 AM)
Halfak added a subscriber: Halfak. (Jan 4 2019, 10:04 PM)
Lofhi added a subscriber: Lofhi. (Jan 6 2019, 6:48 PM)

> ...
> Then, Special:SplitHistory would do the following for a given page A with ID n and a given cut-off revision ID r:
>
>   1. Delete page A.
>   2. Apply the steps above for page ID n and revision ID r.
>   3. Undelete page A with either the original page ID n, or the new page ID m, depending on whether the user chooses to move earlier or later revisions.
>   4. Move A to another title B without redirect.
>   5. Undelete page A with the other page ID (n if page B has ID m; m if page B has ID n).

This seems to rely on there being no pre-existing deleted edits for ID n or ID m.

Scott added a subscriber: Scott. (Mar 21 2019, 12:11 PM)

Have any of the people in charge of producing each version of MediaWiki ever considered creating an extension that allows administrators to select revisions to delete through the page deletion system? This idea would obviously be different from Manual:RevisionDelete, which deletes revisions through the revision deletion system rather than the page deletion system.

I have had this idea for years, and I remember how surprised I was to find that there was, to my knowledge, no extension that allowed revisions to be deleted through the page deletion system. This matters because there are sites that disallow pages in specific namespaces from being restored once deleted, unless the users are in specific user groups that aren't available to the general community.

The said extension would be pretty much identical to Manual:Page restoration, only the exact opposite: it would be added to the page deletion system and would allow revisions to be deleted, instead of restored.

Krinkle updated the task description. (Jun 27 2019, 9:18 PM)
Krinkle renamed this task from Unify various deletion systems to Unify the various deletion systems. (Jul 24 2019, 8:58 PM)
C.Syde65 added a comment. (Edited, Jul 26 2019, 6:18 AM)

Personally I think that an ability should be added to the traditional deletion system, allowing for selective deletions. So it would basically be the same as the traditional undeletion system, but it would use the same page as the traditional deletion system via ?action=delete.

Having an ability to selectively delete revisions through the traditional deletion system would save users the trouble of having to delete entire pages and then restore the wanted revisions.

From my experience, Revision delete is generally reserved for hiding sensitive information and other TOU breaking content that other users shouldn't be able to just stumble across. Unlike the more traditional deletion system that can just be used for whatever reason.

But then again, an issue with Revision delete is that it doesn't delete revisions from the page history. It just crosses them out. So I think a better alternative to this issue would be to make a separate user permission that would allow revisions to be deleted through the page history just like delete and partial undelete.

Another issue with Revision delete is that, because it is only reserved for sensitive or TOU breaking content on some sites, some sites don't allow general access to the Revision delete system. So users with nothing more than general access are limited to just the traditional deletion system.

And a serious con that I've noticed is that there are a few sites that have namespaces where undeletion isn't possible, and therefore deletion and partial undeletion would be out of the question. If there was a separate ability that would allow revision deletions through the page deletion system, then that would solve that problem.

daniel moved this task from Under discussion to Backlog on the TechCom-RFC board. (Jul 31 2019, 5:23 AM)

Putting this into the RFC backlog, pending product level input from Core Platform Team or Growth-Team.

Krinkle updated the task description. (Aug 1 2019, 1:16 PM)
Krinkle added a comment. (Edited, Aug 1 2019, 1:22 PM)

@C.Syde65 Thanks. I've added the use case of "allow selective hiding of revisions" to the requirements section. This was unintentionally omitted by me due to my bias toward using the technical internals of "Revision delete" as the basis for the new unified system (which naturally has this ability already). I've added it now to make it more explicit.

As for how it would look for end-users, that is what this task is about. I think from a technical perspective, using the "traditional delete system" for anything, will not be an option long-term because of the very drastic performance and availability risks it has. It's time to let that go. However, understand that I'm referring to its technical internals - I'm not talking about the user interface of traditional page deletion, and not talking about the impact on "View history".

One of the open questions here is whether we need the ability for revisions to be entirely omitted from the history page pagination (which is currently possible by deleting the whole page and selectively undeleting all-but-one revision). My "Proposal 1" currently suggests that we do keep this ability, and that it would become one of the options of "Revision delete" - just like how we have several options already about visibility of user name, timestamp and content. Another could be to omit the entry entirely. For the user-interface, this could look like a checkbox on "Special:RevisionDelete", or it could look like "Special:Undelete" - that's a separate question.
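
To make the "omit the entry entirely" option concrete, here is a minimal sketch of how it could sit alongside the existing per-field visibility options. It is not MediaWiki code: the flag names, the bit values, the `Revision` class and the `history_rows` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import IntFlag


class Visibility(IntFlag):
    """Hypothetical per-revision visibility bits (illustrative values only)."""
    HIDE_CONTENT = 1
    HIDE_SUMMARY = 2
    HIDE_USER = 4
    # Assumed new option: leave the revision out of history pagination.
    OMIT_FROM_HISTORY = 8


@dataclass
class Revision:
    rev_id: int
    user: str
    flags: Visibility = Visibility(0)


def history_rows(revisions, viewer_is_admin=False):
    """Return (rev_id, displayed user) pairs as a history page might list them."""
    rows = []
    for rev in revisions:
        # Omitted revisions are skipped entirely for ordinary viewers,
        # so they do not count toward pagination or the revision total.
        if Visibility.OMIT_FROM_HISTORY in rev.flags and not viewer_is_admin:
            continue
        # Other flags keep the row but cross out the affected field.
        if Visibility.HIDE_USER in rev.flags and not viewer_is_admin:
            user = "(username removed)"
        else:
            user = rev.user
        rows.append((rev.rev_id, user))
    return rows
```

In this sketch, omitting a revision is just one more flag on the row, so nothing is moved between tables; whether the real schema could support that is part of the open question.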

I think we absolutely do need the ability to hide revisions from pagination in the history (and therefore hide them from the count of total revisions). Sometimes people move articles by cut-and-paste over redirects (instead of using the page move button), and it's regular practice when fixing these cut-and-paste moves to completely obliterate these redirect revisions from the history. See:
https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_History_Merge#Special:MergeHistory_links_for_rapid-processing_of_predictable_bot-cut-pastes

Also see the logs here for a particularly painful case, which is made even worse by T45911:
https://en.wikipedia.org/w/index.php?title=Special:Log/Graham87&offset=20190725114337&limit=13&type=&user=Graham87

@Krinkle So for users with the (delete) permission, would that automatically give them access to (deleterevision), or would that still require a separate permission? I'm asking because not every site would be chuffed with allowing all users who can delete and undelete pages to revision delete them as well. I have to admit that (deleterevision) doesn't really look user friendly, given that it just crosses the revisions out to prevent them from being viewed, whereas if a similar ability were part of the traditional deletion system, it would delete the revisions the same way it deletes entire pages.

I've had reasons to delete selected revisions rather than delete entire pages and restore the wanted revisions, especially since some pages in some namespaces on certain sites don't allow undeletions. So naturally, not being able to selectively delete revisions through the traditional deletion system is quite frustrating. And I've always said to myself, "If you can restore selective revisions through the traditional undeletion system, why can't you delete selective revisions through the traditional deletion system?" Allowing that would balance out the deletion and undeletion systems, giving them the same number of options.

Another thing is that entire pages cannot be deleted through (deleterevision), as the most recent revision cannot be partially deleted. Since deleting selective revisions through the traditional deletion system would only allow full deletion of each revision, it wouldn't be limited to working on earlier revisions, unlike the revision deletion system.

Izno added a subscriber: Izno. Sep 1 2019, 3:55 PM