Page MenuHomePhabricator

Allow aspects of edits or logs to be revdeled while other aspects are supressed
Open, MediumPublic

Description

I searched and didn't quite find anything like this. The problem is that when you have an abusive edit that is removed entirely from the database by an oversight/office action, it can leave behind an abusive edit summary and a standard admin with revdel capabilities cannot remove this. See the attachment.

Possibly relevant:

Event Timeline

Koavf created this task.May 23 2020, 9:47 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 23 2020, 9:47 PM
jrbs added a subscriber: jrbs.May 24 2020, 12:14 AM
Peachey88 updated the task description. (Show Details)May 24 2020, 7:51 AM
Koavf updated the task description. (Show Details)May 24 2020, 8:23 AM
Izno updated the task description. (Show Details)May 24 2020, 1:58 PM
Huji added a comment.Jul 6 2020, 12:32 PM

Copying from the merged-in duplicate:

Having taken a quick look at this, seems to be due to the $item->canView() checks failing (eg at SpecialRevisionDelete::showForm, line 420). That function is in RevDelRevisionItem, line 81:

return RevisionRecord::userCanBitfield(
	$this->getRevisionRecord()->getVisibility(),
	RevisionRecord::DELETED_RESTRICTED,
	$this->list->getUser()
);

Would probably have to do a couple of bit checks, but seems do-able - the various different fields suppressed result in a different bit value. Perhaps in a duplicate function, since canView can't take a parameter.

MER-C added a subscriber: MER-C.Jul 22 2020, 4:54 PM

https://en.wikipedia.org/w/index.php?date-range-to=2019-04-04&tagfilter=&title=Khaled_Juffali&action=history

Similarly I want to zap the content of those revisions with the oversighted username for copyvio, but can't.

Huji added a comment.EditedOct 19 2020, 2:24 AM

Finally had a chance to circle back to this. Unlike what @Proc had hoped, this will take a substantial change in code. The reason becomes clear when you see mw:Bitfields_for_rev_deleted#Current_values: we use the same bits to store the deletion and the suppression flag for each of the revision's three components (text, summary, user). With our current approach, it is not possible to configure a revision in such a way that its text is suppressed and its summary is deleted (but not suppressed).

To support concurrent use of rev-del and supression, we will need to change the revision deletion constants. One strategy is to use 1, 2 and 4 like we do now, but instead of adding 8 to them to indicate suppression, we could multiply the 1/2/4 bits by 100. The table below shows how the revision visibility bit will be calculated under the existing and proposed method. This proposal has a few advantages:

  • Decouples the rev-del status and the suppression status (main purpose)
  • Code will remain backward compliant; that is, the only values reused will be those which retain the same meaning
  • There is no need for a maintenance script to update all revisions in the database. Every time a change is made the revision, the new value will be calculated using the new representation
  • The math necessary to interpret these is not substantially more complex (or computationally more expensive); our current approach uses binary math; the proposed approach will be exactly identical for revisions that are not suppressed. For those that are suppressed, it will first separate the left-most digit from the rightmost digit (a simple division and modulo operation) and then uses the same old logic on each of those.

It has one key disadvantage:

  • The rev_deleted field in the revision table is an 8-bit TINYINT field, which means it only supports values between 0 and 255. The proposed approach requires storage space for numbers as big as 700. We would need to ALTER the revision table, which would be an expensive Schema-change

We could use a more sophisticated algorithm that restricts the values to be between 0 and 255, but it would not be as clean. For instance, we can use a small multiplier of 16. This will also guarantee backward compliance, but the math to interpret it will be more involved.

I am happy to write the patch for it, but rather wait for some feedback first.

textsummaryusercurrent representationproposed representation
deleted11
deleted22
deleteddeleted33
deleted44
deleteddeleted55
deleteddeleted66
deleteddeleteddeleted77
suppressed9100
suppressed10200
suppressedsuppressed11300
suppressed12400
suppressedsuppressed13500
suppressedsuppressed14600
suppressedsuppressedsuppressed15700
suppresseddeletedN/A102
suppresseddeletedN/A104
suppresseddeleteddeletedN/A106
deletedsuppressedN/A201
suppresseddeletedN/A204
deletedsuppresseddeletedN/A205
suppressedsuppresseddeletedN/A304
deletedsuppressedN/A401
deletedsuppressedN/A402
deleteddeletedsuppressedN/A403
suppresseddeletedsuppressedN/A502
deletedsuppressedsuppressedN/A601

I am adding three people here, whom I would like to ask to opine on my last comment right above. Daniel is added because it seems like he was the one who created RevisionRecord and would know a lot about it. Max because he was among the last people to touch the code where RevisionRecord deletion constants are defined. Sam because he has always been extremely helpful to me and is generally a thoughtful developer.

TL;DR: should we revised our deletion constants to allow for coexistence of rev-del and suppression, and if yes, what strategy do you like better?

Might I suggest instead

textsummaryusercurrent representationproposed representation
deleted1 (00000001)1 (00000001)
deleted2 (00000010)2 (00000010)
deleteddeleted3 (00000011)3 (00000011)
deleted4 (00000100)4 (00000100)
deleteddeleted5 (00000101)5 (00000101)
deleteddeleted6 (00000110)6 (00000110)
deleteddeleteddeleted7 (00000111)7 (00000111)
suppressed9 (00001001)17 (00010001)
suppressed10 (00001010)34 (00100010)
suppressedsuppressed11 (00001011)51 (00110011)
suppressed12 (00001100 )68 (01000100)
suppressedsuppressed13 (00001101)85 (01010101)
suppressedsuppressed14 (00001110)102 (01100110)
suppressedsuppressedsuppressed15 (00001111)119 (01110111)
suppresseddeletedN/A19 (00010011)
suppresseddeletedN/A21 (00010101)
suppresseddeleteddeletedN/A23 (00010111)
deletedsuppressedN/A35 (00100011)
suppresseddeletedN/A38 (00100100)
deletedsuppresseddeletedN/A39 (00100101)
suppressedsuppresseddeletedN/A55 (00110111)
deletedsuppressedN/A69 (01000101)
deletedsuppressedN/A70 (01000110)
deleteddeletedsuppressedN/A71 (01000111)
suppresseddeletedsuppressedN/A87 (01010111)
deletedsuppressedsuppressedN/A103 (01100111)

Bitwise this results in the 8 bits:

currently
<unused><unused><unused><unused>	<suppress rather than delete><user><comment><text>
proposed
<unused><suppress user><suppress comment><suppress text>	<mark legacy format><delete user><delete comment><delete text>

where the <suppress user/comment/text> bits only do anything if the <delete *> bit is activated. If the <delete user> bit is 1, and the <suppress user> bit is 0, user is visible to admins, but if the <delete user> bit is 1, and the <suppress user> bit is 1, it is visible only to oversighters.
This does not require a schema change, and should be completely backwards compatible since if the <mark legacy format> bit is true in the new definition, the old rules are used, and if it is false it doesn't matter because it is not a part of the new definition

Huji added a comment.Oct 21 2020, 1:15 PM

I love that! Let me see if I can make a patch out of this one.

Huji claimed this task.Oct 22 2020, 1:24 PM
Huji triaged this task as Medium priority.
Huji added a project: User-Huji.

This will probably also need a maintenance script, and it might be worth tagging DBA to get feedback on the change before proceeding

Huji added a comment.Oct 22 2020, 5:09 PM

My plan was to bring it to DBA attention once we have a patch. The proposal will not result in a schema change.

I agree regarding the maintenance script. Revisions that are deleted/suppressed are rarely touched again, in terms of their rev-del/suppress status. So while it is true that the code could update old entries to their new value if they were ever updated, because updates are rare, it is best to have a maintenance script.

However, the decision as to whether and when to run the script for WMF deserves a separate task. One could argue that because code will remain backward compliant, we might as well never update the revision table. This won't be the first time we have old and new data using different data models in the same DB field. In logging we have changed how some log properties are recorded over time. In AbuseFilter, we have similar examples too.

Huji removed Huji as the assignee of this task.Oct 24 2020, 1:43 AM
Further Analysis

As it turns out a lot of core and non-core code is directly interpreting the value of rev_deleted. This means we need to patch everyone of those.

In use cases that are part of MW or an extension, a bit-wise AND is created using Database::bitAnd() and with help from one of the revision deletion constants from RevisionRecord class, similar to this: $db->bitAnd( 'rev_deleted', RevisionRecord::DELETED_TEXT ) . ' = 0'. For these, we need a logic that checks for two possible values (old values and new values, e.g. 9 and 17). I don't think having repetitive code in so many files is a great idea. A better idea might be to refactor all of these such that they would use a new static method in Revision to create the desired condition. Something like RevisionRecord::getRevDelCond( RevisionRecord::SUPPRESSED_TEXT ) which would return the correct SQL statement (in this case, something like ( rev_deleted = 9 OR rev_deleted = 17)).

For use cases that are not part of MW, such as the puppets used by Cloud-Services, the comparisons are hard-coded (e.g. where: ipc_rev_id = rev_id AND (rev_deleted & 4) = 0. We would need to create patches for them in the same hard-coded way; that is, I cannot think of a way to refactor them into a modular way, like we can with MW code as explained above.

There are also a few cases that do not need to be changed, e.g. Extension:GlobalContribs uses the logic $condition[] = "rev_deleted != '0'"; which will stay the same after this task.

Next steps

The following subtasks need to be created and fulfilled, more or less in the order listed below:

  • Identify all use cases of rev_deleted in MediaWiki, MW Extensions, and non-MW code
  • Define new revision deletion constants called SUPPRESSED_TEXT, SUPPRESSED_COMMENT
  • Rename revision deletion constant SUPPRESSED_TEXT to SUPPRESSED_TEXT_OLD and update all code that uses it too
  • Rename revision deletion constant SUPPRESSED_COMMENT to SUPPRESSED_COMMENT_OLD and update all code that uses it too
  • Rename revision deletion constant SUPPRESSED_USER to SUPPRESSED_USER_OLD and update all code that uses it too
  • Define a new SUPRESSED_ACTOR constant whose value is different from the old SUPPRESSED_USER one (68 instead of 12). The word ACTOR is used to avoid confusion with the old constant, and to align with the current column names in the revision table
  • Create a new DELETED_ACTOR constant with the same value as the DELETED_USER; indicate that DELETED_USER is only kept for historical reasons
  • Create a static method in RevisionRecord class that can return the SQL statement corresponding to each rev-del status check
  • Update all MW/Extension code that uses rev_deleted to use the new method for its SQL queries
  • Update all non-MW code such that they can interpret the new rev_deleted values correctly
  • Modify RevisionRecord::userCanBitfield so it can understand the new rev_deleted values correctly
  • Update all other references to the revision deletion constants (e.g. in API code, HistoryPager, LogPager, etc.) that do not explicitly produce a SQL query, to use the new constants

Most or all of the above need to also be repeated for logs (in includes/logging/LogPage.php a similar set of constants are defined).

By the end of all of these, we still won't have any of the new rev_deleted values in the database. We just have made sure that once we turn on the switch, all code that depends on them will continue to function as before.

Then, and only then, we can finally updated SpecialRevisionDelete to start writing the new values into the database.

At this point, I don't think this is a one-man job anymore. Certainly above my bandwidth and availability. I am unassigning this task from myself.

Huji raised the priority of this task from Medium to Needs Triage.Oct 24 2020, 1:43 AM

Fields using the same bit logic (as far as I can tell):
rev_deleted
ar_deleted
oi_deleted
fa_deleted
rc_deleted
log_deleted

I suggest adding a service for the migration, like for ActorMigration, but that handles setting and reading the values for the specific fields. Tagging CPT as a feature request to review, and my task workboard since I might be able to work on this (assuming this is approved to go forward; double checking first since this is a larger scope that previously expected)

DannyS712 moved this task from Unsorted to Later on the User-DannyS712 board.Oct 24 2020, 2:15 AM
Huji renamed this task from Allow revdel for the edit summaries of oversight-style deleted diffs to Allow aspects of edits or logs to be revdeled while other aspects are supressed.Oct 24 2020, 12:52 PM

Thanks Danny. I will be happy to provide support through code review, at least.
I retitled the task to reflect its new scope

AMooney triaged this task as Medium priority.Tue, Nov 10, 9:37 PM
AMooney moved this task from Inbox to Feature Requests to Review on the Platform Engineering board.