Run reverted tag update job only after the edit is approved
Closed, Resolved (Public)

Authored By

	Ostrzyciel
	Jul 29 2020, 8:49 AM

Description

This is a follow-up to T259014: Protect the reverted edits feature from abuse, concerning mitigation #4 (delaying the reverted tag update job until the edit has been approved / reviewed / patrolled).

Design

- When DerivedPageUpdater is about to schedule RevertedTagUpdateJob, run a hook asking extensions if they veto it.
  - Patrol code should stop the job for non-autopatrolled edits as well.
- If something stopped the update, it is somehow persisted for later (possible) use.
- Once the edit is approved / reviewed / patrolled, that review code should retrieve the persisted job (or data that is sufficient to recreate the job) and schedule it.
- If the edit is never approved, the update won't be carried out.
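
The flow above can be sketched in a few lines. This is only an illustration in Python, not MediaWiki code: `PendingUpdate`, `UpdateScheduler` and the hook signature are hypothetical names made up for the sketch, and a plain dict stands in for whatever persistence ends up being used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PendingUpdate:
    """Data sufficient to recreate the job: the revert's revision ID and its edit result."""
    rev_id: int
    edit_result: dict


# Hypothetical hook signature: return False to veto the update for now.
VetoHook = Callable[[PendingUpdate], bool]


@dataclass
class UpdateScheduler:
    hooks: List[VetoHook] = field(default_factory=list)
    pending: Dict[int, PendingUpdate] = field(default_factory=dict)  # stands in for persistence
    queue: List[PendingUpdate] = field(default_factory=list)  # stands in for the job queue

    def on_edit_saved(self, update: PendingUpdate) -> bool:
        """Schedule right away unless a hook vetoes; return True if scheduled."""
        if all(hook(update) for hook in self.hooks):
            self.queue.append(update)
            return True
        # Vetoed: keep the data around so the job can be recreated later.
        self.pending[update.rev_id] = update
        return False

    def on_edit_approved(self, rev_id: int) -> bool:
        """Called by review code; schedules the held update if there is one."""
        update = self.pending.pop(rev_id, None)
        if update is None:
            return False
        self.queue.append(update)
        return True
```

With no hooks registered, `all()` over an empty list is true, so the update is scheduled immediately. An update that is vetoed and never approved simply stays in `pending` and is never run.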

This should work in particular with built-in core patrolling, FlaggedRevs and Approved Revs. I haven't looked into Moderation yet, it may or may not have to use this mechanism.

Persistence

RevertedTagUpdateJob needs only two things: the ID of the revision that was the revert and its associated EditResult. The first one is trivial, but EditResult is not persisted in any way and that is a problem, because reconstructing it later based on data in the DB is currently impossible.

We are looking for storage that can hold non-critical data over extended periods of time, say a year at most; that should be enough for Wikipedians to catch up with reviewing pages :P The storage doesn't have to be structured; a blob will suffice (we can serialize and deserialize the EditResult).

I came up with a few options for storing it for later use, all of them are just different tastes of "bad".

1. Additional fields in the revision table or a new database table entirely. That would be really nice, but it does seem like huge overkill for something like this. Also: very complicated and can break a lot of things. Probably a bad idea.
2. Add a job_paused field to the job table to indicate the job should not be executed for now. That would also require a schema change and break a lot of code. It would also spam the job table with thousands of jobs that should not be executed, so… yeah, it would be a mess.
3. Store it inside revert change tags in the change_tag table, field ct_params. We would just put a serialized EditResult in that field for the mw-undo, mw-rollback and mw-manual-revert change tags. This would indeed work, but only if these tags are enabled on the wiki in the first place. We can't assume that, sadly.
4. Use the main object stash. Citing Manual:Caching: "This store is expected to have strong persistence and is often used for data that cannot be regenerated and is not stored elsewhere. However the data stored here must be non-critical and result in minimal user impact, thus allowing for the backend to sometimes be partially unavailable or wiped if under operational pressure without causing incidents." That sounds like what we need here. The expiry can be set to something really high (like a year) and made configurable.

I don't have any other (even remotely) sensible ideas for now.

My proposal

Use the main object stash and wrap it in a service (something like EditResultCache) that would allow for easy stashing and retrieval of EditResults later. Optionally we can combine this with approach #3 and store EditResults in revert tags as well. In case the main object stash somehow loses the EditResult, we can always try to retrieve it from the change_tag table.
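
A rough sketch of what such a wrapper could look like, written in Python for illustration only. It is not the EditResultCache that was eventually implemented: the class name `EditResultStash`, the key format, the JSON encoding and the `fallback` callable (standing in for a lookup of ct_params in change_tag) are all assumptions made for the example.

```python
import json
import time
from typing import Callable, Dict, Optional, Tuple

ONE_YEAR = 365 * 24 * 3600  # assumed default expiry, meant to be configurable


class EditResultStash:
    """Hypothetical wrapper: a stash with expiry plus an optional fallback lookup."""

    def __init__(
        self,
        ttl: float = ONE_YEAR,
        fallback: Optional[Callable[[int], Optional[str]]] = None,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._ttl = ttl
        self._fallback = fallback  # e.g. reads the serialized blob from a revert tag
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expiry time, blob)

    @staticmethod
    def _key(rev_id: int) -> str:
        return f"EditResult:{rev_id}"  # made-up key format

    def set(self, rev_id: int, edit_result: dict) -> None:
        """Serialize the edit result and stash it until the TTL runs out."""
        blob = json.dumps(edit_result)
        self._store[self._key(rev_id)] = (self._clock() + self._ttl, blob)

    def get(self, rev_id: int) -> Optional[dict]:
        """Return the stashed result, else try the fallback, else None."""
        entry = self._store.get(self._key(rev_id))
        if entry is not None and entry[0] > self._clock():
            return json.loads(entry[1])
        if self._fallback is not None:
            blob = self._fallback(rev_id)
            if blob is not None:
                return json.loads(blob)
        return None
```

The fallback is only consulted when the stash has lost or expired the entry, which matches the idea of treating the change_tag copy as a backup rather than the primary source.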

Conclusion

This is admittedly a bit messy and I'm not sure if this feature is the right way to go. I would personally go with it, but any opinions on whether this solution should be pursued or not would be appreciated. :)

Details

	Subject	Repo	Branch	Lines +/-
	Implement EditResultCache	mediawiki/core	master	+316 -0


Related Objects

Status	Assigned	Task
Resolved	SBisson	T162675 'Integrated filters'—add all RC Page filters into the new UI along with selected new functions
Resolved	MMiller_WMF	T164307 Add Reverted filter to RecentChanges Filters
Stalled	None	T159725 AbuseFilter: Indicate that an edit was a revert
Declined	None	T155512 Trending service should track reverts more accurately
Resolved	Ostrzyciel	T56007 "reverted"-type notifications should be based on the actual page content
Resolved	Ostrzyciel	T152434 Add method to Revision to check if it was a Revert, and whether an edit was Reverted
Resolved	Ostrzyciel	T248775 Proposal: Add Reverted filter to RecentChanges Filters
Resolved	Ostrzyciel	T254074 Implement the reverted edit tag
Resolved	Ostrzyciel	T259014 Protect the reverted edits feature from abuse
Resolved	Ostrzyciel	T259103 Run reverted tag update job only after the edit is approved
Resolved	Ostrzyciel	T259733 PageUpdater: save additional info about reverts in ct_params
Resolved	Ostrzyciel	T260524 Implement BeforeRevertedTagUpdate hook in FlaggedRevs

Event Timeline

Ostrzyciel created this task. Jul 29 2020, 8:49 AM

How would this work on wikis where $wgUseRCPatrol = false;? On e.g. enwiki, edit patrolling isn't used, and FlaggedRevs is only used on some pages, so most edits are never patrolled/reviewed (and can't be)

Other complications:

- Edits that perform a false undo would not look suspicious to a patroller, and would be marked as patrolled. We could expose this information to patrollers (or everyone) in the UI (this edit claims to undo revisions X, Y and Z), but even then, patrollers likely won't be inclined to reject edits with bad undo metadata if the diff is harmless
  - We'd have to somehow retain/save the "revision A claims to undo revisions X, Y and Z" metadata; this could go in ct_params maybe
- Even if an edit is obviously bad and reverts a bunch of edits that shouldn't be reverted, the typical practice is for the patroller to revert the bad edit or otherwise fix up the state of the page, then approve all the edits (both the bad edits and the subsequent fixup edits). There's not really a way to reject an edit, you can only say "this edit was bad, but I'm approving it now because I've fixed the damage that it did"

In T259103#6347304, @Catrope wrote:

How would this work on wikis where $wgUseRCPatrol = false;? On e.g. enwiki, edit patrolling isn't used, and FlaggedRevs is only used on some pages, so most edits are never patrolled/reviewed (and can't be)

Well, if no mechanism for reviewing edits is in place, the reverted tag would be applied right away. I guess that if a wiki decides not to review changes to some pages, that is their choice and probably means they are not much afraid of abuse there.

In T259103#6347306, @Catrope wrote:

Edits that perform a false undo would not look suspicious to a patroller, and would be marked as patrolled. We could expose this information to patrollers (or everyone) in the UI (this edit claims to undo revisions X, Y and Z), but even then, patrollers likely won't be inclined to reject edits with bad undo metadata if the diff is harmless

We'd have to somehow retain/save the "revision A claims to undo revisions X, Y and Z" metadata; this could go in ct_params maybe

If we apply patch #3 described in T259014: Protect the reverted edits feature from abuse, false undos would be eliminated completely, so this wouldn't be a problem. I think just not marking these edits as undos would be much simpler and more reliable. No harm is done by treating something a user claims is an undo as a regular edit.

Even if an edit is obviously bad and reverts a bunch of edits that shouldn't be reverted, the typical practice is for the patroller to revert the bad edit or otherwise fix up the state of the page, then approve all the edits (both the bad edits and the subsequent fixup edits). There's not really a way to reject an edit, you can only say "this edit was bad, but I'm approving it now because I've fixed the damage that it did"

Hmm, that is a good one. Rollback will set "autopatrolled" patrol marks on reverted edits, but this can't be relied upon in other forms of edit review or when using other revert methods. We could, for example, run the reverted tag update job only if the reverting edit itself doesn't have the reverted tag; this would cover most cases like this. The job queue is a queue after all, so this should work…? I don't have a better idea for now.
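
To make the idea concrete, here is a small Python sketch of that check. Everything in it is illustrative: the function names are hypothetical, a dict of sets stands in for the change_tag table, and the tag name `mw-reverted` is assumed.

```python
from typing import Dict, List, Set

REVERTED_TAG = "mw-reverted"  # assumed name of the reverted tag


def should_run_update(revert_rev_id: int, tags_by_rev: Dict[int, Set[str]]) -> bool:
    """Skip the update if the reverting edit has itself been reverted."""
    return REVERTED_TAG not in tags_by_rev.get(revert_rev_id, set())


def run_update(
    revert_rev_id: int,
    reverted_rev_ids: List[int],
    tags_by_rev: Dict[int, Set[str]],
) -> bool:
    """Tag the reverted revisions, unless the revert was undone in the meantime."""
    if not should_run_update(revert_rev_id, tags_by_rev):
        return False
    for rev_id in reverted_rev_ids:
        tags_by_rev.setdefault(rev_id, set()).add(REVERTED_TAG)
    return True
```

In the scenario described above: a bad edit 12 reverts 10 and 11 and is held for review. The patroller's fixup edit 13 reverts 12, and its job runs first and tags 12. When 12 is later approved, its own job sees the tag on 12 and does nothing, so 10 and 11 stay untagged. This relies on the fixup's job being processed before the approval triggers the held one.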

I haven't looked into Moderation yet, it may or may not have to use this mechanism.

I think it can be safely ignored.

Due to how Moderation is normally used (only new users are moderated, and once they make some useful edits, Best Practices recommends that an admin add them to the "automoderated" group, which allows them to bypass Moderation completely), only very new users will be affected (their reverts won't be marked as "reverted") if this is not implemented. New users are probably not going to revert anything anyway, so the impact is minimal.

Ostrzyciel mentioned this in T259733: PageUpdater: save additional info about reverts in ct_params. Aug 5 2020, 5:27 PM

Ostrzyciel added a subtask: T259733: PageUpdater: save additional info about reverts in ct_params.

Change 618784 had a related patch set uploaded (by Ostrzyciel; owner: Ostrzyciel):
[mediawiki/core@master] Implement EditResultCache

https://gerrit.wikimedia.org/r/618784

gerritbot added a project: Patch-For-Review. Aug 6 2020, 4:26 PM

Ostrzyciel closed subtask T259733: PageUpdater: save additional info about reverts in ct_params as Resolved. Aug 8 2020, 4:42 AM

Ostrzyciel mentioned this in T260524: Implement BeforeRevertedTagUpdate hook in FlaggedRevs. Aug 17 2020, 10:12 AM

Change 618784 merged by jenkins-bot:
[mediawiki/core@master] Implement EditResultCache

https://gerrit.wikimedia.org/r/618784

ReleaseTaggerBot added a project: MW-1.36-notes (1.36.0-wmf.8; 2020-09-08). Sep 1 2020, 5:00 AM

Maintenance_bot removed a project: Patch-For-Review. Sep 1 2020, 5:10 AM

Ostrzyciel closed subtask T260524: Implement BeforeRevertedTagUpdate hook in FlaggedRevs as Resolved. Sep 14 2020, 11:51 AM