Page MenuHomePhabricator

Run reverted tag update job only after the edit is approved
Closed, ResolvedPublic

Description

This is a follow-up on T259014: Protect the reverted edits feature from abuse about mitigation #4 (waiting with reverted tag update job until after the edit was approved / reviewed / patrolled).

Design

  • When DerivedPageUpdater is about to schedule RevertedTagUpdateJob, run a hook asking extensions if they veto it.
    • Patrol code should stop the job for non-autopatrolled edits as well.
  • If something stopped the update, it is somehow persisted for later (possible) use.
  • Once the edit is approved / reviewed / patrolled, that review code should retrieve the persisted job (or data that is sufficient to recreate the job) and schedule it.
  • If the edit is never approved, the update won't be carried out.

This should work in particular with built-in core patrolling, FlaggedRevs and Approved Revs. I haven't looked into Moderation yet, it may or may not have to use this mechanism.

Persistence

RevertedTagUpdateJob needs only two things: the ID of the revision that was the revert and its associated EditResult. The first one is trivial, but EditResult is not persisted in any way and that is a problem, because reconstructing it later based on data in the DB is currently impossible.

We are looking for something that should be able to store non-critical data over extended periods of time, say a year at most, that should be enough for wikipedians to catch up with reviewing pages :P The storage doesn't have to be structured, a blob will suffice (we can use ser/des).

I came up with a few options for storing it for later use, all of them are just different tastes of "bad".

  1. Additional fields in the revision table or a new database table entirely. That would be really nice, but it does seem like a huge overkill for something like this. Also: very complicated and can break a lot of things. Probably a bad idea.
  2. Add job_paused field to job table to indicate the job should not be executed for now. That would also require a schema change and break a lot of code. It would also spam the job table with thousands of jobs that should not be executed, so… yeah, it would be a mess.
  3. Store it inside revert change tags in change_tag table, field ct_params. We would just put a serialized EditResult in that field for mw-undo, mw-rollback and mw-manual-revert change tags. This would indeed work, but only if these tags are enabled on the wiki in the first place. We can't assume that, sadly.
  4. Use the main object stash. Citing Manual:Caching: This store is expected to have strong persistence and is often used for data taht cannot be regenerated and is not stored elsewhere. However the data stored here must be non-critical and result in minimal user impact, thus allowing for the backend to sometimes be partially unavailable or wiped if under operational pressure without causing incidents. That sounds like what we need here. The expiry can be set to something really high (like a year) and configurable.

I don't have any other (even remotely) sensible ideas for now.

My proposal

Use the main object stash and wrap it in a service (something like EditResultCache) that would allow for easy stashing and retrieval of EditResults later. Optionally we can combine this with approach #3 and store EditResults in revert tags as well. In case the main object stash somehow loses the EditResult, we can always try to retrieve it from the change_tag table.

Conclusion

This is admittedly a bit messy and I'm not sure if this feature is the right way to go. I would personally go with it, but any opinions on whether this solution should be pursued or not would be appreciated. :)

Event Timeline

How would this work on wikis where $wgUseRCPatrol = false;? On e.g. enwiki, edit patrolling isn't used, and FlaggedRevs is only used on some pages, so most edits are never patrolled/reviewed (and can't be)

Other complications:

  • Edits that perform a false undo would not look suspicious to a patroller, and would be marked as patrolled. We could expose this information to patrollers (or everyone) in the UI (this edit claims to undo revisions X, Y and Z), but even then, patrollers likely won't be inclined to reject edits with bad undo metadata if the diff is harmless
  • We'd have to somehow retain/save the "revision A claims to undo revisions X, Y and Z" metadata; this could go in ct_params maybe
  • Even if an edit is obviously bad and reverts a bunch of edits that shouldn't be reverted, the typical practice is for the patroller to revert the bad edit or otherwise fix up the state of the page, then approve all the edits (both the bad edits and the subsequent fixup edits). There's not really a way to reject an edit, you can only say "this edit was bad, but I'm approving it now because I've fixed the damage that it did"

How would this work on wikis where $wgUseRCPatrol = false;? On e.g. enwiki, edit patrolling isn't used, and FlaggedRevs is only used on some pages, so most edits are never patrolled/reviewed (and can't be)

Well, if no mechanism of reviewing edits is in place, the reverted tag would be applied right away. I guess that if a wiki decides not to review changes some pages, that is their choice and probably means they are not much afraid of abuse there.

  • Edits that perform a false undo would not look suspicious to a patroller, and would be marked as patrolled. We could expose this information to patrollers (or everyone) in the UI (this edit claims to undo revisions X, Y and Z), but even then, patrollers likely won't be inclined to reject edits with bad undo metadata if the diff is harmless
  • We'd have to somehow retain/save the "revision A claims to undo revisions X, Y and Z" metadata; this could go in ct_params maybe

If we apply the patch #3 described in T259014: Protect the reverted edits feature from abuse, false undos would be eliminated completely, so this wouldn't be a problem. I think just not marking these edits as undos would be much simpler and more reliable. No harm is done by marking something a user claims an undo like a regular edit.

  • Even if an edit is obviously bad and reverts a bunch of edits that shouldn't be reverted, the typical practice is for the patroller to revert the bad edit or otherwise fix up the state of the page, then approve all the edits (both the bad edits and the subsequent fixup edits). There's not really a way to reject an edit, you can only say "this edit was bad, but I'm approving it now because I've fixed the damage that it did"

Hmm, that is a good one. Rollback will set "autopatrolled" patrolmarks on reverted edits, but this can't be relied upon in other forms of edit review and when using other revert methods. We could for example run the reverted tag update job only if the reverting edit itself doesn't have the reverted tag, this would cover most cases like this. The job queue is a queue after all, so this should work…? I don't have a better idea for now.

I haven't looked into Moderation yet, it may or may not have to use this mechanism.

I think it can be safely ignored.

Due to how Moderation is normally used (only new users are moderated, and once they do some useful edits, it's recommended by Best Practices that admin adds them to "automoderated" group, which allows them to bypass Moderation completely),
only the very new users will be affected (their reverts won't be marked as "reverted") if this is not implemented.
New users are probably not going to revert anything anyway, so the impact is minimal.

Change 618784 had a related patch set uploaded (by Ostrzyciel; owner: Ostrzyciel):
[mediawiki/core@master] Implement EditResultCache

https://gerrit.wikimedia.org/r/618784

Change 618784 merged by jenkins-bot:
[mediawiki/core@master] Implement EditResultCache

https://gerrit.wikimedia.org/r/618784

FlaggedRevs implements this, but I would like to at least implement this in Approved Revs too before closing this task.