This is a follow-up to T259014 (Protect the reverted edits feature from abuse), covering mitigation #4: delaying the reverted tag update job until the edit has been approved / reviewed / patrolled.
Design
- When DerivedPageUpdater is about to schedule RevertedTagUpdateJob, run a hook asking extensions if they veto it.
- Patrol code should stop the job for non-autopatrolled edits as well.
- If something stopped the update, the job (or enough data to recreate it) is persisted for possible later use.
- Once the edit is approved / reviewed / patrolled, that review code should retrieve the persisted job (or data that is sufficient to recreate the job) and schedule it.
- If the edit is never approved, the update won't be carried out.
This should work in particular with built-in core patrolling, FlaggedRevs and Approved Revs. I haven't looked into Moderation yet; it may or may not need this mechanism.
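The flow described above can be sketched roughly as follows. This is only an illustrative model in Python, not MediaWiki code: `UpdateScheduler`, `schedule`, `on_approved` and the veto handler signature are hypothetical names, the `edit_result` dict is a stand-in for a real EditResult, and the in-memory `held` dict stands in for whatever persistence is eventually chosen.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RevertedTagUpdateJob:
    """Stand-in for the real job: it needs only these two pieces of data."""
    revert_revision_id: int
    edit_result: dict  # placeholder for a serialized EditResult


# Hypothetical hook signature: a handler returns True to veto the job.
VetoHandler = Callable[[RevertedTagUpdateJob], bool]


@dataclass
class UpdateScheduler:
    veto_handlers: List[VetoHandler] = field(default_factory=list)
    queue: List[RevertedTagUpdateJob] = field(default_factory=list)
    # In-memory stand-in for the persistence layer discussed below.
    held: Dict[int, RevertedTagUpdateJob] = field(default_factory=dict)

    def schedule(self, job: RevertedTagUpdateJob) -> bool:
        """Queue the job unless a handler vetoes it; return True if queued."""
        if any(handler(job) for handler in self.veto_handlers):
            self.held[job.revert_revision_id] = job
            return False
        self.queue.append(job)
        return True

    def on_approved(self, revision_id: int) -> bool:
        """Called by review code; releases a held job if one exists."""
        job = self.held.pop(revision_id, None)
        if job is None:
            return False
        self.queue.append(job)
        return True


# Usage: revision 101 still needs review, so its job is held back.
needs_review = {101}
scheduler = UpdateScheduler(
    veto_handlers=[lambda job: job.revert_revision_id in needs_review]
)
scheduler.schedule(RevertedTagUpdateJob(101, {"isRevert": True}))
scheduler.on_approved(101)  # the held job is now queued
```

If `on_approved` is never called for a revision, its job simply stays in `held` and the update is never carried out, which matches the last point of the design.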
Persistence
RevertedTagUpdateJob needs only two things: the ID of the revision that was the revert and its associated EditResult. The first one is trivial, but EditResult is not persisted in any way and that is a problem, because reconstructing it later based on data in the DB is currently impossible.
We are looking for storage that can hold non-critical data over extended periods of time, say a year at most; that should be enough for Wikipedians to catch up with reviewing pages :P The storage doesn't have to be structured: a blob will suffice, since we can serialize and deserialize the EditResult.
I came up with a few options for storing it for later use; all of them are just different flavors of "bad".
- Additional fields in the revision table or a new database table entirely. That would be really nice, but it does seem like a huge overkill for something like this. Also: very complicated and can break a lot of things. Probably a bad idea.
- Add job_paused field to job table to indicate the job should not be executed for now. That would also require a schema change and break a lot of code. It would also spam the job table with thousands of jobs that should not be executed, so… yeah, it would be a mess.
- Store it inside revert change tags in change_tag table, field ct_params. We would just put a serialized EditResult in that field for mw-undo, mw-rollback and mw-manual-revert change tags. This would indeed work, but only if these tags are enabled on the wiki in the first place. We can't assume that, sadly.
- Use the main object stash. Citing Manual:Caching: "This store is expected to have strong persistence and is often used for data that cannot be regenerated and is not stored elsewhere. However the data stored here must be non-critical and result in minimal user impact, thus allowing for the backend to sometimes be partially unavailable or wiped if under operational pressure without causing incidents." That sounds like what we need here. The expiry can be set to something really high (like a year) and made configurable.
I don't have any other (even remotely) sensible ideas for now.
My proposal
Use the main object stash and wrap it in a service (something like EditResultCache) that would allow for easy stashing and retrieval of EditResults later. Optionally we can combine this with the third option above (revert change tags) and store EditResults in the revert tags as well. In case the main object stash somehow loses the EditResult, we can always try to retrieve it from the change_tag table.
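To make the proposal more concrete, here is a minimal sketch of what such a wrapper could look like. It is a Python model under stated assumptions, not the actual MediaWiki API: `InMemoryStash` stands in for the main object stash, `tag_params_lookup` is a hypothetical callback representing a read of ct_params from the change_tag table, and the method names, key format and dict keys are made up for illustration.

```python
import json
import time
from typing import Callable, Dict, Optional, Tuple

ONE_YEAR = 365 * 24 * 60 * 60  # default expiry in seconds; meant to be configurable


class InMemoryStash:
    """Toy stand-in for the main object stash: key/value store with expiry."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}

    def set(self, key: str, value: str, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave as if wiped
            return None
        return value


class EditResultCache:
    """Hypothetical service wrapping the stash, with an optional tag fallback."""

    def __init__(
        self,
        stash: InMemoryStash,
        tag_params_lookup: Optional[Callable[[int], Optional[str]]] = None,
        ttl: float = ONE_YEAR,
    ):
        self._stash = stash
        self._tag_params_lookup = tag_params_lookup
        self._ttl = ttl

    @staticmethod
    def _key(revision_id: int) -> str:
        return f"EditResult:{revision_id}"  # assumed key format

    def store(self, revision_id: int, edit_result: dict) -> None:
        """Serialize the EditResult stand-in to a blob and stash it."""
        blob = json.dumps(edit_result)
        self._stash.set(self._key(revision_id), blob, self._ttl)

    def retrieve(self, revision_id: int) -> Optional[dict]:
        """Try the stash first, then fall back to the revert tag parameters."""
        blob = self._stash.get(self._key(revision_id))
        if blob is None and self._tag_params_lookup is not None:
            blob = self._tag_params_lookup(revision_id)
        return None if blob is None else json.loads(blob)
```

The fallback is deliberately optional, since revert tags may be disabled on a given wiki; in that case a lost stash entry just means the update is skipped.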
Conclusion
This is admittedly a bit messy and I'm not sure if this feature is the right way to go. I would personally go with it, but any opinions on whether this solution should be pursued or not would be appreciated. :)