After all core hooks have been updated, it should be easy to add a boolean variable that indicates whether an edit was reverted. So instead of checking summary for specific substrings, the code will simplify to &! is_revert (or so).
Description
Event Timeline
@matej_suchanek: good first task tasks are self-contained, non-controversial issues with a clear approach and should be well-described with pointers to help the new contributor. Given the current short task description I'm removing the good first task tag. Please re-add the tag once the task description has been polished and provides sufficient information for a new contributor. Thanks!
I'm reopening this, as significant progress has been made to make this possible.
PageUpdater now constructs an EditResult object when saving the new revision and makes it available through the PageSaveComplete hook. The object provides all necessary information to determine whether the edit was a revert and much more, there's tons of potential for AbuseFilter. The problem is – this hook is called after the revision is saved, and the call is made by PageUpdater, not EditPage, which belongs to a different abstraction layer. It may be possible to pass the EditResult before saving the revision (and thus allow for aborting it) using the MultiContentSave hook, but that would still be done from within PageUpdater.
My main problem here is that EditPage and WikiPage are a tangled mess and it's hard for me to propose something sensible. If EditPage controlled PageUpdater directly, without using WikiPage's doEditContent (which is deprecated BTW), it may be possible to split PageUpdater::saveRevision into two parts:
- Prepare the edit to be saved and construct the EditResult.
- Save the edit.
In such a setup we would be able to call AbuseFilter between these methods and pass EditResult to it properly. And from the right abstraction layer.
Idk, this is my best idea on how to solve this. Someone may come up with something better :)
Briilliant idea!
I think we should untangle the EditPage mess first (remove the use of deprecated doEditContent, then split it into two parts as you explained). It might be better to have a subtask for it.
@Daimona before committing to any work, I wanted to invite you to take a look here. Could this fall within the scope of the grant you got funded for by WMF?
Untangling that mess would be nice, but it doesn't seem to be easy, see T208801: Support slots other than the main slot in EditPage - backend support for a 2018 attempt on doing that.
I think @DannyS712 is also attempting to take this on in this WIP patch.
Doing a big refactor of EditPage first would be probably more sensible than trying to mess around with the current implementation. We'd just have to ensure doing something like described above would be possible.
Oh, now I got another idea. We could also move the AbuseFilter to the storage layer and have PageUpdater return AbuseFilter-induced errors in its Status object. That would work as well, but I have no idea how much refactoring that would require. Probably a lot.
That WIP patch you linked to is intended to be a part of T157658: Factor out a backend from EditPage and should not change the interface for extensions to prevent or modify an edit via the EditFilterMergedContent hook. In short, it moves away from EditPage::internalAttemptSave to a dedicated backend for saving new edits, along with all of the possible reasons edits would be perverted (hooks, edit conflict, page doesn't exist, etc)
I believe this is a great idea.
@Daimona before committing to any work, I wanted to invite you to take a look here. Could this fall within the scope of the grant you got funded for by WMF?
Well, this is a good example of task that is trivial to implement on the AF side, but hard overall due to core limitations. If the behaviour changes in core, then it would be in scope, but it would also be a trivial task that could be addressed outside of the grant. OTOH, changing MW core is probably out of scope (especially for a change this big).
I think it should always be possible to use $this->undidRev in EditPage::runPostMergeFilters. However, that only represents the revID in the URL (undo=xxx). EditPage checks whether the revert is clean (see EditPage::isUndoClean), but that happens *after* running the EditFilterMergedContent hook. I'm not comfortable moving that code around, given that we're talking about a 500 lines long method (EditPage::internalAttemptSave). Also, it would not distinguish manual revert vs undo button, but I don't think that's important.
This is much needed: on fr-wp, for example, we have a couple of banned people harassing volunteers by reverting their edits. And there is no efficient and clean way to handle that currently with AF.
That's because MediaWiki core runs abuse filters (and similar things like SpamBlacklist) before determining whether the edit is a revert. That is, by the time filters are executed, nobody knows whether the current edit is a revert. Until this is changed in MW core, there's nothing we can do on the AF side.
As mentioned above, the relevant core code really is a mess, and even if there've been several improvements recently, I think it's still far from what we'd need to complete this task. I'm not even sure which task(s) should I block this task on, if any.
I understood explanations from @Daimona, but this comment to let know that this feature is still needed (we got vandals who edit undo summaries to bypass AF). Being able to detect built-in tags, such as mw-undo, with AF would be great.
Came to say this would have been handy many times already...
I think there has been some progress since @Daimona's last comment. The hook now provides access to PreparedUpdate, which acts as a handle for the revision being saved and can ensure everything is computed only once per edit (we already use it in AbuseFilter). There really ought to be something like PreparedUpdate::getEditResult.
Unfortunately,
MediaWiki core runs abuse filters (and similar things like SpamBlacklist) before determining whether the edit is a revert.
this is still an issue.