Add subroutines to AbuseFilter
Open, MediumPublic
Actions

Assigned To

None

Authored By

	Daimona
	Feb 10 2018, 1:40 PM

Description

As discussed in the Community health initiative, it would be useful to add subroutines, i.e. the possibility to check from within a filter if another filter was tripped by the same edit. This would open the doors to a great improvement in performance by avoiding redundancies.
I was also thinking about a partial way to achieve this. If I'm right, AbuseFilter tests filters ordering them by number. Thus, we may limit this functionality to only checking previous filter (=filters with smaller number). This might be further extended by solving T132846 and related task (adding a custom execution order would be fine too) in order to give full control to the users.

Details

	Subject	Repo	Branch	Lines +/-
	[WIP] Add a function to check if another filter matched the action	mediawiki/extensions/AbuseFilter	master	+97 -17

Customize query in gerrit

Related Objects

Mentioned In: T120740: Shared variables for AbuseFilter
T198005: Allow users to specify the execution order for all active filters
Mentioned Here: T198005: Allow users to specify the execution order for all active filters
T132846: Allow moving a filter to another free number

Event Timeline

Daimona created this task.Feb 10 2018, 1:40 PM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptFeb 10 2018, 1:40 PM

Daimona triaged this task as Medium priority.Feb 10 2018, 1:41 PM

Daimona moved this task from Backlog to Filtering features on the AbuseFilter board.

Daimona merged a task: T151353: Add ability to trigger an abuse filter from within another abuse filter.Feb 10 2018, 1:52 PM

Daimona added subscribers: kaldari, Samwalton9-WMF.

Huji subscribed.Feb 10 2018, 2:15 PM

MGChecker subscribed.Feb 10 2018, 7:20 PM

Daimona merged a task: T20067: Decision tree for AbuseFilters.Apr 3 2018, 12:28 PM

Daimona added subscribers: • bzimport, Luke081515, He7d3r and 3 others.

Daimona mentioned this in T198005: Allow users to specify the execution order for all active filters.Jun 23 2018, 10:43 AM

Change 441621 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Add a function to check if another filter matched the action

https://gerrit.wikimedia.org/r/441621

gerritbot added a project: Patch-For-Review.Jun 24 2018, 10:23 AM

Daimona claimed this task.Jun 24 2018, 1:23 PM

MusikAnimal subscribed.Jun 24 2018, 3:49 PM

MusikAnimal mentioned this in T120740: Shared variables for AbuseFilter.Jun 24 2018, 4:48 PM

AbuseFilter tests filters ordering them by number. Thus, we may limit this functionality to only checking previous filter (=filters with smaller number).

What if we force the evaluation of a given filter if we detect it hasn't been evaluated? That way AF doesn't need a particular order. Of course, it should check for circular dependencies and max recursion level to avoid loops.

Example:

Filters are evaluated in order
Filter $1 wants to check if filter $2 matched.
Filter $2 hasn't been evaluated yet. Call the function to evaluate filter $2.
Filter $2 gets evaluated, etc. Execution returns to filter $1
Next filter to evaluate is $2. Since it has been evaluated already, there's nothing to do. Evaluate the next filter...

@Ciencia_Al_Poder Yes, I also had that idea. However, I think that we'd better allow users to specify the execution order and avoid setting one without explicitly reporting it. This would give users more control and simplify coding the feature.

Ciencia_Al_Poder's proposal is basically what I was thinking with T198005#4309946. I was suggesting we have input fields at the top of the editing interface to specify what filters are "required", but if AbuseFilter can be that smart to just "cache" the result of filters, such that everything is done at run-time, then I think your proposed workflow is probably better.

So an example filter for the above workflow:

!("confirmed" in user_groups) &
check_filter(2) &
"foo" in added_lines

AbuseFilter will evaluate filter 2 at runtime and cache the result (true or false), so it's ready to go when it encounters any other filter that has check_filter(2). That seems feasible to implement, no?

Next filter to evaluate is $2. Since it has been evaluated already, there's nothing to do. Evaluate the next filter...

It seems unlikely, but there's a possibility $1 won't take action but $2 will. So I probably wouldn't have AbuseFilter skip $2 altogether, but if implementation allows (described above), AbuseFilter will already know what the result of $2 is and hence can jump straight to the actions it's supposed to take.

Yes, it's feasible, and it's also possible to execute every filter only once, since actions to execute are determined basing on the IDs of the matched filters. I'd rather avoid giving full order control to AF itself, while manually adding requirements is what we're talking about in T198005.

In T186960#4310025, @Daimona wrote:

I think that we'd better allow users to specify the execution order and avoid setting one without explicitly reporting it. This would give users more control and simplify coding the feature.

The execution order currently doesn't affect anything AFAIK. Filters could be run in reverse or random order and they should work the same. They're (currently) independent from each other.

If users need to manually choose the execution order because of this, it introduces a path of errors on the end users, when they change execution order and break dependencies. This should be handled transparently.

I'd code it to work recursively: When it finds a check_filter statement (or whatever you call it), it gets the filter execution result, or evaluates it if it hasn't been evaluated yet, and then continue evaluating the current filter.

In T186960#4310027, @MusikAnimal wrote:

It seems unlikely, but there's a possibility $1 won't take action but $2 will. So I probably wouldn't have AbuseFilter skip $2 altogether, but if implementation allows (described above), AbuseFilter will already know what the result of $2 is and hence can jump straight to the actions it's supposed to take.

When $2 executes because it's in use by $1, I was thinking to execute it completely (performing any configured actions), not only evaluate if it matches or not. But yes, deferring the action until it goes through the normal execution order should also be fine. Either way shouldn't affect anything. In fact, ideally, AF should first check only if filters match, and after all filters have been evaluated it should perform the action in batch. I'm not sure if AF is currently coded to do this, or it's performing actions on every filter evaluation when it matches.

The execution order currently doesn't affect anything AFAIK. Filters could be run in reverse or random order and they should work the same. They're (currently) independent from each other.

This is true from a syntax validity POV. However, changing the order could change what we decide to show to the user, see for instance the original report on T132846. Making filters auto-require themselves would throw away the possibility to customize their order and thus solve problems like that one.

When $2 executes because it's in use by $1, I was thinking to execute it completely (performing any configured actions), not only evaluate if it matches or not.

That's actually what we already do: first execute all filters, record the ID of the ones that matched, then query abuse_filter_action with a JOIN on those IDs and progressively execute actions. Nothing would need to be changed from this perspective.

Note about this feature: it'd be cool to make this function accept arrays or be variadic (like contains_any/all). For instance, add two functions:

matched_filter_any([1,2,3,...])
matched_filter_all([1,2,3,...])

Change 441621 abandoned by Daimona Eaytoy:
[WIP] Add a function to check if another filter matched the action

Reason:
Obsolete POC, to be rewritten from scratch

https://gerrit.wikimedia.org/r/441621

Daimona removed a project: Patch-For-Review.Feb 13 2020, 1:41 PM

Random ideas:

we must avoid circular dependencies
we should cache whether a filter was triggered
we can enable filters that don't log hits and only execute on demand from other filters

Daimona edited projects, added AbuseFilter (Overhaul-2020); removed AbuseFilter.Sep 16 2020, 12:17 PM

Daimona moved this task from Backlog to Stretch on the AbuseFilter (Overhaul-2020) board.

Daimona removed Daimona as the assignee of this task.Oct 28 2020, 11:40 AM

DannyS712 subscribed.Nov 14 2020, 9:04 PM

Bugreporter merged a task: T358079: Introduce functions for inter-filter interactions.Feb 21 2024, 8:11 AM

Bugreporter added a subscriber: Dragoniez.

Add subroutines to AbuseFilterOpen, MediumPublicActions

Description

Details

Related Objects

Event Timeline

Add subroutines to AbuseFilter
Open, MediumPublic
Actions