Page MenuHomePhabricator

Allow users to specify the execution order for all active filters
Open, Needs TriagePublic

Description

Right now filters are executed in ascending order, using their ID. At the moment this doesn't really cause much problems, apart from the ones exposed in T132846 and other minor stuff. However, we'd better be able to manage execution order so that we can start working on bugs like T120740 and T186960. I'll try to work on this, but I think the real problem is to find a nice interface to show, while coding it shouldn't be too hard (although it'll require a new DB column).

Details

Related Gerrit Patches:
mediawiki/extensions/AbuseFilter : master[WIP] Add execution priority

Event Timeline

Daimona created this task.Jun 23 2018, 10:43 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 23 2018, 10:43 AM

Change 441613 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Add execution priority

https://gerrit.wikimedia.org/r/441613

Daimona claimed this task.Jun 23 2018, 11:23 PM

This is exciting stuff! But I'm a little confused how it will work. In the UI, will there be a new input field when creating/editing a filter to specify its priority? From the patch it looks like this is meant to be a unique value. How will the filter author know which priorities are already taken? Worse, how will they manage to change the priorities? So say the priorities are set to filters A -> B -> C. I'm working on filter D, and I realize its priority should be after A. Do I need to manually go through and change the priority of B and C?

I know it's probably drastically increasing implementation complexity, but to make things easier to understand for the user, I envision you being able to "require" other filters. So (following example above) when editing filter D, I can say it requires filter A. That would include the filter as a whole and any variables it defines. AbuseFilter checks all filters to see the requirements, and figures out the execution order for you. It will just know to run A before D. So I guess in the database we'd need a through table (or something) to specify which filters require which other filters. AbuseFilter can sort by required filter ID and run those in that order. Would something like that work?

I would also expose a message in the interface, "This filter is required by filters 4, 5, and 6". Like high-risk templates, these dependency filters will become especially fragile. If it malfunctions, all the ones that require it may also malfunction. Scary stuff...!

Daimona added a comment.EditedJun 24 2018, 4:23 PM

My implementation is just an idea, we may (and should?) discuss how to make it better :-) Right now, this is how it works:

  1. On every filter edit form, there'll be a numeric input field (note: right now it's actually a text field 'cause I couldn't find a function to make it numeric; OOUI will probably help) right under "description" to specify the priority, which for new filters (and also with database update) will default to the filter ID
  2. For deleted or disabled filters, such value will always be 0 (users may try to change it, but the change won't be saved)
  3. Priorities are shown on Special:AbuseFilter for each filter, next to the ID, and they're sortable, so users can easily know which ones are already in use. It might not be straightforward, but from my POV it's surely acceptable.
  4. As for changing priorities like in the example above, the answer is "it depends". If A and B are consecutive yes, the user would need to manually shift existing filters (either increase B or decrease A). Otherwise they'd just need to set an unused priority between A and B. Given that there usually are several disabled filters around, it shouldn't bee too hard to find a gap. I couldn't think of anything better to keep priorities unique.

Anyway, whatever version will be the final one, I already planned to thoroughly document it on MWwiki.

ADD (I just noticed the new message): I personally don't like deciding an order serverside like this. The reason is that users wouldn't have much control, and as you mentioned it would be really easy to break something. I think it should work, but I'm not sure whether we should make it work :-)

MusikAnimal added a comment.EditedJun 24 2018, 5:01 PM

It might not be straightforward, but from my POV it's surely acceptable.

I wouldn't assume :( The priority concept is pretty scary, as I said, because small changes to one filter could break so many others. This could be really complicated on somewhere like enwiki where there are almost 200 enabled filters -- even for a seasoned filter manager. At the very least, if I do need to shift the priorities, I shouldn't have to do so manually on each filter. The time spent doing this could mean lots of filters break and false positives occur, right? :-/

You can think of the "require filter X" concept like use or require in PHP. It's defined via a separate input (or set of inputs) in the filter interface, at the top, so it's clear what filters are dependent. With a dedicated database table storing what filters depend on what, it's easy to put at the top of filter 3, "This filter is used by filters 4, 5 and 6". I don't need to go digging to see what I might break when I change filter 3.

Setting a numerical priority means we have two numerical systems... "Filter 50 is actually filter 31" (because of the priority set on it). That's pretty tough to understand and keep track of. I think the "priority" stuff should be invisible to the user. All they should be concerned about is defining what filters they want to reference in the filter they're editing.

The priority concept is pretty scary, as I said, because small changes to one filter could break so many others. This could be really complicated on somewhere like enwiki where there are almost 200 enabled filters -- even for a seasoned filter manager. At the very least, if I do need to shift the priorities, I shouldn't have to do so manually on each filter. The time spent doing this could mean lots of filters break and false positives occur, right? :-/

Yes and no: it's true that some harm could be done, but only with a combination of unlucky factors like many consecutive numbers, something that really needs to have its priority changed, and which is needed/needs other filters.

You can think of the "require filter X" concept like use or require in PHP. It's defined via a separate input (or set of inputs) in the filter interface, at the top, so it's clear what filters are dependent. With a dedicated database table storing what filters depend on what, it's easy to put at the top of filter 3, "This filter is used by filters 4, 5 and 6". I don't need to go digging to see what I might break when I change filter 3.

The concept itself is pretty clear, my concern was about not letting users know what's happening, but maybe it's just a hunch. I think a column like af_priority should be enough to store requirements, and we should code really carefully the logic behind (e.g. to avoid circular references).

Setting a numerical priority means we have two numerical systems... "Filter 50 is actually filter 31" (because of the priority set on it). That's pretty tough to understand and keep track of. I think the "priority" stuff should be invisible to the user. All they should be concerned about is defining what filters they want to reference in the filter they're editing.

I think it depends on the specific personality of every editor :-) In fact, I'd like to hear some other thoughts from other people about how to implement this feature, then I shouldn't have any problem whatever we choose to do.

I think sharing variables or add subroutines in Abuse Filter does not require to have a specific order of execution, or at least the implementation of those features shouldn't require this.

Priorities may make sense if we choose to "stop execution of any other filter if the action matches this filter". But this feature doesn't exist right now. Every filter is evaluated independently from others and the order shouldn't matter.

Yeah, from a usability standpoint maybe we should solicit input from the community. That didn't go well last time I tried (only got one reply), but worth trying again!

Priorities may make sense if we choose to "stop execution of any other filter if the action matches this filter".

And you probably wouldn't ever want to "stop execution", unless the action (edit, upload, etc.) was disallowed or the user was blocked. Not sure if AbuseFilter already does this automatically, but it seems it (maybe) should, as it otherwise is consuming conditions for an action that has already been disallowed.

Definitely, let's wait for other thoughts, hoping that someone will come :-).

And you probably wouldn't ever want to "stop execution", unless the action (edit, upload, etc.) was disallowed or the user was blocked.

In fact, we don't want to stop execution, never ever. Not even if the action was disallowed or the user blocked, for at least two good reasons: first, the duration of the block is determined after running all filters, and is the longer one; so we may stop when reaching an infinite block. Second, if the execution is halted, people will never know if an edit was matched by a given filter, and this may cause confusion ("Oh wait, why wasn't this edit caught by this filter? Lemme waste half an hour trying to figure it out... [30 min later] ... Oh, the execution was stopped before checking this filter!"), and would also produce incomplete records in AbuseLog.

Just a little note which I reported in T186960, and which I'll copy here since it's specifical to this task: if we let filters auto-require themselves (i.e. set an automatic, uncustomizable order) would throw away the possibility to solve problems like the one exposed in the original task T132846. Also, for a problem like that you're not really interested in "requiring" filters, but precisely in running them in a specific order.

... which I'll copy here since it's specifical to this task

Yes sorry for the fragmented discussion. Let's talk about ordering here on this task :)

changing the order could change what we decide to show to the user, see for instance the original report on T132846. Making filters auto-require themselves would throw away the possibility to customize their order and thus solve problems like that one.

This is a great point that I didn't think about. My main concern is changing priorities, and trying to figure out where your filter should be.

I think we need a UI to clearly show the order. I know it's a crap ton more work, but it'd be amazing to have a Special:AbuseFilter/order page, with a full list of the filters with descriptions, and you can drag and drop to specify the order as you please. Having to manually put in the numerical priority on each filter, and worse, trying to change all the other priorities, isn't a great user experience, methinks.

At any rate, I think being able to specify the order is a separate issue than T186960 and T120740. I shouldn't have to think about ordering when all I want is the result of filter N, or a global variable.

Yes sorry for the fragmented discussion. Let's talk about ordering here on this task :)

No problem :-) From my POV these tasks are strictly related.

I have also thought about the new page, but my concerns are the same as for shared variable: we need another dedicated page (more confusing), although at least we could still use the new column added in my patch. I don't know if the drag and drop would be feasible, but an alternative might be to: display a list of filters "ID - name - priority" in a column, where the "priority" field is editable for all filters and the table is sortable. Users may change priority and when they're done they would click a "save" button, to trigger validation and save everything at once using af_priority column. This howevers raises another problem: again, what about versioning? :-) We'd still need a new, dedicated history page(r), unless we want to display priority change in filter history, although it's made from a separate page. I agree that this isn't really comfy, in fact we need a compromise, which should be both easy to use and not a pain in the neck to code (nor it should make AF too heavy).

However, I still think that these three tasks are strictly related, mainly because for them we have 2 options:

  1. Only allow to retrieve info from previously computed filters
  2. Allow to retrieve info from any filter in a way like T186960#4310018

Where (2.) completely prevents order customization and thus isn't really good. As for (1.), either we give users a way to specify execution order (be it priority, drag and drop, you name it) or it would be nearly unusable unless people appositely create new filters to re-arrange order. So either way we care about execution order, we just need to decide what to do with it.

  1. Allow to retrieve info from any filter in a way like T186960#4310018

Where (2.) completely prevents order customization and thus isn't really good.

Sorry, but I fail to see how (2.) prevents order customization. (2.) doesn't need nor require order customization. If I understood correctly, the order is only important when evaluating the actions to take with the rules that match. And you said all filters are evaluated first, and then it uses the order to determine which action to take when the same action overlap (for example, block duration)

Right now, the order of execution should only make the difference for actions to execute, right. This means that, right now, we could choose (2.) and set an automatic execution order, and in the meanwhile make the order in which actions are executed customizable. However, this way in the future we won't be able to make users customize "checking order", and this may make some new features un-codable, which of course isn't desirable.

Vvjjkkii renamed this task from Allow users to specify the execution order for all active filters to 6eaaaaaaaa.Jul 1 2018, 1:02 AM
Vvjjkkii removed Daimona as the assignee of this task.
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
JJMC89 renamed this task from 6eaaaaaaaa to Allow users to specify the execution order for all active filters.Jul 1 2018, 4:12 AM
JJMC89 assigned this task to Daimona.
JJMC89 raised the priority of this task from High to Needs Triage.
JJMC89 updated the task description. (Show Details)