Page MenuHomePhabricator

Shared variables for AbuseFilter
Open, Needs TriagePublic

Description

for example, to store regex with bad words, and such variable could be used in many filters

--AS (talk) 22:20, 15 June 2015 (UTC)

This card tracks a proposal from the 2015 Community Wishlist Survey: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey

This proposal received 6 support votes, and was ranked #84 out of 107 proposals. https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Moderation_and_admin_tools#Shared_variables_for_AbuseFilter

Details

Related Gerrit Patches:
mediawiki/extensions/AbuseFilter : master[WIP] Add a page for filter config and shared global variables
mediawiki/extensions/AbuseFilter : master[WIP] Add global variables shared between filters

Event Timeline

DannyH created this task.Dec 7 2015, 10:54 PM
DannyH raised the priority of this task from to Needs Triage.
DannyH updated the task description. (Show Details)
DannyH moved this task to Wishlist 51-on on the Community-Wishlist-Survey-2015 board.
DannyH added a subscriber: DannyH.
Restricted Application added subscribers: StudiesWorld, Aklapper. · View Herald TranscriptDec 7 2015, 10:54 PM
IMPORTANT: If you are a community developer interested in working on this task: The Wikimedia Hackathon 2016 (Jerusalem, March 31 - April 3) focuses on #Community-Wishlist-Survey projects. There is some budget for sponsoring volunteer developers. THE DEADLINE TO REQUEST TRAVEL SPONSORSHIP IS TODAY, JANUARY 21. Exceptions can be made for developers focusing on Community Wishlist projects until the end of Sunday 24, but not beyond. If you or someone you know is interested, please REGISTER NOW.
Restricted Application added a subscriber: JEumerus. · View Herald TranscriptJan 21 2016, 2:51 PM
DannyH updated the task description. (Show Details)Feb 6 2016, 12:28 AM
DannyH set Security to None.
Huji added a subscriber: Huji.Mar 4 2017, 2:46 PM
Daimona added a subscriber: Daimona.EditedJun 12 2018, 5:57 PM

I took a look a this. There are two main "difficulties" in approaching this task:

  1. Store global variables in a convenient place. However, this isn't such a big problem: a solution like the one in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/420757/ should do the trick (basically we pass info from AbuseFilter class, which provides a higher abstraction level, to AbuseFilterParser, which only handles single filters). Actually, the patch would be very similar to that: we add a class member to AbuseFilter to store global variables, add a function like set_global_var, one like get_global_var, add this special use case to setUserVariable and we're done.
  2. Be able to distinguish between a variable not being set and one not yet computed. For instance, if I define a global variable in filter 100 and try to get it in filter 2, when the parser examines filter 2 the global variable won't be available. Either we solve T132846 (or find a better way to specify execution order), or I think the only solution would be to enqueue the current filter after all the others. Then, when we'll have parsed all filters in ascending order and get back to filter with lower ID which were enqueued, we stop forgiving and throw an exception if the global variable still doesn't exist. This would again require some new logic to allow enqueuing filters, and the real order of execution wouldn't be nor clear, nor customizable. As a side note, this will still produce errors at runtime: in fact, when you create a new filter, the others aren't (obviously) executed, so we have to forgive non-existent global variables. But if such variable isn't actually set (or maybe it was set in a filter that was disabled), it will give a runtime error. This could have been fine if we had a page for problematic executions like the one introduced in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/429464/, but until we don't have it there'll be a way too high risk to prevent filters from being executed.

TL;DR: The patch itself shouldn't be too hard to write, although it involves adding a considerable amount of brand new logic. However, we need to deal with the order of execution of the filters and other related problems that may potentially cause a big havoc.

Change 441630 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Add global variables shared between filters

https://gerrit.wikimedia.org/r/441630

Daimona claimed this task.Jun 24 2018, 2:11 PM

I personally worry about the concept of filter priorities (see comments at T198005). I think ideally we'd have a dedicated place to define shared variables that is NOT within a filter.

So I could go to Special:AbuseFilter/variables (something like that), and create new variables similar to how you create filters. That way I have access to runtime variables like user_name (which I think can be interpolated in variables?). This gets computed by AbuseFilter before any actual filters are ran. This means you don't need to worry about the "priorities" or execution order of filters -- the variables are already defined.

A new page would surely be neat. My first thought is that we may be adding even too much pages (if counting T193064 and subtasks), which could potentially end up to confuse users. Maybe something user-friendly like a dedicated page but not as big?

Yeah, a small page for defining global variables I think is the easiest to wrap your head around as a filter author. You'd also have all the shared variables in one place, so you know what's available and can be used in your new filter. This approach also means we could tackle this ticket independently of T186960, or have to worry about having your filter's priorities set correctly.

The other thing I worried about was naming conflicts. On enwiki, a lot of variables have the same name: "stringy", "abuseStr", "nope" (weird choices, but not the point :). So referencing another filter's variable would have to be namespaced ("5.stringy", "15.nope", etc.). That's a bit complicated to me, and there's the possibility you'd still end up with unnecessary duplication, where 5.stringy and 15.nope actually do the same thing -- you just weren't aware because you didn't take the time to check what variables are already available (understandably, because there are a lot of filters to check).

I'll change the patch to use a new page, then. I suppose it's fine to make it visible only for those with abusefilter-modify right, huh? As for naming conflicts, having a global variable named "stringy" shouldn't happen :-) You can't see the content of the variable from another filter, so having such a name and using it as a global is already wrong. As for namespaces, people could just give more explicit name, like "badwords_1" :-)

I'll change the patch to use a new page, then.

Sweet, that's just my opinion of course but I think this will be easier to work with. How will the interface work? If it's easier to implement, I think a plain textarea will be fine:

myGlobalVar1 := "bad word|" + user_name + "|bad word2";
myGlobalVar2 := "foo bar";
...

Is that weird?

I suppose it's fine to make it visible only for those with abusefilter-modify right, huh?

Good point, there could be private things in there, so I would say abusefilter-view-private. That way users with read-only rights can still see how the filters work.

Yes, I think a plain textarea would be the best solution: first, I don't think there'll be many global variables, so the textarea shouldn't become too clogged with stuff. Second, adding brand new forms (like for filters: a button "create new" with a textarea for the variable and a field for its name) requires much more coding. Third, like I was saying above, let's keep things minimal and simple, like a single textarea is.
I can see a single problem: what about versioning? We'd need to implement a custom history, which would require a dedicated DB table and a lot of work to do. Also, to store our global variables we'd need a new table anyway, and it seems a bit overkill to use it only for our variables list. What shall we do?
And yes, probably we should check for view-private OR modify right.

MusikAnimal added a comment.EditedJun 24 2018, 7:19 PM

I can see a single problem: what about versioning?

Grr! That is a problem. Maybe invisibly, on the backend, Special:AbuseFilter/variables is actually filter 0. The only thing different is the checkboxes for the actions, and tags, etc., are all hidden. All you get is a single textarea (and a check syntax button). AbuseFilter knows any variables set in filter 0 are global. You'd also add logic to make AbuseFilter ignore it for logging, since it is used only for variable definition.

Hacky... but maybe it could work?

Well, I also thought of using Special:AbuseFilter/0 :-) Actually, we may call it "Special:AbuseFilter/config" and let it open to further customization in case we'll add new, similar features. It would be a bit hacky, true, and also need a DB change, but should work just well. If no other problems will rise, I'll start working on it tomorrow.

Change 442078 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Add a page for filter config and shared global variables

https://gerrit.wikimedia.org/r/442078

@MusikAnimal the patch above is the implementation with a dedicated page. It requires much more changes than the other one, but provides a more stable result, which causes less troubles to users and may be used for other global options in the future.

He7d3r added a subscriber: He7d3r.Jun 28 2018, 7:21 PM