Page MenuHomePhabricator

Remove AbuseFilter's DSL and use JavaScript instead
Closed, DeclinedPublic

Description

Problem
AbuseFilter has its own domain-specific language. This language must be learned by admins and maintained by developers.

Proposed Solution
If admins are more familiar with JavaScript, perhaps it would be better to allow them to write JavaScript. These scripts could be executed in a secure vm and function similar to a Service Worker:

addEventListener('filter', (event) => (
    event.action === 'edit' && event.old_wikitext.length > 0 && event.new_wikitext.length === 0 && !event.user_groups.includes('user')
));

or we could even make the script itself the handler and the event object as the global, which would make it like this:

action === 'edit' && old_wikitext.length > 0 && new_wikitext.length === 0 && !user_groups.includes('user')

Or ideally, both syntax options would be allowed with a config when creating the filter.

This would give admins the ability to write filters in standard JavaScript while also maintaining a secure environment for our infrastructure. Doing this should open up the number admins who can write filters, provide additional linting and testing tools, and also remove the amount of code that AbuseFilter has to maintain.

Event Timeline

dbarratt updated the task description. (Show Details)

Hmm, could you provide examples of what a typical filter would look like? In my opinion the AbuseFilter language is fairly straightforward, especially if you already know another high-level language. Many of the AbuseFilter managers I know are not very adept programmers, but they were able to pick up AbuseFilter syntax. Even having to use curly braces is going to make the code overly verbose. Promises and the like sound like a nightmare; even many JavaScript programmers struggle with this concept. I as a new user with limited technical experience should be able to at least read what a filter is doing. user_editcount < 20 & page_namespace = 1 for example is succinct and easy to understand. I don't need to look at the AbuseFilter documentation.

write filters on a desktop IDE, and deploy them with a CLI. This could enable developers to use npm dependencies and compile their script into a single file for deployment.

By "developers" you mean AbuseFilter managers? I don't envision a scenario where a filter would/should have npm dependencies (even dev-only). Other AbuseFilter managers should be able to contribute to my filter without having to leave the wiki, much less have to set up a development environment or use the terminal.

Overall I think JavaScript, unless you limit the available syntax, would be much too complex and pose a bigger barrier for newcomers. It would be cool to be able to refactor logic in my filter using functions, etc., but filters are typically very simple and wouldn't really benefit from this. Moreover, changing the language would be a massive undertaking that I question would be really feasible on a social level. I suspect the changes to the backend would also be massive, perhaps requiring a complete rewrite. @Daimona is probably the best person to consult in that regard.

Hmm, could you provide examples of what a typical filter would look like?

A simple example could be event handlers that return a boolean (like they do today):

addEventListener('filter', (event) => (
    event.action === 'edit' && event.old_wikitext.length > 0 && event.new_wikitext.length === 0 && !event.user_groups.includes('user')
));

You wouldn't have promises or anything because you wouldn't have access to the network or the database.

If we wanted to make it even simpler we could provide an option to already have the event handler and have the event object as the global, so then it would simply be:

action === 'edit' && old_wikitext.length > 0 && new_wikitext.length === 0 && !user_groups.includes('user')

A simple example could be event handlers that return a boolean (like they do today)

When would we want a filter to return anything other than a boolean?

You wouldn't have promises or anything because you wouldn't have access to the network.

But you could still create promises that contain logic, just to get them to run asynchronously. I don't know why you would want to do this, but if I encountered such a filter as a newcomer would be baffled on how to make safe changes to it.

If we wanted to make it even simpler we could provide an option to already have the event handler and have the event object as the global, so then it would simply be:

action === 'edit' && old_wikitext.length > 0 && new_wikitext.length === 0 && !user_groups.includes('user')

This would be the only option, for sure. How would we lazy load the values of the variables? Correct me if I'm wrong, but currently I believe you can think of the variables as functions, where the database queries are only ran when needed. So for instance, with user_editcount < 20 & page_recent_contributors contains "MusikAnimal", the query for page_recent_contributors isn't ran if the first condition returns false. This is crucial for performance. To do this in JavaScript I think we'd need to use promises, with .then()'s and such, no? Unless we forced everything to be synchronous. The sheer possibility of being able to fire off multiple database queries at once, when you don't need to, is a bad idea I think.

A simple example could be event handlers that return a boolean (like they do today)

When would we want a filter to return anything other than a boolean?

Maybe for T216001

When would we want a filter to return anything other than a boolean?

idk... maybe there is a use case for not returning a yes or a no, but a maybe? or perhaps a use-case for returning a "reason" that is dynamic?

But you could still create promises that contain logic, just to get them to run asynchronously.

JavaScript is single-threaded, even if you create a promise, it's still going to execute synchronously.

This would be the only option, for sure. How would we lazy load the values of the variables? Correct me if I'm wrong, but currently I believe you can think of the variables as functions, where the database queries are only ran when needed. So for instance, with user_editcount < 20 & page_recent_contributors contains "MusikAnimal", the query for page_recent_contributors isn't ran if the first condition returns false. This is crucial for performance. To do this in JavaScript I think we'd need to use promises, with .then()'s and such, no? Unless we forced everything to be synchronous. The sheer possibility of being able to fire off multiple database queries at once, when you don't need to, is a bad idea I think.

So for something lazy-loaded variables like user_editcount or page_recent_contributors could become a standard function that we'd pass into the event (global) object. When you call it, you would call it with await before it:

await user_editcount() < 20 && (await page_recent_contributors()).includes("MusikAnimal")

which tells JavaScript that you want the value, not the promise.

While in this syntax, a single script could not execute more than one database query at a time (Unless we allowed you to use the Promise object, which we could prevent).

However, a nice side effect is that while this script is waiting for the lazy-loaded value, another script can start, which would actually have a nice performance benefit.

I have to say that I'm not a big fan of this change, mostly for what I said in T49512#4717791. I believe this would be a gigantic change for us to implement, and for people to receive.

I think that the added complexity would find very, very few use cases, and I don't think we need filters not returning a boolean, not even for T216001.

Learning the filter syntax includes learning both the variable names and the functions. Even if we suppose that they're equally difficult, learning the variable names would still be a problem with JS or whatever language.

And we should also make sure not to offer too powerful tools, that may end taking up too many resources.

In short, for how big this change is, I don't think we'd have enough return from it.

dbarratt changed the task status from Open to Stalled.Jul 17 2019, 10:03 PM

@dbarratt: As you set the task status to stalled, who exactly / specifically are you waiting for for further input?

@dbarratt: As you set the task status to stalled, who exactly / specifically are you waiting for for further input?

The maintainers of AbuseFilter are free to decline it and close it, but until then I wanted to indicate that it wasn't actively being worked on and is looking for further input. Is there a better way to do that?

Aklapper changed the task status from Stalled to Open.Jul 18 2019, 12:03 AM

"actively being worked on" by someone is expressed by the "Assignee" field. You might mix up that a task itself is stalled (I do not see that here) vs that implementing what's proposed/discussed is stalled. Maybe Proposal covers that?

"actively being worked on" by someone is expressed by the "Assignee" field.

Thanks for the clarification. :)

Legoktm subscribed.

We're not planning to do this for the reasons expressed in T49512#4717791. In any case, if we were going to implement a real programming language for AbuseFilter, it would be Lua, for which we already have secure sandboxing implemented and deployed.