Page MenuHomePhabricator

Add an auto-formatter to AbuseFilter
Open, LowestPublic

Description

This is very low-prio. AF could have an auto-formatter, activated by a button next to "check syntax", that would apply style changes to the code of the filter. E.g. spaces around operators and parentheses, linebreaks where applicable, etc.

Event Timeline

Daimona triaged this task as Lowest priority.Jan 1 2020, 1:43 PM
Daimona added a project: User-Daimona.

A quick note: this is best done by pretty-printing the code from the AST. Avoid a bunch of ad-hoc regex replaces if possible.

A quick note: this is best done by pretty-printing the code from the AST. Avoid a bunch of ad-hoc regex replaces if possible.

Definitely :) That's the implementation I was thinking of. I think it wouldn't be super-hard to add an API module which would get the tokenized filter, and format it with spaces.

Before this can be done (and I have some ideas about doing it) we should establish the preferred formatting to be used; I would suggest discussions on enwiki/meta/mediawiki or somewhere else where users who edit abuse filters can chime in - we shouldn't apply formatting that most users would disagree with / formatting applied should follow general practices among users.

Before this can be done (and I have some ideas about doing it) we should establish the preferred formatting to be used; I would suggest discussions on enwiki/meta/mediawiki or somewhere else where users who edit abuse filters can chime in - we shouldn't apply formatting that most users would disagree with / formatting applied should follow general practices among users.

Or we can also implement a "standard" formatting (see below), and change it later if someone has better ideas. This formatting should only happen if and when the user hits a button and then saves the filter, so I think it can't hurt.

As for the formatting, I'd propose this sketch:

  • spaces surrounding identifiers, keywords, literals, operators (not unarys) and parentheses.
  • No space around square brackets
  • A single space after commas
  • A single space before unarys operators
  • Break long lines at x (= 80?) characters
    • Try to keep parenthesized expressions on their own line, if they fit the char limit
    • Break points should be: operators (not unary), parentheses, and semicolons
    • If a plain list of operators (arithmetic or logic) must be broken, then each element of the list should go on its own line
      • Per item above, a parenthesized expression inside a list counts as a single element
    • After a semicolon, insert a double break
    • If a function call must be broken:
      • If the function is variadic, put as many arguments as possible on each line
      • If it's not variadic, put an argument on each line (TODO: does this make sense?)
    • Operators should go at the end of the line
    • What about open parentheses after an operator? Should they go at the end of the line, after the operator, or alone on the next line? This is in case the parenthesized expression cannot fit on a single line.
  • if/then[/else]: either the whole thing fits into a single line, or we break it after then, the "iftrue" statement, the else, the else statement, and the final end
  • Proper indentation TBD. Maybe one extra level for every pair of parentheses, no extra levels for consecutive operations. Use tabs?
  • "Big" problem: comments. They're discarded when tokenizing the code, hence they're not represented in the AST. We cannot write a formatter if we don't fix this.
  • It'd be cool if the formatter could also remove unneeded parentheses, however this conflicts with the item below.
  • Important, and it should be obvious, but I'm writing this down anyway: the "raw" code and the "formatted" code should be mapped to the same AST. Which means that we won't have to cache the same code twice. There should be lots of tests for autoformatting, and some of those should ensure that the AST is invariant under the effect of the pretty-print.

As I said, this is just a sketch. More things can be decided, or changed, while implementing it. If some of my examples above are unclear, please ask and I can write down some examples.

I also think that the AST should be used, not the list of tokens, because some of the things above cannot be done with a plain ordered list.

I don't think this can reasonably be implemented within the grant, especially given the priority, but perhaps we can make a start and determine what this formatter is going to look like, what style to use, where to ask for feedback, etc.

I should also note that, while this is not a fundamental feature, filters should always be as much readable as possible; many filters are not such, mainly because they used to be written in a plain textarea until 2018 when we implemented the Ace-based editor. Hence, having a formatter is more than just an optional.

Too low priority to fit within the grant scope.