AbuseFilter should expose matched text to warning messages
Open, Needs Triage, Public

Description

Sometimes a filter tests whether an edit matches any value in a list (or a given regex) and, if so, warns the user about it. In those cases it would be useful to show the matched text as part of the warning message. Examples:

This edit introduces the template "$1", which was deprecated after consensus (...)

or

This edit contains the phrase "$1", which may be considered offensive (...)

This would help a lot when reviewing filters for false positives, because the matching text would be available directly in the message.

See also: pt:Wikipédia:Café dos programadores#Etiqueta: Inserção de predefinição obsoleta

Event Timeline

If implemented at all, this should definitely be *optional*. For filters that focus on improper language, outing, etc., we actually do NOT want to show the matched phrase, so that the user cannot easily game the filter by modifying the matched phrase.

We very often design filters on the English Wikipedia to target a single LTA. This would need to be optional; otherwise we're giving our long-term abuse cases a simple way of bypassing a filter.

The more I think about it, the less realistic this task is. What if one filter checks for multiple patterns, and more than one is matched? Not every filter is as simple as "match one pattern against the changed text". I am resisting the urge to mark this as Declined, though I cannot see this ever being addressed.

In that case it could show the first pattern which matches (or a list of the first N matches). E.g.: a filter checks for multiple syntax errors such as

added_lines irlike '<(div|s|center|td|small|font|span)/>'

and a user adds

<s>Never mind<s/>. Got it.

We should be able to tell the user that the problem comes from "<s/>".
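To make the "first match, or first N matches" idea concrete, here is a minimal sketch in Python rather than in filter syntax. The helper name `first_matches` and its `limit` parameter are made up for illustration, and `re.IGNORECASE` only approximates what `irlike` does; this is not how AbuseFilter evaluates rules.

```python
import re

def first_matches(pattern, text, limit=3):
    """Return up to `limit` matched substrings, in order of appearance."""
    found = []
    # re.IGNORECASE stands in for the case-insensitivity of irlike.
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        if len(found) >= limit:
            break
        found.append(match.group(0))
    return found

pattern = r'<(div|s|center|td|small|font|span)/>'
added_lines = '<s>Never mind<s/>. Got it.'

reported = first_matches(pattern, added_lines)
print(reported)  # ['<s/>']
```

The opening `<s>` is not reported because the pattern requires the trailing `/>`, so only the malformed `<s/>` ends up in the list.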

Think of a complex filter like this:

...
&
(
  added_lines irlike 'pattern1'
  |
  (
    user_name irlike '(pattern2|pattern3)pattern5(pattern6|pattern7|pattern8)'
    &
    added_lines irlike 'pattern10'
  )
)
...

Of course the example is made up, but we do have complex filters that combine pattern matching through Boolean operators. In cases like this, returning a matched string is not good enough; the user also needs to know what was matched against (username, added lines, removed lines, ...). Again, thinking only about simple examples is not enough to solve this task.
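One way to picture what "knowing what was matched" would require: each pattern test would have to report the variable name together with the matched text, and only the branches that actually made the condition true should contribute. The sketch below models just the parenthesised part of the made-up filter in Python. The helper names `irlike` and `evaluate` and the `(variable, text)` tuples are assumptions for illustration, not an existing AbuseFilter feature.

```python
import re

def irlike(var_name, value, pattern):
    """Return (var_name, matched_text) on a case-insensitive match, else None."""
    match = re.search(pattern, value, flags=re.IGNORECASE)
    return (var_name, match.group(0)) if match else None

def evaluate(variables):
    """Model of: pattern1 on added_lines | (user_name pattern & pattern10 on added_lines)."""
    first = irlike('added_lines', variables['added_lines'], 'pattern1')
    user = irlike('user_name', variables['user_name'],
                  '(pattern2|pattern3)pattern5(pattern6|pattern7|pattern8)')
    second = irlike('added_lines', variables['added_lines'], 'pattern10')

    hits = []
    if first:
        hits.append(first)
    # The username hit is only meaningful if the whole AND branch succeeded.
    if user and second:
        hits.extend([user, second])
    return bool(hits), hits

matched, hits = evaluate({'user_name': 'bob', 'added_lines': 'has pattern1 here'})
print(matched, hits)  # True [('added_lines', 'pattern1')]
```

Even in this small model, a username that matches while `pattern10` does not must be dropped from the report, which is the kind of bookkeeping a real implementation would need for every Boolean operator.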

Proposal: implement add_warning_params and instead of

added_lines irlike '<(div|s|center|td|small|font|span)/>'

do

matches := get_matches('<(div|s|center|td|small|font|span)/>', added_lines);
matches[1] !== false & add_warning_params(matches[1])

add_warning_params will always return true and could be variadic. Any argument added through this function will be added to the warning message as $3, $4, ... It must always be treated as raw HTML.
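A rough Python model of how the proposal could behave, to make the parameter numbering easier to follow. Everything here is an assumption for illustration: `get_matches` is modelled as returning the full match followed by the capture groups (with `False` for anything unmatched), `add_warning_params` is the proposed function and does not exist yet, the values used for $1 and $2 are placeholders, and `render` is a hypothetical stand-in for message substitution. Escaping of the extra parameters is not modelled.

```python
import re

def get_matches(pattern, text):
    """Full match at index 0, capture groups after it; False where unmatched."""
    group_count = re.compile(pattern).groups
    match = re.search(pattern, text)
    if match is None:
        return [False] * (group_count + 1)
    return [match.group(0)] + [g if g is not None else False for g in match.groups()]

def add_warning_params(extra, *args):
    """Proposed behaviour: variadic, records its arguments, always returns True."""
    extra.extend(args)
    return True

def render(template, params):
    """Substitute $1, $2, ... with params; unknown placeholders are left as-is."""
    def substitute(m):
        index = int(m.group(1)) - 1
        return str(params[index]) if 0 <= index < len(params) else m.group(0)
    return re.sub(r'\$(\d+)', substitute, template)

extra = []
matches = get_matches(r'<(div|s|center|td|small|font|span)/>', '<s>Never mind<s/>. Got it.')
# Same shape as the rule in the proposal: matches[1] !== false & add_warning_params(matches[1])
triggered = matches[1] is not False and add_warning_params(extra, matches[1])

# Placeholder values for $1 and $2; the extra parameters start at $3.
message = render('Filter $2 ($1): problem with tag "$3".', ['Broken tags', '123'] + extra)
print(message)  # Filter 123 (Broken tags): problem with tag "s".
```

In this model `matches[1]` is the first capture group (`s`), while `matches[0]` holds the whole matched text (`<s/>`), so a rule that wants to quote the full tag would pass index 0 instead.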

This would also work for T216001, right?

What should happen when there are several invocations of add_warning_params? Appending arguments? Use the last one?

Good question. I believe the former. Alternatively, we could name it set_warning_params and implement the latter.
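The difference between the two options can be sketched in a few lines of Python. Both functions are hypothetical models of the proposed names, operating on a plain list that stands in for the extra message parameters ($3, $4, ...).

```python
def add_warning_params(params, *args):
    """Appending semantics: every call adds to what earlier calls recorded."""
    params.extend(args)
    return True

def set_warning_params(params, *args):
    """Last-one-wins semantics: every call replaces whatever was recorded."""
    params[:] = args
    return True

appended = []
add_warning_params(appended, 'a', 'b')
add_warning_params(appended, 'c')

replaced = []
set_warning_params(replaced, 'a', 'b')
set_warning_params(replaced, 'c')

print(appended)  # ['a', 'b', 'c']
print(replaced)  # ['c']
```

With appending, the position of a value in the message depends on which earlier calls ran, so the same placeholder could show different things for different edits; with replacing, only the final call determines the parameters.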

Daimona subscribed.

Reconsidering after the architecture review, which might make this easier to implement.