Add function to check if added content matches any regex in a list
Open, Medium, Public

Assigned To

None

Authored By

	He7d3r
	Nov 15 2017, 2:18 PM

Description

As many inappropriate edits add bad words to articles, many filters¹ are created to check if a user added an expression which was not previously in the page, like this:

bad_words := 'ba+d|real+y\s*bad|not?\s*goo+d|...';
something & added_lines irlike bad_words
          & !( removed_lines irlike bad_words )

However, this approach causes false negatives: a user can remove "baaad" while adding "really bad", and the edit will not be matched, because removed_lines also matches bad_words. This is more likely when the regex for bad_words contains many alternatives, because the user can then remove one of them and add any or all of the others without being detected.
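The false negative can be reproduced outside AbuseFilter. The Python sketch below is only a model of the filter condition above, not AbuseFilter code: it assumes that irlike behaves like a case-insensitive regex search, and the helper names `irlike` and `combined_filter_matches` are made up for illustration. The elided '...' alternative is left out.

```python
import re

# The three concrete alternatives from the example; the elided '...' is omitted.
BAD_WORDS = r"ba+d|real+y\s*bad|not?\s*goo+d"


def irlike(text: str, pattern: str) -> bool:
    """Rough stand-in (assumption) for irlike: case-insensitive regex search."""
    return re.search(pattern, text, re.IGNORECASE) is not None


def combined_filter_matches(added_lines: str, removed_lines: str) -> bool:
    """Model of: added_lines irlike bad_words & !(removed_lines irlike bad_words)."""
    return irlike(added_lines, BAD_WORDS) and not irlike(removed_lines, BAD_WORDS)


# A plain addition of a bad expression is caught.
print(combined_filter_matches("this is really bad", ""))
# Removing "baaad" in the same edit hides the addition of "really bad".
print(combined_filter_matches("this is really bad", "this is baaad"))
```

The second call returns False even though "really bad" is new to the page, because the single combined regex cannot tell which alternative matched on each side.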

An approach to fix that would be to write the filter as

bad_word1 := 'ba+d';
bad_word2 := 'real+y\s*bad';
bad_word3 := 'not?\s*goo+d';
bad_word... := '...';
something & (
     added_lines   irlike bad_word1
& !( removed_lines irlike bad_word1 )
|
     added_lines   irlike bad_word2
& !( removed_lines irlike bad_word2 )
|
     added_lines   irlike bad_word3
& !( removed_lines irlike bad_word3 )
| ...
)

but this is unnecessarily repetitive and makes the filter very long (and may increase the condition count more than it should?). It should be possible to check whether a user is adding some bad thing without all this trouble... Maybe a new function could be added, which would allow something like this:

bad_word_regexes := [ 'ba+d', 'real+y\s*bad', 'not?\s*goo+d', '...' ];
something & irlike_added_any( added_lines, removed_lines, bad_word_regexes )

(the name and syntax are just an example; feel free to suggest something better)
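To make the intended semantics concrete, here is a minimal Python sketch of what such a function might do. It is not an AbuseFilter implementation: the function does not exist yet, and the sketch assumes that each regex is tested case-insensitively and independently against the added and the removed text. The elided '...' entry is left out of the list.

```python
import re
from typing import Iterable


def irlike_added_any(added_lines: str, removed_lines: str,
                     regexes: Iterable[str]) -> bool:
    """Return True if some regex matches the added text but not the removed text.

    Hypothetical semantics for the proposed function; each regex is checked
    on its own, so removing one bad word cannot mask adding a different one.
    """
    for pattern in regexes:
        compiled = re.compile(pattern, re.IGNORECASE)
        if compiled.search(added_lines) and not compiled.search(removed_lines):
            return True
    return False


bad_word_regexes = [r"ba+d", r"real+y\s*bad", r"not?\s*goo+d"]

# "baaad" is removed and "really bad" is added: the second regex matches only
# the added text, so the edit is flagged.
print(irlike_added_any("this is really bad", "this is baaad", bad_word_regexes))
```

This is equivalent to the long per-word filter above, with the repetition moved into a loop over the list.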

¹Examples

Related Objects

Mentioned Here: T179957: Add function to store regex match(es)

Event Timeline

He7d3r created this task. Nov 15 2017, 2:18 PM

Restricted Application added a subscriber: Aklapper. Nov 15 2017, 2:18 PM

I know the problem. A possible solution, available within days, would be to use the newly implemented get_matches function (T179957), like this:

matched:=get_matches("any|bad|word|you want", added_lines)[0];
matched != false &
!(removed_lines irlike matched)

This way you know that the check will be performed on the same word, e.g. if the user adds "really bad" it will only check for that in removed_lines. I don't know if this fully solves the problem (that would otherwise need a brand new function), but it can surely fix some of those situations.
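A small Python model can show which situations this workaround covers. It rests on assumptions: that get_matches(...)[0] yields the text of the first (leftmost) match and false when nothing matches, as the `matched != false` check implies, and that the matched text is then used unescaped as a case-insensitive pattern. The names `first_match` and `workaround_matches` are hypothetical.

```python
import re
from typing import Optional

BAD_WORDS = r"ba+d|real+y\s*bad|not?\s*goo+d"


def first_match(pattern: str, text: str) -> Optional[str]:
    """Assumed behaviour of get_matches(pattern, text)[0]: leftmost match or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


def workaround_matches(added_lines: str, removed_lines: str) -> bool:
    """Model of: matched != false & !(removed_lines irlike matched)."""
    matched = first_match(BAD_WORDS, added_lines)
    if matched is None:
        return False
    # The matched text is used as a pattern, unescaped, as in the snippet above.
    return re.search(matched, removed_lines, re.IGNORECASE) is None


# The original false negative is now caught: "really bad" is not in removed_lines.
print(workaround_matches("this is really bad", "this is baaad"))
# Only the first match is examined, so a later new expression can still slip by.
print(workaround_matches("bad, not good", "bad"))
```

Under these assumptions the second edit goes undetected: the first match is "bad", which also appears in the removed text, so the newly added "not good" is never checked.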
