
Make SpamBlacklist clearly act prior to AbuseFilter on WMF operational wikis
Open, Needs Triage · Public

Description

Background:
Please take these as my studied observations of operations, made without any technical background. They come from WMF wikis, seen from the perspective of an administrator at Metawiki with global sysop privileges.

Observations:
We have edits that hit both global abuse filters (m:Special:Abuselog and local equivalents) and also appear in local Special:log/spamblacklist.

I would also note that I don't believe this is always the case: I believe that sometimes attempts are logged only in log/spamblacklist and do not hit the abuse filters. (I have not tried to test this anywhere.)

I queried this years ago and was informed that the spam blacklist extension loads before the abusefilters extension, so the blacklisting should occur first and obviate the need for action in abusefilters.

Issue:
For the best use of abusefilters, having known and already blocked spam appearing in the abuselogs is problematic, both because it adds significant abuselog noise and because it makes it hard to know whether an edit has hit or not. As an administrator, especially in SWMT, I would prefer to only have to chase down potential spam, not follow known blocked edits.

So I am asking for some modification so that the spam blacklist acts clearly before the abuse filters and we no longer get this overlap.

Alternatively, is there a means by which we can tune abuse filters so that they ignore edits containing URLs with domains in the global spam blacklists, knowing that the SpamBlacklist extension is reliable? So, some sort of masking facility.

cc. @Beetstra @MarcoAurelio @MusikAnimal @Legoktm @Chrissymad

Thanks for your consideration

Event Timeline

Restricted Application added a subscriber: Aklapper. · Dec 11 2018, 12:12 PM
Aklapper renamed this task from Having spam blacklist clearly acting prior to abuse filters (WMF operational wikis) to Make SpamBlacklist clearly act prior to AbuseFilter on WMF operational wikis. · Dec 11 2018, 12:38 PM

IIRC, hooks are executed in the order that they are registered, and AbuseFilter should come first due to alphabetical order; that is why AbuseFilter runs before SpamBlacklist. I'm not sure whether we can change the order (and if we really want it). As for the second proposal, well: for sure we cannot hardcode anything which completely skips AbuseFilter if the edit contains a blacklisted link. We could instead add a variable containing the links on SpamBlacklist so that people may use it on a per-filter basis.
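To illustrate the ordering described above, here is a minimal sketch in Python. It is not MediaWiki code: the `HookRegistry` class, the `edit_check` hook name, and the alphabetical loading step are all assumptions made for illustration, based only on the recollection in this comment.

```python
from collections import defaultdict


class HookRegistry:
    """Toy hook container: handlers run in the order they were registered."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, hook, extension):
        self._handlers[hook].append(extension)

    def run_order(self, hook):
        """Return the extensions in the order their handlers would run."""
        return list(self._handlers[hook])


def load_extensions(names):
    registry = HookRegistry()
    # Assumption: extensions are loaded (and so register hooks) alphabetically.
    for name in sorted(names):
        registry.register("edit_check", name)  # hypothetical hook name
    return registry


registry = load_extensions(["SpamBlacklist", "AbuseFilter"])
order = registry.run_order("edit_check")
print(order)  # ['AbuseFilter', 'SpamBlacklist']
```

Under these assumptions, the only ways to make SpamBlacklist act first would be to change the registration order or to give hooks an explicit priority, which is the part I'm not sure we can (or want to) change.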

This comment was removed by Billinghurst.

This month is truly a good indicator of how we could make abuse filters useful instead of clagged with superfluous, duplicated hits.

FWIW the spambots are incredibly busy and our base defences seem pissweak, and the reliance on abusefilters and spamblacklist seems troublesome.

@Billinghurst I can try to add the new variable, I think that would be a reasonably quick solution. FWIW I can take a look at the involved filters and try to help directly on-wiki, but unfortunately this isn't a good period :/

Daimona moved this task from Backlog to Next on the User-Daimona board. · Dec 22 2018, 5:31 PM

Change 481246 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/SpamBlacklist@master] [WIP] Add an AbuseFilter variable with the content of the spam blacklist

https://gerrit.wikimedia.org/r/481246

Change 481572 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Add a new method and hook for static variables

https://gerrit.wikimedia.org/r/481572

Daimona claimed this task. · Dec 30 2018, 6:19 PM

Change 481572 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Add a new method and hook for static variables

https://gerrit.wikimedia.org/r/481572

Will this be a cheap or an expensive call? Trying to work out where it should be placed in the coding sequence?

Do we also need to book it in for the Tech newsletter?

I am assuming that the variable name is spam-blacklist

> Will this be a cheap or an expensive call? Trying to work out where it should be placed in the coding sequence?

It depends, mostly on the blacklist size. Since this variable includes the whole blacklist (i.e. it includes the shared one), it'll be pretty large. However, the cost only matters when examining old entries: during edits, the content of the blacklist is saved in cache and retrieved from there, so using it once or twice makes almost no difference (if I'm not mistaken).
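A rough sketch of why repeated use is cheap once the list is cached, in Python. The class names and the in-memory cache are hypothetical stand-ins, not the actual SpamBlacklist caching code:

```python
class BlacklistSource:
    """Stands in for the slow step of fetching and parsing the blacklist."""

    def __init__(self, entries):
        self.entries = list(entries)
        self.loads = 0  # counts how often the slow path is taken

    def load(self):
        self.loads += 1
        return list(self.entries)


class CachedBlacklist:
    """Loads the blacklist once, then serves every later request from cache."""

    def __init__(self, source):
        self._source = source
        self._cached = None

    def get(self):
        if self._cached is None:
            self._cached = self._source.load()
        return self._cached


source = BlacklistSource([r"spam\.example", r"pills\.example"])
blacklist = CachedBlacklist(source)
first = blacklist.get()   # slow path: loads from the source
second = blacklist.get()  # fast path: served from cache
print(source.loads)  # 1
```

So a filter that reads the variable twice pays for at most one real load, which is the sense in which using it once or twice makes almost no difference.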

> Do we also need to book it in for the Tech newsletter?

I'd say yes, but the variable still has to be added.

> I am assuming that the variable name is spam-blacklist

spam_blacklist, with the underscore.

In the meantime, there are two other problems to address. The first one is that, when examining old edits, this variable will hold the current content of the blacklist, and not the content at the time of the edit. This could be a problem, but there's no easy solution. At the very least, SpamBlacklist should provide a way to efficiently retrieve an old version of the blacklist, and AFAICS it currently doesn't.
The second one is that AbuseFilter currently lacks a clean way to use this variable: spam_blacklist is an array of strings, and arrays are poorly implemented in AF. A workaround would be something like added_lines irlike str_replace(string(spam_blacklist),'\n','|') but I wouldn't rely on it.
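For anyone who wants to see what that workaround amounts to, here is a Python approximation. It only mirrors the idea (join the blacklist fragments with '|' and run one case-insensitive regex match over the added lines); `matches_blacklist` is a hypothetical helper, and it assumes that string(spam_blacklist) yields one fragment per line, as the expression above implies.

```python
import re


def matches_blacklist(added_lines, spam_blacklist):
    """Return True if any blacklist fragment matches the added text."""
    # Skip empty fragments: an empty alternative would match every edit.
    fragments = [f for f in spam_blacklist if f.strip()]
    if not fragments:
        return False
    # Wrap each fragment in a non-capturing group so that a '|' inside one
    # fragment cannot leak into its neighbours (the plain join does not do this).
    pattern = "|".join(f"(?:{f})" for f in fragments)
    # re.IGNORECASE stands in for the case-insensitive irlike operator.
    return re.search(pattern, added_lines, re.IGNORECASE) is not None


blacklist = [r"spam\.example", r"pills\.example"]
print(matches_blacklist("see http://SPAM.example/buy", blacklist))
print(matches_blacklist("see https://www.mediawiki.org", blacklist))
```

The empty-fragment check and the grouping are the two places where a plain string join can go wrong, which is part of why I wouldn't rely on the one-liner.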

Daimona moved this task from Next to Under review on the User-Daimona board. · Dec 31 2018, 2:05 PM

Change 424298 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Add array-specific functions

https://gerrit.wikimedia.org/r/424298

Change 424298 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Add array-specific functions

https://gerrit.wikimedia.org/r/424298