- Mentioned In
  - T279477: Move spam blacklist rules to a database table
  - T279476: Move title blacklist rules to a database table
  - T254649: Rename SpamBlacklist
  - T254650: Rename TitleBlacklist
- Mentioned Here
  - T279476: Move title blacklist rules to a database table
  - T279477: Move spam blacklist rules to a database table
  - T241440: Allow private blocking of harassment via regexes and URLs on-wiki
  - T6459: Create a special page to handle additions, removals, changes and logging of spam blacklist entries
  - T14963: allow per-page exceptions to spam blacklist
  - T27524: blacklist may become too big
  - T38940: Title blacklist should return friendly error message and allow user to report false positives
  - T75417: Whether or not the user recently hit the spam blacklist (abuse filter)
  - T209806: TitleBlacklist should have its own content type
  - T216803: Add an option for an expiration date for spam blacklist items
  - T45761: Allow local disabling of global AbuseFilters
  - T254649: Rename SpamBlacklist
  - T254650: Rename TitleBlacklist
As lists, the blacklists/whitelists are very simple; AbuseFilters are not.
AF allows much more control, that's true, but to my mind the simple single-page editing paradigm we currently provide is similar enough that we can lift-and-shift it for now, and then possibly refactor it over the next few years if there's demand (e.g. logging of hit rates, thresholds for activity, response options more complex than pass/fail, CheckUser integration).
I suppose the differences I see are around globality. The blacklists apply universally across WMF wikis, whereas AbuseFilter, although deployed universally, runs checks that are not global in impact (we have global AFs that do not target large wikis). So we have some "language" issues to address.
Note that the logging would need significant tuning: something like Special:AbuseLog for global AFs is bad enough as it is, without also including blacklist hits, which are currently only logged locally.
Also, there is still the issue that "title blacklist" hits are logged neither locally nor globally, which has upsides and downsides.
(Of course, all of that is detail and probably does not belong here at the top level; it's just what I'm contemplating right now.)
In my opinion, global AF is intended to be deployed to all wikis, including larger ones like enwiki, but wikis may choose, based on local consensus, to opt out of specific filters, or to opt out of all filters by default and opt in to specific ones (neither is currently possible; see T45761: Allow local disabling of global AbuseFilters). Otherwise the list of opted-out wikis is fairly arbitrary, since large wikis are not always the more active ones (they are only large in database size).
TitleBlacklist and SpamBlacklist currently use a wiki page to store their contents. Eventually they should be switched to database storage (with performance taken into account).
Issues that may become easier to solve: T38940, T6459, T14963, T27524(*), T75417, T216803
Issues that may be closed: T209806
(*) Most SBL items may be converted to a linksearch-like syntax (org.wikipedia.en/...)
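To illustrate the linksearch-like idea together with table storage: if plain-domain entries were kept in reversed-host form, a check becomes a few indexed equality lookups (the host plus each parent domain) instead of running every regex, and an optional expiry column would cover T216803. This is a minimal sketch under assumptions: the table, column and function names are hypothetical (not the schema of T279477), the key format is simplified relative to MediaWiki's own externallinks index, and regex-only entries would still need the regex path.

```python
import sqlite3
from urllib.parse import urlsplit

# Hypothetical table; names are illustrative, not the schema chosen in T279477.
SCHEMA = """
CREATE TABLE spam_blacklist (
    sb_id     INTEGER PRIMARY KEY,
    sb_domain TEXT NOT NULL,   -- reversed host, e.g. 'com.example'
    sb_expiry INTEGER          -- Unix timestamp; NULL means no expiry
);
CREATE INDEX sb_domain_idx ON spam_blacklist (sb_domain);
"""


def linksearch_key(url):
    """Return a simplified reversed-host key such as 'org.wikipedia.en/wiki/Foo'."""
    parts = urlsplit(url)
    host = (parts.hostname or "").rstrip(".")  # .hostname is already lower-cased
    return ".".join(reversed(host.split("."))) + (parts.path or "/")


def candidate_domains(reversed_host):
    """'com.example.spam' -> ['com', 'com.example', 'com.example.spam']."""
    labels = reversed_host.split(".")
    return [".".join(labels[:i]) for i in range(1, len(labels) + 1)]


def is_blocked(conn, url, now):
    """True if the URL's host, or any parent domain, has an unexpired entry."""
    reversed_host = linksearch_key(url).split("/", 1)[0]
    domains = candidate_domains(reversed_host)
    placeholders = ",".join("?" * len(domains))
    row = conn.execute(
        f"SELECT 1 FROM spam_blacklist WHERE sb_domain IN ({placeholders}) "
        "AND (sb_expiry IS NULL OR sb_expiry > ?) LIMIT 1",
        (*domains, now),
    ).fetchone()
    return row is not None


# Demo data: one permanent entry and one that expires at t=1000.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO spam_blacklist (sb_domain, sb_expiry) VALUES (?, ?)",
    [("com.example", None), ("net.temp", 1000)],
)
```

Unlike a naive `example\.com` regex, the reversed-host lookup matches `spam.example.com` but not `notexample.com`.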
I'm not sure what this task is proposing. Technically, the functionality of the spam/title blacklists already exists in AbuseFilter: it would be trivial to write a filter that blocks certain links from being added, or pages with certain titles from being created.
That aside, I'm not convinced this is the best approach. We have tens of thousands of blacklisted URLs across Wikimedia projects. That's not feasible to include in one filter, nor is it feasible to create individual filters for each URL. The functionality we need to block spam URLs is relatively limited (though there's certainly room to expand on the current all-or-nothing approach), whereas AbuseFilter is a deeply customisable toolset with far too much going on for the relatively simple task of blocking certain URLs.
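One common way for a dedicated list tool to cope with tens of thousands of entries is to compile them into a handful of large alternation regexes rather than evaluating one rule per URL. A minimal sketch of that idea; `build_batches`, `first_hit` and the default batch size are hypothetical, and SpamBlacklist's real implementation may differ in detail.

```python
import re


def build_batches(patterns, batch_size=500):
    """Compile many patterns into a few large alternation regexes.

    Each pattern is wrapped in a non-capturing group so that an alternation
    inside one pattern cannot leak into its neighbours.
    """
    batches = []
    for start in range(0, len(patterns), batch_size):
        chunk = patterns[start:start + batch_size]
        combined = "|".join(f"(?:{p})" for p in chunk)
        batches.append(re.compile(combined, re.IGNORECASE))
    return batches


def first_hit(text, batches):
    """Return the first blacklisted substring found in text, or None."""
    for batch in batches:
        match = batch.search(text)
        if match:
            return match.group(0)
    return None
```

The trade-off is that a combined regex reports the matched text, not which list entry produced it.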
I don't think merging the extensions is a good idea. AbuseFilter is already incredibly complex (for good reason). SpamBlacklist and TitleBlacklist are both straightforward to use (a list of regexes), work out of the box, and are bundled extensions. AbuseFilter still isn't eligible for bundling yet.
That said, if the goal of this ticket is to re-create Phalanx, I'm all for it.
SpamBlacklist and TitleBlacklist are both limited in what they can do. They have terrible interfaces. They don't get integrated with new action types. The complexity of AF is optional and not required.
> AbuseFilter still isn't eligible for bundling yet.
We'll be bundled by the time 1.37 ships.
I mostly second this comment, but I feel the need to expand on one point in particular. Technically speaking, AbuseFilter should already be capable of everything that Spam/TitleBlacklist can do. The difference is that AbuseFilter is much more complex, in that it allows splitting rules into filters, adding lots of conditions and different consequences, all with a visual interface (not with things like <noedit | autoconfirmed |errmsg=titleblacklist-custom-msg>). TB/SB are probably meant to be a lightweight alternative that doesn't require learning a scripting language and coding fine-grained checks. I think merging everything might be fine, but ideally we'd want to do a bit more than just combine the code.
It's also unclear how the SB/TB code would integrate with AF. E.g. how would they interact with the DB schema? Just migrating the special pages as they are doesn't seem useful. The other possibility I can think of is having a "special" filter for the TB (and likewise for the SB). But then I don't think we'd have to import any code, as this can already be done on-wiki. Another thing to keep in mind is that TB/SB keep everything in a single page, and each regex can specify what consequences should be taken. This cannot be preserved in AF, i.e. every filter has a fixed set of consequences.
Long story short, I think having a single, centralized tool might be a good idea, but I currently can't think of a way that makes sense.
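To make the per-regex consequences concrete: a TitleBlacklist line pairs a regex with its own options, such as `.*badword.* <noedit | autoconfirmed |errmsg=titleblacklist-custom-msg>`. The sketch below parses such lines, applies a simplified reading of `autoconfirmed` (autoconfirmed users are exempt) and `noedit` (only page creation is blocked), and groups entries by option set. Because each AF filter has one fixed set of consequences, the number of groups is roughly the number of filters a faithful port would need. The parser, the option semantics and the whole-title, case-insensitive matching are simplified assumptions about TitleBlacklist, not its actual code; `parse_line`, `blocks` and `group_by_options` are hypothetical names.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# "regex <opt | opt | key=value>"; the options part is optional.
LINE_RE = re.compile(r"^(?P<regex>.*?)\s*(?:<(?P<opts>[^>]*)>)?\s*$")


@dataclass
class Entry:
    regex: str
    flags: set = field(default_factory=set)     # e.g. {"noedit", "autoconfirmed"}
    params: dict = field(default_factory=dict)  # e.g. {"errmsg": "..."}


def parse_line(line):
    """Parse one blacklist line; return None for blank or comment-only lines."""
    line = line.split("#", 1)[0].strip()  # simplified: '#' starts a comment
    if not line:
        return None
    m = LINE_RE.match(line)
    entry = Entry(regex=m.group("regex"))
    for opt in (m.group("opts") or "").split("|"):
        opt = opt.strip()
        if "=" in opt:
            key, value = opt.split("=", 1)
            entry.params[key.strip()] = value.strip()
        elif opt:
            entry.flags.add(opt)
    return entry


def blocks(entry, title, creating, autoconfirmed):
    """Simplified reading of two options: does this entry block the action?"""
    if "autoconfirmed" in entry.flags and autoconfirmed:
        return False  # only non-autoconfirmed users are affected
    if "noedit" in entry.flags and not creating:
        return False  # editing an existing page is still allowed
    flags = 0 if "casesensitive" in entry.flags else re.IGNORECASE
    return re.fullmatch(entry.regex, title, flags) is not None


def group_by_options(entries):
    """Group regexes by option set: one group per fixed-consequence filter."""
    groups = defaultdict(list)
    for e in entries:
        key = (frozenset(e.flags), frozenset(e.params.items()))
        groups[key].append(e.regex)
    return dict(groups)


lines = [
    "# sample list",
    ".*badword.* <noedit | autoconfirmed |errmsg=titleblacklist-custom-msg>",
    "Foo.*",
    "Bar.* <noedit>",
    "Baz.* <noedit>",
]
entries = [e for e in map(parse_line, lines) if e is not None]
```

Here the four entries fall into three option sets, so keeping their behaviour would already take three separate filters.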
Per my commit message, I was thinking of phases:
- Move the current functionality into the repo as-is (this task)
- Change the editing experience into a visual one that's simpler than learning regex or a scripting language (T6459)
- Change the storage from a simple page into a DB table (T279476 and T279477)
At that point, we'd have the ability to fuse the different sources of filters into different types of filter with different abilities, while being consistent about, e.g., Unicode normalisation, triggering actions, and so on.
This is sometimes a feature, but yes. I think Wikimedia still doesn't have fully global AbuseFilters, whereas SpamBlacklist and TitleBlacklist are fully global.
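As an illustration of being consistent about Unicode normalisation: if every source of rules ran its input through one shared normalisation step before matching, a fullwidth or differently cased variant could not slip past one checker while being caught by another. The choice of NFKC plus case folding below is an assumption for the sketch; the extensions' actual normalisation may differ, and `normalise`/`matches_any` are hypothetical names.

```python
import re
import unicodedata


def normalise(text):
    """One shared normalisation step: NFKC folding plus Unicode case folding."""
    return unicodedata.normalize("NFKC", text).casefold()


def matches_any(patterns, text):
    """Check text against patterns after normalising it.

    Patterns are assumed to be written in already-normalised (lower-case) form.
    """
    folded = normalise(text)
    return any(re.search(p, folded) for p in patterns)
```

With this in place, `ＣＡＳＩＮＯ.example` (fullwidth letters) is caught by a rule written as `casino\.example`.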
> They have terrible interfaces. They don't get integrated with new action types. The complexity of AF is optional and not required.
Agreed on this. I just don't see how moving the code wholesale into the AbuseFilter repo helps fix these problems. I think it would be better to analyse the features of each extension, figure out how they would integrate with AF, and then add that functionality, rather than just copying code around.
+1 to everything Daimona said.
I never used Phalanx, but the big limitation of SB/TB is that you can't add conditions on top of the regex, nor pick any consequence other than disallow. For example, it would be nice if a title match only triggered a warning, and only for users with fewer than 50 edits, or something along those lines. And the AbuseLog is far richer than the very limited SpamBlacklist log we have (and even that is a fairly recent addition). But managing giant regexes in AbuseFilter is a pain, so SB gets plenty of use that way. I say this as someone who was deeply involved in AF/SB/TB around 2013-2016 but hasn't done much since, so it's possible I'm out of date!
What functionality is missing from AbuseFilter that would put it on par with the *Blacklist extensions? Off the top of my head, these come to mind:
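The example of a title match that only warns, and only for users with fewer than 50 edits, amounts to pairing the regex with an extra condition and a non-default consequence. A minimal sketch; the `GradedRule` type and `evaluate` function are hypothetical and do not correspond to an existing API in SB, TB or AF.

```python
import re
from dataclasses import dataclass


@dataclass
class GradedRule:
    pattern: str
    action: str = "warn"   # a consequence other than a flat "disallow"
    max_edits: int = 50    # applies only to users with fewer edits than this


def evaluate(rule, title, user_edit_count):
    """Return the rule's action if it applies to this title and user, else None."""
    if user_edit_count >= rule.max_edits:
        return None  # experienced users are not affected by this rule
    if re.search(rule.pattern, title, re.IGNORECASE):
        return rule.action
    return None
```

Keeping the condition and the consequence on the rule itself is what a plain regex list cannot express.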
- a function to take a (possibly foreign) wiki page with a list of regexes and match the other input against it (optionally case-insensitively)
- tagging of the regex list entries with user rights and such; that could be replaced with a separate regex-list page for every flag, at some loss of usability
- intelligent error reporting (I want to know which regex matched)
- performance (filters are disabled if they match too often, which is not ideal for an anti-spam feature)
The first seems easy to do, the others not so much. I agree that moving the current code/functionality into AbuseFilter as-is doesn't seem useful.
"Move everything as-is" means you don't have to run an endless consultation with 1000 wikis' worth of sysops, as nothing changes for them; then giving them the ability to migrate slowly to proper filters, without breaking existing workflows, is exceptionally valuable.
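The first and third items on that list (matching input against a regex list taken from a wiki page, and reporting which regex matched) can be sketched as follows. The sketch assumes the page text has already been fetched, locally or from a foreign wiki, treats `#` as starting a comment, and uses hypothetical function names; it is not an existing AbuseFilter function.

```python
import re


def parse_regex_page(page_text):
    """Turn page text into a list of regexes: one per line, '#' starts a comment."""
    patterns = []
    for line in page_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            patterns.append(line)
    return patterns


def find_matching_regex(patterns, text, case_insensitive=True):
    """Return the first pattern that matches text, so an error message can name it."""
    flags = re.IGNORECASE if case_insensitive else 0
    for pattern in patterns:
        if re.search(pattern, text, flags):
            return pattern
    return None
```

Returning the pattern itself, rather than a boolean, is what makes the "which regex matched" error reporting possible.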
Moving everything as-is messes up git histories, it makes the organization of the code less logical, and it is functionally equivalent to doing nothing, so doing nothing seems preferable to me.