Move all the functionality of {Spam,Title}Blacklist extensions into AbuseFilter and retire them
Open, Needs Triage, Public

Assigned To

None

Authored By

	Jdforrester-WMF
	Apr 5 2021, 12:28 AM

Description

May supersede T254650: Rename TitleBlacklist and T254649: Rename SpamBlacklist.

Details

Subject	Repo	Branch	Lines +/-
[WIP] Import the SpamBlacklist and TitleBlacklist extensions as "SimpleList"	mediawiki/extensions/AbuseFilter	master	+9 K -110
Fix error reporting in BlockedDomainStorage for real	mediawiki/extensions/AbuseFilter	master	+17 -15
Fix broken error reporting in BlockedExternalDomains	mediawiki/extensions/AbuseFilter	master	+3 -2
BlockedExternalDomains: Make this a special right, prohibit direct editing	mediawiki/extensions/AbuseFilter	master	+72 -9
Introduce Special:BlockedExternalDomains	mediawiki/extensions/AbuseFilter	master	+789 -5
Zuul: [mediawiki/extensions/AbuseFilter] Add Scribunto & EventLogging deps	integration/config	master	+4 -2

Related Objects

Status	Subtype	Assigned	Task
Resolved		None	T254646 Reconsidering how we name things
Open		None	T281536 Schema:EditAttemptStep uses non-inclusive language.
Open		None	T279275 Move all the functionality of {Spam,Title}Blacklist extensions into AbuseFilter and retire them
Resolved		Daimona	T191740 Bundle AbuseFilter extension with MediaWiki
Resolved		Daimona	T192325 Setup phan for AbuseFilter
			Restricted Task
Resolved		Daimona	T223654 AbuseFilterCheckMatch API reveals suppressed edits and usernames (CVE-2021-31547)
Resolved		Daimona	T213006 Create a script to update afl_var_dump, drop back-compat code
Resolved		Urbanecm	T246539 Dry-run, then actually run updateVarDumps
Declined		Daimona	T246938 How to update/delete ExternalStore entries?
Resolved		Daimona	T252696 Find a good way to run the updateVarDumps script on large wikis
Resolved		Daimona	T152394 AbuseFilter privacy concerns on action == 'createaccount' and 'accountname' (CVE-2021-31552)
Resolved		Daimona	T71367 page_recent_contributors leaks revdeleted user names (CVE-2021-31545)
Resolved		Daimona	T199544 Make AbuseFilter work on PostgreSQL and SQLite (epic)
Resolved		matej_suchanek	T62639 Database: SQLite and PG are missing columns
Resolved		Daimona	T42757 Joins on INTEGER and TEXT fail with PostgreSQL
Resolved		Daimona	T193068 Add support for SQLite and postgre when searching patterns
Resolved		matej_suchanek	T199506 Investigate possible issues with PostgreSQL
Resolved		matej_suchanek	T199507 Investigate possible issues with SQLite
Resolved	PRODUCTION ERROR	Daimona	T221357 Read timeout reached while viewing AbuseLog
Resolved		Daimona	T251967 quibble-vendor-sqlite-php72-docker is broken by AbuseFilter
Resolved		Umherirrender	T259377 Migrate AbuseFilter to Abstract Schema
Resolved		Daimona	T220791 afl_filter should be split in afl_filter_id and afl_global
Resolved		Marostegui	T234052 Add abuse_filter_log.afl_filter_id and afl_global columns
Resolved		• Bstorm	T234615 Wikireplicas changes for abuse_filter_log including two new columns
Resolved		Daimona	T269712 Migrate afl_filter to afl_filter_id and afl_global
Resolved		Urbanecm	T269713 Run the MigrateAflFilter script for AbuseFilter
Resolved		Marostegui	T291719 Remove abuse_filter_log.afl_filter column and adjust schema consequently from Wikimedia production
Resolved		rook	T291806 Remove afl_filter column from the views
Open	Feature	None	T47747 Create a tool like SpamBlacklist in AbuseFilter
Open		Ladsgroup	T337431 Rework MediaWiki:SpamBlacklist
Open		None	T279476 Move title blacklist rules to a database table
Open		None	T279477 Move spam blacklist rules to a database table

Event Timeline

Jdforrester-WMF created this task. Apr 5 2021, 12:28 AM

Restricted Application added a subscriber: Aklapper. Apr 5 2021, 12:28 AM

Jdforrester-WMF added a subtask: T191740: Bundle AbuseFilter extension with MediaWiki. Apr 5 2021, 12:28 AM

Jdforrester-WMF mentioned this in T254650: Rename TitleBlacklist.

Jdforrester-WMF mentioned this in T254649: Rename SpamBlacklist.

Reedy updated the task description. Apr 5 2021, 12:31 AM

Would like to understand to what "functionality" refers. The blacklists/whitelists as lists are very simple, abusefilters are not so.

In T279275#6971523, @Billinghurst wrote:

Would like to understand to what "functionality" refers.

Their existence.

The blacklists/whitelists as lists are very simple, abusefilters are not so.

AF allows much more control, that's true, but to my mind providing a simple single-page editing paradigm as currently provided is sufficiently similar that we can lift+shift for now, and then possibly re-factor later over the next few years (e.g. logging of hit rates, thresholds for activity, more complex response options than pass/fail, CheckUser integration, etc.) if there's demand.

In T279275#6971526, @Jdforrester-WMF wrote:

In T279275#6971523, @Billinghurst wrote:

Would like to understand to what "functionality" refers.

Their existence.

The blacklists/whitelists as lists are very simple, abusefilters are not so.

AF allows much more control, that's true, but to my mind providing a simple single-page editing paradigm as currently provided is sufficiently similar that we can lift+shift for now, and then possibly re-factor later over the next few years (e.g. logging of hit rates, thresholds for activity, more complex response options than pass/fail, CheckUser integration, etc.) if there's demand.

I suppose that I am seeing differences in globality. The blacklists apply universally across WMF wikis, whereas AFs, although deployed universally, run checks that are not global in impact (we have global AFs that do not target large wikis). So we have some "language" issues to address.

Noting that there would need to be significant tuning work on the logging as something like Special:Abuselog for global AF is bad enough as it is without including blacklist hits which are currently only locally logged.

Also there is still the issue that "title blacklist" is not logged locally and globally, and that has upsides and downsides.

(Of course, all that is detail and probably does not belong here at the top level, just what my brain is contemplating on the immediate.)

Beetstra subscribed. Apr 5 2021, 1:52 AM

Izno subscribed. Apr 5 2021, 2:15 AM

In my opinion AF is intended to be deployed to all wikis, even including larger ones like enwiki, but wikis may choose to opt out of some specific filters, or opt out of all by default and opt in to specific ones (neither is currently possible, see T45761: Allow local disabling of global AbuseFilters), based on local consensus. Otherwise the list of wikis to opt out is very random, as large wikis are not always the more active ones (they are only large in database size).

TitleBlacklist and SpamBlacklist currently use a wiki page to store their contents. Eventually they should be switched to databases (performance should be considered);

Issues that may be solved easier - T38940, T6459, T14963, T27524(*), T75417, T216803
Issues that may be closed - T209806
(*) Most SBL items may be converted to a linksearch-like syntax (org.wikipedia.en/...)
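
The linksearch-like form mentioned in (*) reverses the host labels so that entries for the same domain sort and prefix-match together. A minimal Python sketch of that conversion, assuming plain host names without scheme or port; `to_linksearch_key` is a hypothetical helper name for illustration, not part of SpamBlacklist or MediaWiki:

```python
def to_linksearch_key(host: str, path: str = "/") -> str:
    """Reverse the dot-separated labels of a host and append the path.

    Hypothetical helper for illustration only. Assumes `host` carries
    no scheme or port; surrounding dots are dropped.
    """
    labels = host.lower().strip(".").split(".")
    return ".".join(reversed(labels)) + path


print(to_linksearch_key("en.wikipedia.org"))  # org.wikipedia.en/
```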

Bugreporter added a subtask: T47747: Create a tool like SpamBlacklist in AbuseFilter. Apr 5 2021, 7:10 AM

Tgr subscribed. Apr 5 2021, 11:32 AM

taavi subscribed. Apr 6 2021, 8:27 AM

Another issue that may benefit from this change is T241440: Allow private blocking of harassment via regexes and URLs on-wiki.

Bugreporter mentioned this in T279476: Move title blacklist rules to a database table. Apr 6 2021, 8:10 PM

Bugreporter mentioned this in T279477: Move spam blacklist rules to a database table. Apr 6 2021, 8:15 PM

I'm not sure what this task is proposing. Technically the functionality from spam/title blacklist already exists in AbuseFilter, it would be trivial to write a filter which blocks certain links from being added or pages from being created with certain titles.

That aside, I'm not convinced this is the best approach. We have tens of thousands of blacklisted URLs across Wikimedia projects. That's not feasible to include in one filter, nor is it feasible to create individual filters for each URL. The functionality we need to block spam URLs is relatively limited (though there's certainly room to expand on the current all-or-nothing approach), whereas AbuseFilter is a deeply customisable toolset with far too much going on for the relatively simple task of blocking certain URLs.
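
To illustrate the scale point, a list of tens of thousands of domains can be held as a set and checked by lookup rather than packed into one large pattern. The following is a minimal Python sketch under that assumption, not a description of how SpamBlacklist or AbuseFilter is implemented; `blocked_links` and its parameters are hypothetical names:

```python
from urllib.parse import urlsplit


def blocked_links(added_urls: list[str], blocked_domains: set[str]) -> list[str]:
    """Return the added URLs whose host is a blocked domain or a subdomain of one."""
    hits = []
    for url in added_urls:
        # hostname is None for URLs without a scheme; treat those as no host.
        host = (urlsplit(url).hostname or "").lower()
        labels = host.split(".")
        # For "a.example.com" this yields {"a.example.com", "example.com", "com"}.
        suffixes = {".".join(labels[i:]) for i in range(len(labels))}
        if suffixes & blocked_domains:
            hits.append(url)
    return hits
```

Each check costs a handful of set lookups per link, regardless of how many domains are on the list.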

In T279275#6979981, @Samwalton9 wrote:

I'm not sure what this task is proposing. Technically the functionality from spam/title blacklist already exists in AbuseFilter, it would be trivial to write a filter which blocks certain links from being added or pages from being created with certain titles.

Yes, hence "the functionality", not "the equivalent functionality". The latter already exists.

Jdlrobson added a parent task: T281536: Schema:EditAttemptStep uses non-inclusive language. Apr 29 2021, 9:49 PM

Change 692740 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/extensions/AbuseFilter@master] [WIP] Import the SpamBlacklist and TitleBlacklist extensions as "SimpleList"

https://gerrit.wikimedia.org/r/692740

gerritbot added a project: Patch-For-Review. May 18 2021, 10:54 PM

Change 692749 had a related patch set uploaded (by Jforrester; author: Jforrester):

[integration/config@master] Zuul: [mediawiki/extensions/AbuseFilter] Add Scribunto & EventLogging deps

https://gerrit.wikimedia.org/r/692749

I don't think merging the extensions is a good idea. AbuseFilter is already incredibly complex (for good reason). SpamBlacklist and TitleBlacklist are both straightforward to use (a list of regexes), work out of the box, bundled extensions. AbuseFilter still isn't eligible for bundling yet.

That said, if the goal of this ticket is to re-create Phalanx I'm all for it.

Change 692749 merged by jenkins-bot:

[integration/config@master] Zuul: [mediawiki/extensions/AbuseFilter] Add Scribunto & EventLogging deps

https://gerrit.wikimedia.org/r/692749

Mentioned in SAL (#wikimedia-releng) [2021-05-19T16:42:28Z] <James_F> Zuul: [mediawiki/extensions/AbuseFilter] Add Scribunto & EventLogging deps T279275

In T279275#7097145, @Legoktm wrote:

I don't think merging the extensions is a good idea. AbuseFilter is already incredibly complex (for good reason). SpamBlacklist and TitleBlacklist are both straightforward to use (a list of regexes), work out of the box, bundled extensions.

SpamBlacklist and TitleBlacklist are both limited in what they can do. They have terrible interfaces. They don't get integrated with new action types. The complexity of AF is optional and not required.

AbuseFilter still isn't eligible for bundling yet.

We'll be bundled by the time 1.37 ships.

In T279275#7097145, @Legoktm wrote:

I don't think merging the extensions is a good idea. AbuseFilter is already incredibly complex (for good reason). SpamBlacklist and TitleBlacklist are both straightforward to use (a list of regexes), work out of the box, bundled extensions. AbuseFilter still isn't eligible for bundling yet.

That said, if the goal of this ticket is to re-create Phalanx I'm all for it.

I mostly second this comment, but I feel the need to expand on a point in particular. Technically speaking, AbuseFilter should already be capable of everything that Spam/TitleBlacklist can do. The difference is that AbuseFilter is much more complex, in that it allows splitting rules into filters, adding lots of conditions and different consequences, all with a visual interface (not with things like <noedit | autoconfirmed |errmsg=titleblacklist-custom-msg>). TB/SB are probably meant to be a lightweight alternative that doesn't require learning a scripting language and coding fine-grained checks. I think merging everything might be fine, but ideally we'd want to do a bit more than just combine the code.

It's also unclear how the SB/TB code would integrate with AF. E.g. how would they interact with the DB schema? Just migrating the special pages as they are doesn't seem useful. The other possibility I can think of is having a "special" filter for the TB (same for the SB). But then I don't think we'd have to import any code, as this can already be done on-wiki. Another thing to keep in mind is that TB/SB have everything in a single page, and each regex can specify what consequences should be taken. This cannot be preserved in AF, i.e. every filter has a fixed set of consequences.

Long story short, I think having a single, centralized tool might be a good idea, but I currently can't think of a way that makes sense.
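
The per-entry attribute syntax quoted above (`<noedit | autoconfirmed |errmsg=titleblacklist-custom-msg>`) is what lets each regex carry its own consequences. A rough Python sketch of splitting such a line into its parts follows. It assumes the attributes sit in one trailing `<...>` block, separated by `|`, as either bare flags or `key=value` pairs; the function name and the sample pattern are hypothetical, and this is not the extension's actual parser:

```python
import re

# Pattern text, then an optional trailing <...> attribute block.
ENTRY_RE = re.compile(r"^(?P<pattern>.*?)(?:\s*<(?P<attrs>[^<>]*)>)?\s*$")


def parse_entry(line: str):
    """Split one single-line list entry into (pattern, flags, options)."""
    m = ENTRY_RE.match(line.strip())
    pattern = m.group("pattern").strip()
    flags, options = set(), {}
    for part in (m.group("attrs") or "").split("|"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            # key=value attribute, e.g. errmsg=...
            key, value = part.split("=", 1)
            options[key.strip()] = value.strip()
        else:
            # bare flag, e.g. noedit
            flags.add(part)
    return pattern, flags, options


entry = ".*foo.* <noedit | autoconfirmed |errmsg=titleblacklist-custom-msg>"
print(parse_entry(entry))
```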

In T279275#7098977, @Daimona wrote:

In T279275#7097145, @Legoktm wrote:

I don't think merging the extensions is a good idea. AbuseFilter is already incredibly complex (for good reason). SpamBlacklist and TitleBlacklist are both straightforward to use (a list of regexes), work out of the box, bundled extensions. AbuseFilter still isn't eligible for bundling yet.

That said, if the goal of this ticket is to re-create Phalanx I'm all for it.

I mostly second this comment, but I feel the need to expand on a point in particular. Technically speaking, AbuseFilter should already be capable of everything that Spam/TitleBlacklist can do. The difference is that AbuseFilter is much more complex, in that it allows splitting rules into filters, adding lots of conditions and different consequences, all with a visual interface (not with things like <noedit | autoconfirmed |errmsg=titleblacklist-custom-msg>). TB/SB are probably meant to be a lightweight alternative that doesn't require learning a scripting language and coding fine-grained checks. I think merging everything might be fine, but ideally we'd want to do a bit more than just combine the code.

It's also unclear how the SB/TB code would integrate with AF. E.g. how would they interact with the DB schema? Just migrating the special pages as they are doesn't seem useful. The other possibility I can think of is having a "special" filter for the TB (same for the SB). But then I don't think we'd have to import any code, as this can already be done on-wiki. Another thing to keep in mind is that TB/SB have everything in a single page, and each regex can specify what consequences should be taken. This cannot be preserved in AF, i.e. every filter has a fixed set of consequences.

Long story short, I think having a single, centralized tool might be a good idea, but I currently can't think of a way that makes sense.

Per my commit message, I was thinking of phases:

Move the current functionality into the repo as-is (this task)
Change the editing experience into a visual editing experience that's simpler than learning regex or scripting language (T6459)
Change the storage from a simple page into a DB table (T279476 and T279477)

At that point, we'd have the ability to fuse the different sources of Filters into different types of filter with different abilities, whilst being consistent about e.g. Unicode normalisation, or triggering actions, or so on.

In T279275#7098935, @Jdforrester-WMF wrote:

In T279275#7097145, @Legoktm wrote:

I don't think merging the extensions is a good idea. AbuseFilter is already incredibly complex (for good reason). SpamBlacklist and TitleBlacklist are both straightforward to use (a list of regexes), work out of the box, bundled extensions.

SpamBlacklist and TitleBlacklist are both limited in what they can do.

This is sometimes a feature, but yes. I think Wikimedia still doesn't have fully global AbuseFilters but SpamBlacklist and TitleBlacklist are fully global.

They have terrible interfaces. They don't get integrated with new action types. The complexity of AF is optional and not required.

Agreed on this. I just don't see how wholesale moving the code into the AbuseFilter repo is a good idea on how to fix these problems. I think it would be better to do an analysis of the features of each extension, figure out how they integrate with AF, and then add that functionality...not just copy code around.

+1 to everything Daimona said.

In T279275#7098977, @Daimona wrote:

Long story short, I think having a single, centralized tool might be a good idea, but I currently can't think of a way that makes sense.

I never used Phalanx, but the big limitation of SB/TB is that you can't add additional conditions based on the regex, nor can you pick other consequences besides disallow. So it would be nice if a title could be warning-only for users with < 50 edits. Or something. And the AbuseLog is way richer than the very limited SpamBlacklist log we have (and that's a recentish thing too). But managing giant regexes in AbuseFilter is a pain, so SB gets plenty of use that way. I say this as a person who was deeply involved in AF/SB/TB around 2013-2016 but hasn't done much since, so it's possible I'm out of date!
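
The wish above (a per-regex condition plus a consequence other than disallow) could look roughly like the following Python sketch. `TitleRule`, `evaluate`, the field names, and the sample pattern are all hypothetical; nothing here reflects an existing AbuseFilter or TitleBlacklist data model:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TitleRule:
    pattern: str                          # regex matched against the page title
    action: str = "disallow"              # e.g. "disallow" or "warn"
    max_edit_count: Optional[int] = None  # if set, rule applies only below this edit count


def evaluate(rules, title: str, user_edit_count: int) -> Optional[str]:
    """Return the action of the first rule that applies to this title and user, else None."""
    for rule in rules:
        if not re.search(rule.pattern, title):
            continue
        if rule.max_edit_count is not None and user_edit_count >= rule.max_edit_count:
            continue  # user is past the threshold, so this rule does not apply
        return rule.action
    return None


rules = [TitleRule(r"(?i)free money", action="warn", max_edit_count=50)]
print(evaluate(rules, "Free money now", 10))  # warn
```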

Justin_C_Lloyd subscribed. Sep 8 2021, 11:37 AM

Yardenack subscribed. Dec 17 2021, 4:41 AM

Jdforrester-WMF closed subtask T191740: Bundle AbuseFilter extension with MediaWiki as Resolved. Mar 4 2022, 10:32 PM

What is the functionality unavailable in AbuseFilter that would make it on par with *Blacklist functionality? Off the top of my head these come to mind:

a function to take a (possibly foreign) wiki page with a list of regexes and match the other input against it (optionally case-insensitively)
tagging of the regex list entries with user rights and such - that can be replaced with a separate regexlist page for every flag, for some loss of usability
intelligent error reporting (I want to know which regex matched)
performance (filters are disabled if they match too often, which is not ideal for an anti-spam feature)

The first seems easy to do, the others not so much. I agree that moving the current code/functionality into AbuseFilter as-is doesn't seem useful.
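
The first and third items in the list above (matching input against a list of regexes, optionally case-insensitively, and reporting which regex matched) can be sketched in Python as below. The sketch takes the list text as already loaded, so fetching a local or foreign wiki page is out of scope. It assumes one regex per line with `#` starting a comment; both the format assumptions and the function name are illustrative:

```python
import re
from typing import Optional


def find_matching_entry(list_text: str, candidate: str,
                        case_insensitive: bool = False) -> Optional[str]:
    """Return the first list entry (as written) that matches `candidate`, else None."""
    flags = re.IGNORECASE if case_insensitive else 0
    for raw in list_text.splitlines():
        # Drop anything after '#' and surrounding whitespace (assumed comment syntax).
        entry = raw.split("#", 1)[0].strip()
        if not entry:
            continue
        try:
            if re.search(entry, candidate, flags):
                return entry
        except re.error:
            # Skip entries that are not valid regexes rather than failing the whole check.
            continue
    return None
```

Returning the entry itself, rather than a boolean, is what makes it possible to tell the user or the log which regex was hit.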

In T279275#7911413, @Tgr wrote:

What is the functionality unavailable in AbuseFilter that would make it on par with *Blacklist functionality? Off the top of my head these come to mind:

a function to take a (possibly foreign) wiki page with a list of regexes and match the other input against it (optionally case-insensitively)

tagging of the regex list entries with user rights and such - that can be replaced with a separate regexlist page for every flag, for some loss of usability

intelligent error reporting (I want to know which regex matched)

performance (filters are disabled if they match too often, which is not ideal for an anti-spam feature)

The first seems easy to do, the others not so much. I agree that moving the current code/functionality into AbuseFilter as-is doesn't seem useful.

"Move everything as-is" means you don't have to do an endless consultation with 1000 wikis' worth of sysops as nothing changes for them; giving them the ability to slowly migrate to proper Filters after as-is, without breaking existing workflows, is exceptionally valuable.

Moving everything as-is messes up git histories, it makes the organization of the code less logical, and it is functionally equivalent to doing nothing, so doing nothing seems preferable to me.

Pppery edited projects, added Patch-Needs-Improvement; removed Patch-For-Review. Mar 14 2023, 1:32 AM

Jdforrester-WMF added a subscriber: Ladsgroup. May 2 2023, 8:05 PM

Bugreporter mentioned this in T337431: Rework MediaWiki:SpamBlacklist. May 25 2023, 5:42 AM

Change 922929 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/AbuseFilter@master] Introduce Special:BlockedExternalDomains

https://gerrit.wikimedia.org/r/922929

gerritbot added a project: Patch-For-Review. May 30 2023, 6:48 PM

Restricted Application removed a project: Patch-Needs-Improvement. May 30 2023, 6:48 PM

Change 922929 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Introduce Special:BlockedExternalDomains

https://gerrit.wikimedia.org/r/922929

ReleaseTaggerBot added a project: MW-1.41-notes (1.41.0-wmf.12; 2023-06-06). May 31 2023, 7:00 PM

matej_suchanek added a subtask: T337431: Rework MediaWiki:SpamBlacklist. Jun 1 2023, 8:46 AM

Change 925767 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/extensions/AbuseFilter@master] BlockedExternalDomains: Make this a special right, prohibit direct editing

https://gerrit.wikimedia.org/r/925767

Change 925767 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] BlockedExternalDomains: Make this a special right, prohibit direct editing

https://gerrit.wikimedia.org/r/925767

Change 929167 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/AbuseFilter@master] Fix broken error reporting in BlockedExternalDomains

https://gerrit.wikimedia.org/r/929167

Change 929167 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Fix broken error reporting in BlockedExternalDomains

https://gerrit.wikimedia.org/r/929167

Change 929359 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/AbuseFilter@master] Fix error reporting in BlockedDomainStorage for real

https://gerrit.wikimedia.org/r/929359

ReleaseTaggerBot edited projects, added MW-1.41-notes (1.41.0-wmf.13; 2023-06-13); removed MW-1.41-notes (1.41.0-wmf.12; 2023-06-06). Jun 12 2023, 3:00 PM

Change 929359 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Fix error reporting in BlockedDomainStorage for real

https://gerrit.wikimedia.org/r/929359

Pppery edited projects, added Patch-Needs-Improvement; removed Patch-For-Review. Jun 23 2023, 3:47 PM

Change 692740 abandoned by Jforrester:

[mediawiki/extensions/AbuseFilter@master] [WIP] Import the SpamBlacklist and TitleBlacklist extensions as "SimpleList"

Reason:

Done by Amir instead.

https://gerrit.wikimedia.org/r/692740

Pppery removed a project: Patch-Needs-Improvement. Nov 12 2023, 10:00 PM

Jdforrester-WMF mentioned this in T224921: Code Stewardship Review: SpamBlacklist. Sep 24 2024, 10:48 PM

Novem_Linguae subscribed. Sep 24 2024, 10:50 PM
