
Rework MediaWiki:SpamBlacklist
Open, Needs Triage, Public

Assigned To
Authored By
Ladsgroup
May 24 2023, 8:06 PM
Referenced Files
F37113054: grafik.png
Jun 22 2023, 7:39 PM
F37110064: Screenshot_20230620_092602_Firefox.png
Jun 20 2023, 6:32 AM
F37026631: grafik.png
May 24 2023, 8:06 PM
F37026633: grafik.png
May 24 2023, 8:06 PM

Description

The current implementation of MediaWiki:SpamBlacklist could use some improvements:

  • It's a long set of regexes, 6,000-ish to be exact (on English Wikipedia)
    • Running every regex against every edit is not very fast
    • It's hard to search, given the escaping
    • Most entries don't need to be regexes at all
    • Since most admins are not familiar with regex, it limits admins' ability to fight spam
    • It's fragile: a bad entry can easily prevent everyone on the wiki from saving any edit that adds a link
  • It's not structured
  • It can't tell you exactly why your edit can't be saved (which domain is blocked)
  • It doesn't support notes, so people have to log additions separately
  • The naming is problematic (T254646)

Proposal:

  • Create a new special page called Special:BlockedExternalDomains, editable by anyone with the "delete" right
  • Make it save into a JSON page called MediaWiki:BlockedExternalDomains.json
  • Store everything in Extension:AbuseFilter
  • Put it behind a feature flag and deploy it to a couple of pilot wikis
  • Deploy it widely and send a message to the admins' noticeboard of each wiki asking them to migrate from the current system to the new one
  • (out of scope of this ticket) After migration is done, move more features to Extension:AbuseFilter, such as keeping the regex-based denylist and allowlist as MediaWiki pages, the email denylist, the global denylist (both regex and non-regex), and so on.
  • (out of scope of this ticket) Undeploy the SpamBlacklist extension (after migration of some other functionality)
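To make the JSON-page idea concrete, here is a minimal sketch of how such a page could be stored and consulted. The schema (an array of objects with "domain" and "notes" fields) is an assumption for illustration, not necessarily the extension's exact format:

```python
import json

# Hypothetical content of MediaWiki:BlockedExternalDomains.json:
# a JSON array of entries, each pairing a blocked domain with a note.
page_text = """
[
    {"domain": "spamsite.example", "notes": "Link spam, May 2023"},
    {"domain": "badlinks.example", "notes": "See local discussion"}
]
"""

entries = json.loads(page_text)
blocked = {e["domain"] for e in entries}

def is_blocked(host: str) -> bool:
    # Exact-match lookup; lowercasing mirrors the uppercase-bypass fix
    # discussed later in this task. Subdomains would need extra logic.
    return host.lower() in blocked

print(is_blocked("SpamSite.example"))
```

A flat set lookup like this also makes the per-edit cost independent of the list size, unlike a regex scan.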

Mocks:

grafik.png (791×1 px, 130 KB)
(admin view)
grafik.png (1×3 px, 153 KB)
(non-admin view)

Test setup:
https://en.wikipedia.beta.wmflabs.org/wiki/Special:BlockedExternalDomains

Python script to migrate off MediaWiki:Spamblacklist for simple cases: P49299
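The paste itself (P49299) is not reproduced here, but the "simple cases" it targets can be approximated: a blacklist line that is just an escaped domain (optionally wrapped in \b anchors) becomes a plain domain entry, while anything with real regex syntax is left for manual review. This is an illustrative sketch, not the actual script:

```python
import re

def try_simple_domain(line: str):
    """Return a plain domain if the blacklist line is just an escaped
    domain, else None (meaning: keep it as a regex / review manually)."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Drop optional \b word-boundary anchors used by many entries.
    line = re.sub(r"^\\b|\\b$", "", line)
    candidate = line.replace(r"\.", ".")
    # After unescaping dots, only plain hostname characters may remain.
    if re.fullmatch(r"[a-z0-9.-]+", candidate):
        return candidate
    return None

print(try_simple_domain(r"\bexample\.com\b"))  # a simple case
print(try_simple_domain(r"spam\d+\.example"))  # a real regex
```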

Details

Related patches (repo @ branch: lines +/-):

  • mediawiki/extensions/AbuseFilter @ master: +204 -8
  • mediawiki/extensions/AbuseFilter @ master: +177 -119
  • mediawiki/extensions/AbuseFilter @ master: +3 -13
  • mediawiki/extensions/AbuseFilter @ master: +10 -40
  • operations/mediawiki-config @ master: +7 -3
  • mediawiki/extensions/AbuseFilter @ wmf/1.41.0-wmf.13: +1 -1
  • mediawiki/extensions/AbuseFilter @ wmf/1.41.0-wmf.13: +3 -1
  • mediawiki/extensions/AbuseFilter @ master: +1 -1
  • mediawiki/extensions/AbuseFilter @ master: +3 -1
  • operations/mediawiki-config @ master: +2 -0
  • mediawiki/extensions/AbuseFilter @ wmf/1.41.0-wmf.13: +69 -3
  • mediawiki/extensions/AbuseFilter @ master: +69 -3
  • mediawiki/extensions/AbuseFilter @ master: +20 -25
  • mediawiki/extensions/AbuseFilter @ master: +17 -15
  • mediawiki/extensions/AbuseFilter @ master: +133 -9
  • mediawiki/extensions/AbuseFilter @ master: +3 -2
  • mediawiki/extensions/AbuseFilter @ master: +11 -2
  • mediawiki/extensions/AbuseFilter @ master: +31 -5
  • mediawiki/extensions/AbuseFilter @ master: +4 -0
  • mediawiki/extensions/AbuseFilter @ master: +66 -2
  • mediawiki/extensions/AbuseFilter @ master: +24 -4
  • mediawiki/extensions/AbuseFilter @ master: +72 -9
  • mediawiki/extensions/AbuseFilter @ master: +30 -8
  • operations/mediawiki-config @ master: +8 -0
  • mediawiki/extensions/AbuseFilter @ master: +789 -5

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 930721 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/AbuseFilter@master] blocked domains: Make sure users can't bypass the list by using uppercase

https://gerrit.wikimedia.org/r/930721

Is this ready to test on testwiki? I tried to add "spamsite.com" there, and got "Save failed".

Never mind, seems to work if I create the page first.

Change 930721 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] blocked domains: Make sure users can't bypass the list by using uppercase

https://gerrit.wikimedia.org/r/930721

Idea: the special page should reverse the order, with the most recent addition at the top. Objections?

Idea: the special page should reverse the order, with the most recent addition at the top. Objections?

AbuseFilter is oldest->newest, so it feels to me like this should follow the same ordering?

Fair, I'll leave it as is, given that it's also how the spam blacklist works.

Change 930903 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/AbuseFilter@master] Blocked domains: Fix removing a domain via the special page

https://gerrit.wikimedia.org/r/930903

Change 930903 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Blocked domains: Fix removing a domain via the special page

https://gerrit.wikimedia.org/r/930903

Change 931066 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/AbuseFilter@wmf/1.41.0-wmf.13] blocked domains: Make sure users can't bypass the list by using uppercase

https://gerrit.wikimedia.org/r/931066

Change 931066 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@wmf/1.41.0-wmf.13] blocked domains: Make sure users can't bypass the list by using uppercase

https://gerrit.wikimedia.org/r/931066

Change 931067 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/AbuseFilter@wmf/1.41.0-wmf.13] Blocked domains: Fix removing a domain via the special page

https://gerrit.wikimedia.org/r/931067

Mentioned in SAL (#wikimedia-operations) [2023-06-19T09:02:19Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:931066|blocked domains: Make sure users can't bypass the list by using uppercase (T337431)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-19T09:03:41Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:931066|blocked domains: Make sure users can't bypass the list by using uppercase (T337431)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-19T09:12:12Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:931066|blocked domains: Make sure users can't bypass the list by using uppercase (T337431)]] (duration: 09m 53s)

Change 931067 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@wmf/1.41.0-wmf.13] Blocked domains: Fix removing a domain via the special page

https://gerrit.wikimedia.org/r/931067

Mentioned in SAL (#wikimedia-operations) [2023-06-19T09:21:36Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:931067|Blocked domains: Fix removing a domain via the special page (T337431)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-19T09:22:57Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:931067|Blocked domains: Fix removing a domain via the special page (T337431)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet

Change 931231 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Enable new spam block page in all wikis except meta, commons, wikidata

https://gerrit.wikimedia.org/r/931231

Mentioned in SAL (#wikimedia-operations) [2023-06-19T09:30:01Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:931067|Blocked domains: Fix removing a domain via the special page (T337431)]] (duration: 08m 24s)

Change 931231 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable new spam block page in all wikis except meta, commons, wikidata

https://gerrit.wikimedia.org/r/931231

Mentioned in SAL (#wikimedia-operations) [2023-06-19T09:32:38Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:931231|Enable new spam block page in all wikis except meta, commons, wikidata (T337431)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-19T09:33:58Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:931231|Enable new spam block page in all wikis except meta, commons, wikidata (T337431)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-19T09:43:23Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:931231|Enable new spam block page in all wikis except meta, commons, wikidata (T337431)]] (duration: 10m 45s)

If I create the json page, I get blocked with the error message "JSON should be an array". There is "{}" in the edit box by default, I did not add to that.

You need to replace the {} with what you're adding. If you want to create an empty page, create it with [] instead (it doesn't matter much, though: if the page doesn't exist, it just pretends it's empty).

I think we could define BlockedExternalDomains.json as an i18n message with content [] so that, by default, the link doesn't display in red?

I don't know how JSON pages interact with the i18n infrastructure; it might work, might not. Have you seen this done somewhere else before?

Yes, https://codesearch.wmcloud.org/deployed/?q=%5C.json%22%3A+%22&files=en.json%24

  • "citoid-template-type-map.json": "null",
  • "visualeditor-cite-tool-definition.json": "null",
  • "visualeditor-quick-access-characters.json": "null",
  • "visualeditor-template-tools-definition.json": "null",
  • "realme-config.json": "{}"

Thanks. Let's make a patch for it then. I'll try to do it after some stuff if no one beats me to it.

I might have missed some questions and answers, but let me ask my own:

  1. Based on the description, it is not clear to me what happens with the global spam blacklist. I see it mentioned that the spam whitelist won't be implemented yet. Does that mean the global spam blacklist stays only as a regex page on Meta, or will the Meta special page coexist with the regex page, with a regex page used for whitelisting both for now?
  2. I see that AbuseFilter is proposed as the alternative long-term solution for more complex cases. As far as I am aware, AbuseFilter does not allow a customised error message telling you exactly which domain triggered it, unless you create a new filter for each domain/pattern, which is an obvious maintenance nightmare. SpamBlacklist, on the other hand, clearly tells the user which domain is triggering it. Is there a plan to enhance AbuseFilter with this functionality, or has it been decided that people who add references should figure out for themselves which of the sometimes over a hundred references they add is causing the problem?

Funnily enough the description actually claims that the current system

It can't tell you exactly why your edit can't be saved (which domain is blocked)

But it is not so:

Screenshot_20230620_092602_Firefox.png (2×1 px, 377 KB)

The error message clearly tells you which of the domains is triggering the spamblacklist.

  1. Is there a timeline for when we should expect the spamblacklist regex page to be gone?
  2. Are there plans to likewise dismantle the title, email, global rename request, and other blacklists, and then also the whitelists?
  3. Are there concrete metrics on how much time an improved system will save on each edit?
  4. By replacing plaintext storage with JSON, I assume each line will take a bit more space. Are we safe from hitting the maximum page size with that JSON page?

I think that, given this blacklist does not support a whitelist or regexes, which are really common use cases for this feature, it is not (and shouldn't be) in anyone's plans to replace the blacklist entirely.

This information won't be lost; it's simply in the history. I'm quite hesitant to add $performer there, because that would make those users a very obvious target (at least now one has to search the history). As for the timestamp of an addition: that's a bit complicated, because the timestamp of saving the edit is not known when writing to the JSON, so you have a chicken-and-egg problem. Not to mention the fun related to timezones. We can add it later if there is strong demand, but I'd rather keep it simple for the first iteration.

Maybe two renders of the page can be possible, one for admins and one for regular users?

I might have missed some questions and answers, but let me ask my own:

  1. Based on the description, it is not clear to me what happens with the global spam blacklist. I see it mentioned that the spam whitelist won't be implemented yet. Does that mean the global spam blacklist stays only as a regex page on Meta, or will the Meta special page coexist with the regex page, with a regex page used for whitelisting both for now?

The plan is basically to have two new JSON pages (but possibly without an interface like the simple one we deployed yesterday): one would hold a denylist as a set of regexes, the other an allowlist as a set of regexes, and the allowlist would be checked against both the regex and the simple denylists. This is not deployed to Meta yet. We need to build the support for global deny and allow lists, but I'm doing this in my volunteer capacity, so be patient. I'm not planning to take away any functionality, just splitting the most common case out to a more user-friendly interface.

  1. I see that AbuseFilter is proposed as the alternative long-term solution for more complex cases. As far as I am aware, AbuseFilter does not allow a customised error message telling you exactly which domain triggered it, unless you create a new filter for each domain/pattern, which is an obvious maintenance nightmare. SpamBlacklist, on the other hand, clearly tells the user which domain is triggering it. Is there a plan to enhance AbuseFilter with this functionality, or has it been decided that people who add references should figure out for themselves which of the sometimes over a hundred references they add is causing the problem?

Funnily enough the description actually claims that the current system

It can't tell you exactly why your edit can't be saved (which domain is blocked)

But it is not so:

Screenshot_20230620_092602_Firefox.png (2×1 px, 377 KB)

The error message clearly tells you which of the domains is triggering the spamblacklist.

That is correct. My hope was/is that the regex list would be small enough not to need this. For example, my home wiki has maybe a couple of actual regexes out of 1,500 entries, and not knowing which URL triggered the filter is an acceptable loss there, but it's not on enwiki. So I'm planning to keep the regex functionality around as a different JSON page (let's call it MediaWiki:BlockedExternalDomainsRegex.json) which would hold any complex regexes.
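The layering sketched in this comment (a simple domain list, a separate regex denylist, and a bypass list that overrides both) could behave roughly like the following. The page names come from the comments above; the matching logic itself is an assumption for illustration:

```python
import re

# Entries as they might appear in the proposed JSON pages.
blocked_domains = {"spamsite.example"}        # BlockedExternalDomains.json
blocked_regexes = [r"spam\d+\.example"]       # BlockedExternalDomainsRegex.json (proposed)
bypass_regexes = [r"spam1\.example/trusted"]  # bypass/allow list (proposed)

def is_link_blocked(url: str) -> bool:
    # A bypass match overrides both block lists.
    if any(re.search(p, url) for p in bypass_regexes):
        return False
    # Cheap host extraction for the sketch; real code would use a URL parser.
    host = re.sub(r"^[a-z]+://", "", url).split("/")[0].lower()
    if host in blocked_domains:
        return True
    return any(re.search(p, url) for p in blocked_regexes)
```

Checking the plain set first keeps the common case fast; only links that survive the set lookup pay for the regex scan.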

That's out of scope of this ticket. It's part of the parent ticket.

  1. Is there a timeline for when we should expect the spamblacklist regex page to be gone?

No, I'm doing this in a volunteer capacity and it'll take as long as it takes to be done.

  1. Are there plans to likewise dismantle the title, email, global rename request, and other blacklists, and then also the whitelists?

Dismantle? I'm not taking away any functionality, and I know how important these are. My plan is just to make it easier for admins to carry out the most common type of spam fighting (blocking a simple domain). For everything else, the list pages will be renamed to a more inclusive name and become JSON (to support notes), but otherwise it's just moving code from one repo to another.

  1. Are there concrete metrics on how much time an improved system will save on each edit?

Yes. For fawiki, with 1,500-ish blocked spam domains, we made every edit ~49ms faster: T337431#8936498
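The shape of that speedup can be reproduced in isolation: a set lookup on the extracted host is effectively constant-time, while the old system runs every regex against every edit. A toy benchmark (illustrative only, not the production measurement behind the ~49ms figure; the list size mirrors fawiki's):

```python
import re
import timeit

# 1500 synthetic blocked domains, mirroring the fawiki list size.
domains = [f"blocked{i}.example" for i in range(1500)]
regexes = [re.compile(re.escape(d)) for d in domains]
domain_set = set(domains)

host = "innocent.example"  # the common case: a host that is not blocked

regex_scan = timeit.timeit(
    lambda: any(r.search(host) for r in regexes), number=100)
set_lookup = timeit.timeit(
    lambda: host in domain_set, number=100)

print(f"regex scan: {regex_scan:.4f}s, set lookup: {set_lookup:.6f}s")
```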

  1. By replacing plaintext storage with JSON, I assume each line will take a bit more space. Are we safe from hitting the maximum page size with that JSON page?

JSON takes a bit more space, but not much: fawiki's 1,500-entry list is 220KB. Even if/when we reach the limit, we can try to exempt that page or do some clean-up (e.g. lots of these domains don't resolve anymore; if there is no hit log in the past five years and a domain doesn't resolve, it may be safe to remove, depending on the wiki's admins). Theoretically speaking, that is a concern for the old page too. We could also split the notes to another page if needed. Worst case, we turn this into a data table. We'll see what we can do once we get there. (Also noting that with the deployment of the StopForumSpam extension, we might be able to get rid of some entries.)
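The clean-up idea mentioned here (flagging entries that no longer resolve) could start from a helper like the one below. `resolves` is a hypothetical name, and actual removals would still need admin judgment plus a check of recent hit logs:

```python
import socket

def resolves(domain: str) -> bool:
    """Best-effort check whether a domain still has a DNS record.
    Resolver timeouts are left to the operating system."""
    try:
        socket.gethostbyname(domain)
        return True
    except OSError:  # covers socket.gaierror (NXDOMAIN, no network, ...)
        return False

# Flag candidates for human review; never auto-delete.
entries = ["example.com", "surely-dead-domain-xyz123.invalid"]
stale = [d for d in entries if not resolves(d)]
print("candidates for review:", stale)
```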

I think that, given this blacklist does not support a whitelist or regexes, which are really common use cases for this feature, it is not (and shouldn't be) in anyone's plans to replace the blacklist entirely.

It was never my intention for this page to completely replace SpamBlacklist. My goal is to move the most common use cases to an easier, simpler, and faster system; for the more complex ones, we just move the code around (with some clean-up and JSON-ification).

Change 930250 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] BlockedExternalDomains: Optimize host extraction by using parse_url

https://gerrit.wikimedia.org/r/930250

Change 930710 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] BlockedExternalDomains: De-duplicate validateDomain logic

https://gerrit.wikimedia.org/r/930710

Re: Tech News - Thanks for the link, but that's a lot of context for me to attempt to condense into 2-3 simple sentences! (Here's a permalink to the announcement and follow-up comments.)
I wonder if this new feature deserves a canonical (and translatable) documentation page (or section) on either mediawiki.org or Meta-Wiki? (I'd guess it belongs somewhere in https://www.mediawiki.org/wiki/Extension:AbuseFilter#Creating_and_managing_filters?)
If I understand correctly, this is an optional (but significant) improvement, so it isn't urgent; therefore I won't attempt to hurry along the documentation or add an entry to Tech News this week, but will wait for the documentation to be ready. Thanks!

Fair. I'll try to write something soon. As for where it should live, I added a link to a manual page:

grafik.png (1×1 px, 300 KB)

but it doesn't exist yet. I'll try to edit it and add as much as possible.

@Ladsgroup thank you! I've added it to https://meta.wikimedia.org/wiki/Tech/News/2023/28 -- please check that wording, and correct it directly if needed (within the next ~24 hours). I've also marked the Manual page for translation.

As a person with interface-admin rights, I don't have access to the UI but can edit JSON directly. To me, it looks like a logical error. Probably everybody who can edit JSON should be able to use this UI. Also, there could be some additional roles.

As a person with interface-admin rights, I don't have access to the UI but can edit JSON directly. To me, it looks like a logical error. Probably everybody who can edit JSON should be able to use this UI. Also, there could be some additional roles.

Yes and no. As an interface-admin you should be able to edit the site-wide JS/JSON/CSS, but since you're not an admin, you can't use the interface that admins use to ban websites.

The biggest logical weirdness here, to be honest, is the ability to be an interface-admin without being an admin. A user is trusted enough to deal with site-wide JS and CSS (which could cause major issues) but not trusted enough to perform a deletion? We had similar issues in other wikis where CheckUsers couldn't be admins, which led to a lot of problems (to say the least). I know some wikis want to operate like that, but with that assumption, you might end up with weird situations like this one.

I can explicitly add the check to take away the right to edit if the user can't modify via special page. I'll do it later this week.

Bikeshedding time. What should the replacements for Spam-blacklist and Spam-whitelist be named?

  • Spam-blacklist: BlockedExternalLinksRegex.json?
  • Spam-whitelist: BypassExternalLinksBlock.json?

I'd generally recommend the terms "block" and "allow". ("bypass" means "don't drive thru town but pass by" in my brain.)

"bypass" means "don't drive thru town but pass by" in my brain.

Yes, that's probably why Ladsgroup suggested it. Our default state is allowed (unlike the typical case for a whitelist, where things are disallowed by default). The blacklist then blocks attempts to add a certain set of links, so we need a way to bypass the blacklist.

Either way, it is definitely bikeshedding.

That said, if there is appetite to do/think about T203157: Make the spam whitelist its own slot (there is some discussion on its parent T14963 to consider) in this context, that would eliminate most of the need for such a singular page. I don't think it would eliminate all of it, though; for example, I see a regex \bwww\.google\.com/cse\?cx=009114923999563836576%3A1eorkzz2gp4 which goes to what looks like a custom search engine for reliable sources, and I know there are other such domain-specific searches that are commonly linked.

Capturing here from #mediawiki-core chat with @Ladsgroup:

The SpamBlacklist extension has been bundled with MediaWiki core for many years, and is enabled by default to consume https://meta.wikimedia.org/wiki/Spam_blacklist from Meta-Wiki, and there's an advertised configuration that additionally consumes https://en.wikipedia.org/wiki/MediaWiki:Spam-blacklist. As such, it is important for the many thousands of MediaWiki installs worldwide to 1) keep protecting existing installs from spam, and 2) ensure the new and upgraded MediaWiki installs remain similarly protected.

For existing installs, this means:

  • We must not slowly migrate or otherwise remove entries from the Meta-Wiki and enwiki block lists. Existing entries should remain as-is to support existing MediaWiki 1.39 and earlier installs that use SpamBlacklist. Instead, we'll have to wait a few weeks until AbuseFilter has the new regex capability for domain blocking ready and deployed, then do a big switch that creates both domain and regex entries via AbuseFilter at once on these wikis, and shortly afterwards disable the SpamBlacklist extension there, preserving the two pages above as-is in a frozen/protected state (still allowing admins the occasional edit to remove bad entries).

For new/upgraded installs, this means:

  • AbuseFilter (already bundled by default with MediaWiki) needs one more capability: namely to load an extra .json page from a URL, i.e. action=raw for https://meta.wikimedia.org/wiki/MediaWiki:BlockedExternalDomains.json,
  • the domain block feature, and this URL, are enabled by default in the AbuseFilter extension.
  • the SpamBlacklist extension somehow gets disabled on-upgrade (TBD how).
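The first bullet (consuming the global JSON page via action=raw) could look roughly like this sketch, in Python rather than the extension's PHP. The action=raw URL form is standard MediaWiki; the "domain" field name is an assumption:

```python
import json
import urllib.request

# action=raw serves a page's source text directly, without the skin.
URL = ("https://meta.wikimedia.org/w/index.php"
       "?title=MediaWiki:BlockedExternalDomains.json&action=raw")

def parse_blocklist(text: str) -> set:
    """Parse the JSON page body; a missing or empty page acts as an
    empty list, matching the behaviour described earlier in this task."""
    if not text.strip():
        return set()
    return {entry["domain"] for entry in json.loads(text)}

def fetch_global_blocklist(url: str = URL) -> set:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_blocklist(resp.read().decode("utf-8"))
```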

For the last point, there are a few different options. There is no uninstall concept for bundled extensions, and there's always ambiguity as to whether a specific install does or doesn't want to keep the extension. The closest thing we have is to empty out the extension repo and then, after 2 LTS cycles (matching the maximum DB upgrade distance), remove it from the bundle. The benefit of this is that we cleanly stop supporting the extension and focus efforts on AbuseFilter. The downside is that if a wiki has its own local blocklist configured, it will implicitly get turned off on upgrade, until the admin notices, realizes what happened, and finds out how to convert to the new system, with no obvious way to prepare for this ahead of time. The good news is that, because everything is simply a wiki page, the admin does have a way to prepare: create MediaWiki:BlockedExternalDomains.json with the relevant entries ahead of time.

If there's a conversion tool that we're reasonably confident in, perhaps that's something AbuseFilter could automatically run on-upgrade if it detects a non-empty MediaWiki:Spam-blacklist page.

Short of that, we can issue a mediawiki-l announcement, and perhaps, instead of emptying out the SpamBlacklist extension completely, we could reduce it to detecting on edit when MediaWiki:Spam-blacklist is non-empty and BlockedExternalDomains.json doesn't exist yet, and then issue a runtime warning with a link to migration docs.

I expect there will be overlap between these pages for quite some time; I'm not sure "if exists" type checks will be the right answer to push on someone else.

@Xaosflux For enwiki/metawiki, the proposed strategy does not involve overlap. We would switch the entries within these wikis all at once. The main reason is that third-party installs only have SpamBlacklist today, and thus its config page must remain complete, not incomplete as part of a transition. Other WMF wikis are indeed free to start their migration any time and have the two overlap and co-exist.

Do you foresee potential issues with performing the migration in this way?

I mention "if exists" as a way to determine whether to send a warning to the system logs (e.g. the error.log file, syslog, or a Logstash equivalent) on third-party systems that run MW 1.40+ starting later this year. If they have a custom MediaWiki:Spam-blacklist page from MW <= 1.39 but do not yet have a local BlockedExternalDomains.json page for MW 1.40+, we can let them know that the old system no longer works and that they can use this instead.

The idea is that we don't need to bother sysadmins who have a plain install with no customisations, and we also don't need to bother sysadmins who already know about and have started using BlockedExternalDomains.json. The error.log warning would essentially state something along the lines of "You have a MediaWiki:Spam-blacklist page that is no longer used. Please migrate to Special:BlockedExternalDomains". I think once they have created at least one entry in the new system, we can stop appending the same message to the error.log file on their server on every edit.
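That warning condition reduces to a small predicate; the function and parameter names here are hypothetical stand-ins for MediaWiki's real page-lookup APIs:

```python
from typing import Optional

def should_warn(spam_blacklist_text: Optional[str],
                blocked_domains_text: Optional[str]) -> bool:
    """Warn sysadmins only while the legacy page still has entries and
    the new JSON page has not been started yet (None = page missing)."""
    legacy_in_use = bool(spam_blacklist_text and spam_blacklist_text.strip())
    new_system_started = blocked_domains_text is not None
    return legacy_in_use and not new_system_started

# Plain install, nothing customised: stay quiet.
print(should_warn(None, None))
# Legacy entries but no new page yet: warn on every edit.
print(should_warn(r"\bexample\.com\b", None))
# At least one entry migrated (even an empty list page): stop warning.
print(should_warn(r"\bexample\.com\b", "[]"))
```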

@Krinkle - isn't there still a gap preventing full migration, namely that there is no whitelist? For example, there are domains on the SBL and subdomains on the SWL. In an on-wiki discussion, migrating "easy" cases was already proposed: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_206#Migrating_MediaWiki:Spam-blacklist_to_MediaWiki:BlockedExternalDomains.json

As a follow-on, regarding the GSBL on Meta-Wiki: the lack of a local whitelist should be a hard blocker, especially if this is going to apply to worldwide non-WMF projects.

@Krinkle - isn't there still a gap preventing full migration, namely that there is no whitelist; […]

There can't be a gap, as explained above. Instead, before we start on enwiki and metawiki, we need to wait until the feature is ready.

Quoting @Ladsgroup and myself above:

The plan is basically to have two new JSON […]: one would hold a denylist as a set of regexes, one an allowlist as a set of regexes […] We need to build the support for global deny and allow lists, but I'm doing this in my volunteer capacity, so be patient. I'm not planning to take away any functionality. […]

[…] Instead, we'll have to wait a few weeks until AbuseFilter has the new regex capability for domain blocking ready and deployed, then do a big switch […]

Change 945930 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/AbuseFilter@master] BlockedDomains: Move filtering logic to a dedicated class

https://gerrit.wikimedia.org/r/945930

Change 945930 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] BlockedDomains: Move filtering logic to a dedicated class

https://gerrit.wikimedia.org/r/945930

Change 950212 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/AbuseFilter@master] [WIP] BlockedDomain: Add support for regex and bypass

https://gerrit.wikimedia.org/r/950212

I made a lot of progress on adding support for regex and bypass (and bypass working for both blocked domains and regexes): https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/950212

Unlike BlockedExternalDomains in its current shape, SpamBlacklist checks edit summaries for spam, too (T15599#173812, code snippet: T296102#9534553).

Unrelated to that, I also came across T229709: Automatically block users who hit some spam blacklist entries. It could now be implemented in AbuseFilter, since the tools are already there.