
VisualEditor (Citoid) bypasses the Spamblacklist by specifying a port (e.g. 443)
Open, Needs Triage · Public · BUG REPORT

Description

Steps to replicate the issue

  • Blacklist a URL, e.g. here
  • Add the blacklisted URL as a source in VisualEditor (Citoid)
    image.png (259×453 px, 11 KB)

What happens?:

What should have happened instead?:

  • The edit should not be allowed: either the port should not be added, or the Spamblacklist should detect URLs with ports (possibly using an equivalence table).
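The "equivalence table" idea could be sketched roughly as follows: drop a port from the URL when it is the default for its scheme, before the URL reaches the blacklist check. This is only an illustration in Python, not MediaWiki code; the names DEFAULT_PORTS and strip_default_port are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical "equivalence table": the default port for each scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}

def strip_default_port(url: str) -> str:
    """Return url without its port when that port is the scheme's default."""
    parts = urlsplit(url)
    if parts.port is None or parts.port != DEFAULT_PORTS.get(parts.scheme):
        return url  # no port, or a non-default one: leave the URL untouched
    # Drop the trailing ":<port>" from the network location.
    netloc = parts.netloc.rsplit(":", 1)[0]
    return urlunsplit(parts._replace(netloc=netloc))

print(strip_default_port(
    "https://www.bfmtv.com:443/economie/replay-emissions/objectif-croissance/"
))
```

With this kind of normalization, the URL with :443 and the URL without it would be checked as the same string.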

Event Timeline

Example: https://www.bfmtv.com:443/economie/replay-emissions/objectif-croissance/

Port 443 shouldn't be there: it is the default port for HTTPS, so stating it explicitly is redundant.

The regex visible in the screenshot below should block the URL it contains, but the automatically added port prevents the entry in MediaWiki:Spam-blacklist from matching.

This means the blacklist can be bypassed at any time. The only workaround would be to blacklist the entire site, which is not desirable.

I've also noticed that port 443 is only added when the URL ends with a trailing ‘/’.

When the URL is added without the trailing ‘/’, the blacklist is triggered.
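The effect of the port on a path-scoped entry can be reproduced outside MediaWiki. The sketch below is in Python and rests on two assumptions: the entry is a hypothetical stand-in (the real one is only visible in the screenshot), and the prefix is a simplified guess at how the blacklist anchors an entry after the scheme and any subdomains.

```python
import re

# Hypothetical blacklist entry; the real one is only visible in the screenshot.
ENTRY = r"bfmtv\.com/economie/replay-emissions"

# Assumed, simplified prefix: scheme, slashes, then any subdomain characters.
PATTERN = re.compile(r"https?://+[a-z0-9_\-.]*(" + ENTRY + ")", re.IGNORECASE)

def is_blocked(url: str) -> bool:
    """True if the URL matches the blacklist pattern."""
    return PATTERN.search(url) is not None

plain = "https://www.bfmtv.com/economie/replay-emissions/objectif-croissance/"
with_port = "https://www.bfmtv.com:443/economie/replay-emissions/objectif-croissance/"

print(is_blocked(plain))      # True
print(is_blocked(with_port))  # False: ":443" sits between the host and the path
```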

You should use BlockedExternalDomains (e.g. https://fr.wikipedia.org/wiki/Sp%C3%A9cial:BlockedExternalDomains), which deconstructs added links, making it immune to this whole category of issues.
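As a rough illustration of why a check on the parsed link is not affected by ports (this is not the extension's actual code; BLOCKED_DOMAINS and host_is_blocked are hypothetical names), a host-based comparison in Python could look like this:

```python
from urllib.parse import urlsplit

# Hypothetical list of blocked domains.
BLOCKED_DOMAINS = {"bfmtv.com"}

def host_is_blocked(url: str) -> bool:
    """Compare only the parsed hostname, ignoring scheme, port and path."""
    host = urlsplit(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)

print(host_is_blocked("https://www.bfmtv.com:443/economie/replay-emissions/"))  # True
print(host_is_blocked("https://www.bfmtv.com/sport/"))                          # True
```

Because only the hostname is compared, the port makes no difference, but every page on the domain is blocked, not just one section.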

I don't see the :443 being automatically added? If I put https://www.bfmtv.com/economie/replay-emissions/objectif-croissance/ into Citoid, it correctly warns me that it's blocked, rather than adjusting the URL and bypassing the check:

CleanShot 2025-05-13 at 09.23.40@2x.png (734×836 px, 115 KB)

As such, it seems like more of a general SpamBlacklist issue -- since the same could be replicated in a pure wikitext edit.

@Ladsgroup mind you, this specific case of "I want to block one page on a given domain that's being used for spam" isn't supported by BlockedExternalDomains...

@Ladsgroup

Using BlockedExternalDomains is not a solution in this case.

If we were to add it, it would block Bfmtv completely, which is not desirable in this case. Here, we're only blocking the part of the site that has been deemed unreliable because it uses brand content.

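The regex prefix built in SpamBlacklist::getRegexStart() makes no allowance for a port. Simply adding (?::\d+)? after [a-z0-9_\-.]* is unlikely to fix this, though: that character class only consumes the subdomains in front of a blacklist entry, whereas the port sits between the host and the path, inside the text the entry itself has to match. Stripping default ports from the URL before it is checked would be more robust.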
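Where the optional port group goes matters. A quick check in Python, assuming a hypothetical entry of the form host plus path and a prefix that ends in [a-z0-9_\-.]* (both are assumptions, and Python's re module stands in for PCRE here):

```python
import re

# Assumed prefix, ending in the character class quoted above.
PREFIX = r"(?:https?:)?//+[a-z0-9_\-.]*"
PORT = r"(?::\d+)?"
URL = "https://www.bfmtv.com:443/economie/replay-emissions/objectif-croissance/"

# Port group placed right after the prefix; hypothetical entry left unchanged.
after_prefix = re.compile(
    PREFIX + PORT + r"(bfmtv\.com/economie/replay-emissions)", re.IGNORECASE
)

# Port group placed between the host and the path of the entry.
inside_entry = re.compile(
    PREFIX + r"(bfmtv\.com" + PORT + r"/economie/replay-emissions)", re.IGNORECASE
)

print(bool(after_prefix.search(URL)))  # False
print(bool(inside_entry.search(URL)))  # True
```

In this sketch the group only helps when it sits between the host and the path, which is inside each individual entry rather than in the shared prefix.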