
Link blocklist / spam blacklist parses URL wrongly
Closed, Invalid | Public | BUG REPORT

Description

I was trying to fix a link on https://en.wikipedia.org/wiki/Bezwada_Wilson from http://kafila.org/2010/12/22/why-is-it-so-difficult-to-free-india-of-manual-scavenging/ to https://kafila.online/2010/12/22/why-is-it-so-difficult-to-free-india-of-manual-scavenging/

But the spam blocklist showed a warning and blocked the edit. Strangely, the warning text said: "The following link has triggered a protection filter: .online/2010/12/22/why-is-it-so-difficult-to-free-india-of-manual-scavenging/"

Notice the missing "kafila" in the hostname.

First I checked the relevant lists:

None of them contains any malformed regex (let alone kafila.online itself).

So I tried with a domain that's well known - get.online

When I entered just https://get.online, the blacklist wasn't triggered. But when I changed the URL to https://get.online/2010/12/22/why-is-it-so-difficult-to-free-india-of-manual-scavenging/, it was triggered. That's strange because I didn't change the hostname at all. So I concluded this is a bug in the code (and not in a regex).

I then tried removing parts of the URL.

The issue doesn't occur on Malayalam Wikipedia. See https://ml.wikipedia.org/wiki/%E0%B4%89%E0%B4%AA%E0%B4%AF%E0%B5%8B%E0%B4%95%E0%B5%8D%E0%B4%A4%E0%B4%BE%E0%B4%B5%E0%B5%8D:Asdofindia/dev/bug-test

Event Timeline

When wiki A has the issue and wiki B doesn't, that usually points to a specific bad regex. You should probably ask on [[en:MediaWiki talk:Spam-blacklist]] to see if anyone else can figure out what's gone wrong.

I agree. It might be this line:
.*\.(ga|cf|ml|gq|online|site)/.*?\d{4,5}[-/]\d{1,2}[-/]\d{1,2}.*
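A quick way to check that this line explains both observations (the bare hostname passing, and the truncated ".online/..." in the warning) is to run it against the URLs from the report. This is only a sketch: the rule is copied from the line above, but HOST_PREFIX is an assumption about how the blocklist might wrap each rule with a scheme-and-hostname prefix, and blocked_fragment is a hypothetical helper, not the extension's actual code.

```python
import re

# Rule copied verbatim from the suspected blocklist line.
RULE = r".*\.(ga|cf|ml|gq|online|site)/.*?\d{4,5}[-/]\d{1,2}[-/]\d{1,2}.*"

# ASSUMPTION: the filter prepends a scheme + hostname-character prefix to
# each rule and reports only the text matched by the rule itself.
HOST_PREFIX = r"https?://+[a-z0-9_\-.]*"
WRAPPED = re.compile(HOST_PREFIX + "(" + RULE + ")", re.IGNORECASE)


def blocked_fragment(url):
    """Return the part of the URL matched by the rule, or None if allowed."""
    match = WRAPPED.search(url)
    return match.group(1) if match else None


path = "/2010/12/22/why-is-it-so-difficult-to-free-india-of-manual-scavenging/"

# Bare hostname: the rule needs a "/" after the TLD plus a date-like path.
print(blocked_fragment("https://get.online"))

# Hostname plus a dated path: the rule matches.
print(blocked_fragment("https://get.online" + path))

# The prefix consumes "kafila", so the reported fragment starts at ".online".
print(blocked_fragment("https://kafila.online" + path))
```

Under that assumption, the hostname characters before ".online" are matched by the prefix rather than by the rule, which would account for the missing "kafila" in the warning without any URL parsing bug.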

Sorry. Thanks.

The error message threw me off.