
Antispam filter doesn't filter plaintext rendered URLs
Closed, Declined · Public

Description

If http://some.spam.tld is on the Spam-blacklist, it still lets the following constructions through (a sketch follows the list):

  • <nowiki>http://some.spam.tld</nowiki>
  • http&#x3a;//some.spam.tld (etc.)
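A minimal Python sketch of the behaviour, assuming a simplified model in which the blacklist is only matched against URLs the parser actually turned into external links; the pattern and helper functions below are hypothetical stand-ins, not the extension's real code:

```
import re

# Hypothetical blacklist entry standing in for a real Spam-blacklist line.
blacklist = re.compile(r"some\.spam\.tld", re.IGNORECASE)

def extract_external_links(wikitext):
    """Rough stand-in for the parser: only bare http(s) URLs outside
    <nowiki> tags become external links."""
    stripped = re.sub(r"<nowiki>.*?</nowiki>", "", wikitext, flags=re.DOTALL)
    return re.findall(r"https?://[^\s<>\[\]]+", stripped)

def passes_filter(wikitext):
    """The filter only sees the links the parser produced."""
    return not any(blacklist.search(url) for url in extract_external_links(wikitext))

print(passes_filter("http://some.spam.tld"))                   # False - blocked
print(passes_filter("<nowiki>http://some.spam.tld</nowiki>"))  # True  - slips through
print(passes_filter("http&#x3a;//some.spam.tld"))              # True  - slips through
```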

Version: unspecified
Severity: normal
URL: http://test.wikipedia.org/w/index.php?title=User:Danny_B./Spamfilter&oldid=60480

Details

Reference
bz14522

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 10:08 PM
bzimport added a project: SpamBlacklist.
bzimport set Reference to bz14522.
bzimport added a subscriber: Unknown Object (MLST).

Isn’t that a feature? Those are not links, therefore they are not blocked. (And why should they be?)

Using the spam blacklist to block plaintext is not a good idea. Many generic patterns are used inside the entries added to the spam blacklist in order to stop many incoming URLs at once. None of these patterns would match a legitimate URL, but they may match perfectly valid words in plaintext. If the spam blacklist were applied to plaintext, the spam filter would start acting up everywhere, blocking pages that really don't have any spam on them.
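To illustrate that point with a sketch (the "generic" fragment below is invented for the example, not an actual blacklist entry): a pattern that is safe when matched only against URLs can easily hit an ordinary sentence when matched against plaintext.

```
import re

# Hypothetical generic fragment of the kind used in blacklist entries.
generic = re.compile(r"cheap[ -]?pills", re.IGNORECASE)

spam_url  = "http://buy-cheap-pills.example.tld/offer"
plaintext = "The trial compared cheap pills with the brand-name drug."

print(bool(generic.search(spam_url)))    # True - the URL match we want
print(bool(generic.search(plaintext)))   # True - a false positive if applied to plaintext
```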

And quite simply... we already have an extension for blocking plaintext: SpamRegex. SpamBlacklist is for blocking URLs only and is widely editable. SpamRegex is meant for blocking anything, and is more restricted because you can really screw things up if you get even the slightest thing wrong.

There is no way to block plaintext in the way you want:

  1. The SpamBlacklist extension only looks at the parser output, not the wikitext, so if something has not been converted into a link it does not know about it. Therefore plaintext cannot be blacklisted.
  2. For things like www.foo.com, you may recognize them as URLs, but there is no feasible way to make the computer do the same, at least not without an unacceptable number of false positives that would make many valid edits trigger the spam filter. Not to mention that people normally use the SpamBlacklist's talk page to post the spam URLs they want blocked, and they post them in plaintext. If plaintext were blacklisted, then every time someone blacklisted a URL, the talk page for requesting additions would become uneditable because of the new entry; a sketch of this follows the list.
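A sketch of the talk-page problem from point 2, using a hypothetical blacklisted host and an invented request text:

```
import re

blacklist = re.compile(r"some\.spam\.tld", re.IGNORECASE)

# A typical (invented) request posted on the blacklist's talk page:
talkpage_edit = """
== Please blacklist ==
Spambots keep adding www.some.spam.tld and some.spam.tld/buy - please block it. ~~~~
"""

# Link-based check: the parser finds no external links here, so the edit saves.
# Plaintext check: the raw text matches, so every such request would itself be blocked.
print(bool(blacklist.search(talkpage_edit)))   # True -> the request would trigger the filter
```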

herd wrote:

*** Bug 20501 has been marked as a duplicate of this bug. ***