
Allow $wgSFSIPListLocation to be a url and have proxy support
Closed, ResolvedPublic

Description

You can download IP blacklists, and import them using the maintenance/updateBlacklist.php script. StopForumSpam has several lists; we recommend using the "listed_ip_30_all" list. Once you choose the list you want, download and extract it to somewhere on your server, then point $wgSFSIPListLocation in the LocalSettings.php file at it. We recommend setting up a nightly cron job to download and extract new versions of the list and subsequently running the updateBlacklist maintenance script.
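The download-and-extract step described above can be sketched in Python as below. This is an illustration only, assuming the downloaded archive holds a single plain-text file with one IP per line. `extract_ip_list` and the destination path are hypothetical, and the download itself (e.g. curl from cron) is left out.

```python
import io
import zipfile
from pathlib import Path


def extract_ip_list(zip_bytes: bytes, dest: Path) -> int:
    """Write the single text member of a downloaded SFS zip archive to `dest`.

    Assumes (hypothetically) the archive holds exactly one file with one IP per
    line. Returns the number of non-empty lines written.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        members = archive.namelist()
        if len(members) != 1:
            raise ValueError(f"expected one archive member, found {members!r}")
        text = archive.read(members[0]).decode("utf-8")
    dest.write_text(text, encoding="utf-8")
    return sum(1 for line in text.splitlines() if line.strip())
```

After extraction, $wgSFSIPListLocation would point at `dest`, followed by a run of maintenance/updateBlacklist.php.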

This really doesn't fit into the WMF way of doing things for production (or beta)...

Allowing $wgSFSIPListLocation to be a URL, and fetching the list from there (with proxy support!), is necessary.
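Below is a rough Python sketch of what "a URL with proxy support" could mean. The extension itself is PHP, so this is not its code. `load_ip_list` is a hypothetical helper that accepts either a local path or an http(s) URL. It assumes the URL serves uncompressed, one-IP-per-line text, and it routes URL fetches through an optional proxy using the standard library's `urllib.request.ProxyHandler`.

```python
import urllib.request
from pathlib import Path
from typing import List, Optional
from urllib.parse import urlparse


def load_ip_list(location: str, proxy: Optional[str] = None, timeout: float = 30.0) -> List[str]:
    """Return the IPs listed at `location`, a local path or an http(s) URL.

    `proxy` (e.g. "http://webproxy.example:8080", a placeholder) is only used
    for URL fetches; without it, urllib falls back to the *_proxy env vars.
    """
    if urlparse(location).scheme in ("http", "https"):
        handlers = []
        if proxy:
            handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
        opener = urllib.request.build_opener(*handlers)
        with opener.open(location, timeout=timeout) as resp:
            text = resp.read().decode("utf-8")
    else:
        # Anything without an http(s) scheme is treated as a local file path.
        text = Path(location).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```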

Event Timeline

Looks like it's just an fopen() call. Not sure if it'd be easier to leave $wgSFSIPListLocation as a local file and set up the cron to pick up whatever SFS files we'd like with something like:

# crontab: fetch daily at midnight via the outbound proxy
https_proxy=http://webproxy.eqiad.wmnet:8080
0 0 * * * curl -sS https://www.stopforumspam.com/downloads/listed_ip_365_ipv6.zip -o /path/to/local/file

Or the cron could pull down the daily SFS updates and merge those, if we need them. These files get fairly large, btw.
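If we do pull down the daily update files, merging them into a base list could be as simple as a validated set union. The Python sketch below shows this. `merge_ip_lists` is a hypothetical helper. A plain union never drops anything, so expiring IPs that age out of a list's time window would need extra bookkeeping that this sketch does not do.

```python
import ipaddress
from typing import Iterable, List


def merge_ip_lists(base: Iterable[str], *updates: Iterable[str]) -> List[str]:
    """Union a base list with any number of update lists, dropping duplicates and junk."""
    merged = set()
    for source in (base, *updates):
        for line in source:
            try:
                merged.add(ipaddress.ip_address(line.strip()))
            except ValueError:
                continue  # skip blank lines, comments, or malformed entries
    # Sort IPv4 before IPv6, then numerically, so the output is stable between runs.
    return [str(ip) for ip in sorted(merged, key=lambda ip: (ip.version, int(ip)))]
```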

Proxying to their API is probably a really bad idea.

sbassett triaged this task as Medium priority. Jul 12 2019, 10:24 PM


I don't mean in real time. I mean being able to make outbound requests to fetch the list via the proxies we need for outgoing requests (see what we do in extensions like TorBlock, which makes the request via cron and shoves the result into the object cache).
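Below is a hypothetical Python sketch of the TorBlock-style pattern just described. A cron job makes the outbound request and stores the result in the object cache, and page requests only ever read the cache. It is not TorBlock's or StopForumSpam's PHP code. `TTLCache` stands in for memcached, and `refresh_from_cron`, `is_blacklisted` and the cache key are made-up names.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Tiny in-process stand-in for a shared object cache; illustrative only."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._data: Dict[str, Tuple[float, object]] = {}
        self._clock = clock

    def set(self, key: str, value: object, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            return None  # missing or expired
        return entry[1]


CACHE_KEY = "sfs-ip-blacklist"  # hypothetical key name


def refresh_from_cron(cache: TTLCache, fetch: Callable[[], List[str]], ttl: float = 2 * 86400) -> int:
    """Cron side: fetch the list (through the proxy) and store it in the cache."""
    ips = fetch()
    cache.set(CACHE_KEY, frozenset(ips), ttl)
    return len(ips)


def is_blacklisted(cache: TTLCache, ip: str) -> bool:
    """Request side: read-only cache lookup that never touches the network."""
    ips = cache.get(CACHE_KEY)
    return ip in ips if ips is not None else False
```

Giving the entry a TTL longer than the cron interval means a single failed run doesn't empty the cache. On a miss, this sketch fails open and treats the IP as unlisted. That is a design choice, not a given.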

Though, if the file is sufficiently large... we probably don't want to be putting it in memcached...

It might be worth doing some testing with the size of the resultant object.
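One quick way to test the size of the resultant object is sketched below in Python. It measures a newline-joined list, raw and zlib-compressed, and compares it with memcached's default 1 MB item limit. Newline-joined text is only a proxy for the real serialization format, and the limit check ignores key and per-item overhead. `cache_object_sizes` and `fits_in_memcached` are hypothetical helpers.

```python
import zlib
from typing import Dict, Iterable

MEMCACHED_DEFAULT_ITEM_LIMIT = 1024 * 1024  # memcached's default max item size (-I 1m)


def cache_object_sizes(ips: Iterable[str]) -> Dict[str, int]:
    """Estimate the cached blacklist's size in bytes, raw and zlib-compressed.

    The real serialization (e.g. PHP serialize()) will add per-element overhead.
    """
    raw = "\n".join(ips).encode("utf-8")
    return {"raw": len(raw), "zlib": len(zlib.compress(raw, 9))}


def fits_in_memcached(size: int, limit: int = MEMCACHED_DEFAULT_ITEM_LIMIT) -> bool:
    """Rough check that ignores key length and memcached's per-item overhead."""
    return size <= limit
```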

I assume we'd be interested in the All Site Data files here, namely the IPv4 and IPv6 Combined files. These have the following sizes:

| File | Size Compressed (gz or zip) | Size Uncompressed (text) |
| ---- | ---- | ---- |
| listed_ip_1_ipv46 | 29 KB | 95 KB |
| listed_ip_7_ipv46 | 89 KB | 318 KB |
| listed_ip_30_ipv46 | 231 KB | 890 KB |
| listed_ip_90_ipv46 | 538 KB | 2.2 MB |
| listed_ip_180_ipv46 | 878 KB | 3.7 MB |
| listed_ip_365_ipv46 | 1.2 MB | 5.2 MB |

The day-count indicator (1, 7, 30, 90, etc.) is apparently a "last seen active (causing trouble) within X days" window for the given list of IPs. I'm not sure what the Download Limit column (not reproduced in the table above) means. I assumed it was some sort of per-IP throttle on file downloads, but I've been able to download the files multiple times to my laptop, which has a static IP. Anyhow, none of these files is particularly monstrous to download, though there would indeed be concerns about tossing them into a config file or cache. They do seem to be fairly accurate: I found several spammy IPs from the recent attack (T227416) within these lists (I believe @MarcoAurelio did as well). I've no idea what the false positive rate might be, which is probably something we'd have to test on beta and then maybe on a handful of smaller project wikis. @Reedy - when you return, can we get this deployed to beta? (I've never done that before.)
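As a worked check of the cache concern, the sizes from the table can be compared against memcached's default 1 MB item limit. The Python below treats 1 MB as 1024 KB. It does not account for serialization overhead. The compressed column is zip/gz compression, so it only suggests what a compressing cache client might achieve. `windows_that_fit` is a hypothetical helper.

```python
from typing import Dict, List

# Sizes from the table above, in KB (1 MB taken as 1024 KB).
COMPRESSED_KB: Dict[int, float] = {1: 29, 7: 89, 30: 231, 90: 538, 180: 878, 365: 1.2 * 1024}
UNCOMPRESSED_KB: Dict[int, float] = {1: 95, 7: 318, 30: 890, 90: 2.2 * 1024, 180: 3.7 * 1024, 365: 5.2 * 1024}
ITEM_LIMIT_KB = 1024  # memcached's default maximum item size (1 MB)


def windows_that_fit(sizes_kb: Dict[int, float], limit_kb: float = ITEM_LIMIT_KB) -> List[int]:
    """Return the 'last seen within N days' windows whose list fits in one cache item."""
    return sorted(days for days, kb in sizes_kb.items() if kb <= limit_kb)


print(windows_that_fit(UNCOMPRESSED_KB))  # [1, 7, 30]
print(windows_that_fit(COMPRESSED_KB))    # [1, 7, 30, 90, 180]
```

By this rough measure, only lists up to the 30-day window fit uncompressed in a single default-sized item.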

The SFS extension doesn't support IPv6 yet (T173399), but there were very few IPv6 addresses in the blocklist anyway.
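Until IPv6 is supported, IPv6 entries could be separated out before loading, for example to count how many are being skipped. Below is a minimal Python sketch using the standard `ipaddress` module. `split_by_family` is a hypothetical helper, not part of the extension.

```python
import ipaddress
from typing import Iterable, List, Tuple


def split_by_family(entries: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate IPv4 from IPv6 entries, e.g. to load only IPv4 until IPv6 is supported."""
    v4: List[str] = []
    v6: List[str] = []
    for entry in entries:
        try:
            ip = ipaddress.ip_address(entry.strip())
        except ValueError:
            continue  # ignore blank or malformed lines
        (v4 if ip.version == 4 else v6).append(str(ip))
    return v4, v6
```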

I think downloading on demand is a bit sketchy, as it requires constant uptime of the SFS website. I'd rather have a cron job that regularly wgets the latest file and unzips it into place. Even if that fails, we'll still have an old blacklist on disk to fall back to.
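The fallback property described here mostly comes down to never overwriting the old file until a good new one exists. Below is a Python sketch that writes to a temp file in the same directory and swaps it in with `os.replace()`. `update_blacklist_file` is hypothetical. `fetch` stands in for whatever downloads and unzips the list. The minimum-entry check is an assumed sanity guard.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def update_blacklist_file(dest: Path, fetch: Callable[[], str], min_entries: int = 1) -> bool:
    """Replace `dest` with freshly fetched contents, keeping the old file on any failure."""
    try:
        text = fetch()
    except Exception:
        return False  # network or SFS outage: keep the existing blacklist
    if sum(1 for line in text.splitlines() if line.strip()) < min_entries:
        return False  # suspiciously empty download: keep the existing blacklist
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
        os.chmod(tmp_name, 0o644)  # mkstemp creates 0600; let the web server read it
        os.replace(tmp_name, dest)  # atomic swap: readers never see a partial file
    except BaseException:
        os.unlink(tmp_name)
        raise
    return True
```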


The problem is that if/when it falls out of cache, the file will only exist on one host, and only that host can repopulate the cache.

But as above, I wasn't suggesting doing it on demand for every request, just what we do for TorBlock.

Update: after chatting with @Reedy a bit and looking at how ext:TorBlock (which ext:StopForumSpam borrows from in other places) does similar things in TorExitNodes.php, I'd like to model a patch for ext:StopForumSpam on what fetchExitNodesFromTorProject() and loadExitNodes() do: proxy out to an external URL (whichever SFS blacklist we want to use) and dump the result into WANObjectCache. I'd imagine this would also imply some work on the better IPv6 support mentioned in T212528, particularly around serialization, though that may not be needed, as ext:TorBlock seems to just dump IPs into the cache. Since this approach is heavily tied to WMF production and may not suit all users of ext:StopForumSpam, it might make sense to make it a separate mode of operation from the existing code, possibly controlled by a config variable.
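The proposed split can be sketched roughly in Python, as a read-through cache. A fetch step goes out through the proxy, a load step reads from the cache, and a config variable chooses that mode over the original local-file behaviour. On a miss, whichever server misses refetches from the URL, so it no longer depends on a file that exists on only one host. `SfsConfig`, its mode values, `load_ips` and `get_with_set_callback` are all hypothetical names. The last borrows only the idea behind WANObjectCache::getWithSetCallback() and is not that API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

Cache = Dict[str, Tuple[float, FrozenSet[str]]]  # key -> (expiry timestamp, value)


@dataclass(frozen=True)
class SfsConfig:
    """Hypothetical settings; the real extension's config names may differ."""
    mode: str  # "url" (fetch via proxy + cache) or "file" (local file, original behaviour)
    location: str
    ttl: float = 86400.0


def get_with_set_callback(cache: Cache, key: str, ttl: float,
                          callback: Callable[[], FrozenSet[str]],
                          now: Callable[[], float] = time.time) -> FrozenSet[str]:
    """Return the cached value, regenerating it via `callback` on a miss or expiry."""
    entry = cache.get(key)
    if entry is not None and entry[0] > now():
        return entry[1]
    value = callback()
    cache[key] = (now() + ttl, value)
    return value


def load_ips(config: SfsConfig, cache: Cache,
             fetch_url: Callable[[str], FrozenSet[str]],
             read_file: Callable[[str], FrozenSet[str]]) -> FrozenSet[str]:
    """Pick a loading strategy based on the (hypothetical) mode setting."""
    if config.mode == "url":
        return get_with_set_callback(cache, "sfs-ip-list", config.ttl,
                                     lambda: fetch_url(config.location))
    if config.mode == "file":
        return read_file(config.location)
    raise ValueError(f"unknown mode {config.mode!r}")
```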


https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/StopForumSpam/+/376876/ was moving to using a file on disk to load from. I abandoned it in favour of this (the rebase is a bit of a pig), but it might still have some scope.

Change 630298 had a related patch set uploaded (by SBassett; owner: SBassett):
[mediawiki/extensions/StopForumSpam@master] StopForumSpam extension improvements

https://gerrit.wikimedia.org/r/630298

Change 630298 merged by jenkins-bot:
[mediawiki/extensions/StopForumSpam@master] StopForumSpam extension improvements

https://gerrit.wikimedia.org/r/630298

Change 655499 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/extensions/StopForumSpam@master] Add outbound proxy support for requests

https://gerrit.wikimedia.org/r/655499

Change 655499 merged by jenkins-bot:
[mediawiki/extensions/StopForumSpam@master] Add outbound proxy support for requests

https://gerrit.wikimedia.org/r/655499