Page MenuHomePhabricator

Security review for StopForumSpam
Closed, ResolvedPublic

Description

Project Information

Description of the tool/project

The StopForumSpam extension allows wikis to utilize and contribute data to stopforumspam.com. It features automatic IP blocking, confidence variables, and an easy interface for administrators to submit data.

Description of how the tool will be used at WMF

According to T125338, we'd like only to use data from SFS to block spambot activities without sending data back to SFS as that'd violate the privacy policy. That is, we'd like this some sort of external IP/IP range blacklist for nasty spammers. The right stopforumspam must be explicitly not assigned on WMF sites.

Dependencies

Don't know.

Has this project been reviewed before?

I don't think so. We did an informal review at T125338 and we were suggested to ask for a formal review here.

Working test environment

Don't know.

Post-deployment

Don't know.

Thank you.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 19 2017, 12:05 PM
MarcoAurelio added a subscriber: Skizzerz.

Hi @Bawolff - can this task be put on your team's radar? Thanks!

Hi @Bawolff - can this task be put on your team's radar? Thanks!

Hi - @dpatrick is the one who officially decides the review schedule. I'll mention to him at our next meeting that this needs scheduling.

Thanks a bunch Sir. Regards :-)

Reedy updated the task description. (Show Details)Feb 10 2017, 1:31 AM

Review of 094b110932df of StopForumSpam.

Overall, this is fairly good security wise. The only security issue I see is that api requests should use POST and HTTPS. In the context of deploying to Wikimedia, unless you get permission from legal, the following configuration parameters (or something like this) must be set:

$wgSFSEnableConfidenceVariable = false;
$wgExtensionFunctions[] = function() {
	global $wgAvailableRights, $wgGroupPermissions;
	unset( $wgGroupPermissions['sysop']['stopforumspam'] );
	$wgAvailableRights = array_filter( $wgAvailableRights, function ( $right ) { $right !== 'stopforumspam'; } );
};

in order to disable features that could threaten the privacy of our users.

From a non-security prespective, I find the scheme for storing the blacklist in memcached to be a bit convoluted. I think it would be much simpler and probably more efficient to convert the blacklist to a php array saved to a file and import it via require.

As a future feature enhancement, it might be cool to make flagged ips that aren't flagged enough to be blacklisted, instead trigger captchas much more aggressively.

Security (Normal)

  • "StopForumSpam.body.php" line 80, "StopForumSpam.body.php" line 172 - This needs to be POST and https.

Security (Minor)

  • "StopForumSpam.body.php" line 217 - I would prefer we stop using md5 even in "safe" cases and instead use truncated sha256. That way one doesn't have to spend time checking that md5 is safe in the specific context.

Other non-security

  • Maybe onSpecialPageBeforeFormDisplay shoudl be merged into onSpecialBlockModifyFormFields?
  • Not sure what the output buffering is about in BlacklistUpdate.php (Why would there be any output? And if there was, why would it matter more than in any other place in MW where no output is supposed to be produced)
  • "StopForumSpam.body.php" line 126 - Should probably use rc_user_text as the where condition, not rc_user as there is no index on rc_user.
  • "StopForumSpam.body.php" line 108 - I think it might be more logical to default to checkuser, if both checkuser and rc_ip are available. Or maybe it should do both and combine results?
  • "StopForumSpam.body.php" line 208 - should that be $params['email'] = $user->getEmail(); instead of $params['username'] = $user->getEmail();
  • "StopForumSpam.body.php" line 228 - Daily limit should probably be a class constant. Also, shouldn't that be 100,000 not 20,000?
  • "BlacklistUpdate.php" line 62 - $ip === array( null ) doesn't seem like the right test. Under what circumstances would that happen?
  • "StopForumSpam.body.php" line 49 - Why don't we want to submit anons?
  • The bit twiddling for the bucket&offset code could use some explanatory comments
  • "StopForumSpam.body.php" line 186 - is averaging really the best here? Naively I would assume that taking the max of the set would be better.
  • "StopForumSpam.body.php" line 305 - This function will not work with IPv6.
  • Personally, the code to insert all of this in memcached, and also make it work via job queue seems a bit convoluted. Since for this to work anyways you have to fetch the blacklist file via a cron script, I wonder if it'd be simpler to as part of the fetch process just convert it to a php array, and load that (If the list really is too huge and there are concerns about loading the entire thing at once, the file could be converted to a cdb file instead). This seems like the type of thing that is similar to say interwikis in terms of what type of caching is appropriate.
  • I'm surprised that updateBlacklist.php doesn't fetch the blacklist file.
  • I'm not sure what SFS's confidence metric involves, but it'd be nice if that could be done via the downloaded data check, instead of via an api request. Google suggests its being calculated based on the last seen date and the number of hits using Wilson's score interval (Whatever that is) so in theory it should be possible [Although there is still privacy implications even if the check is done offline].

Assigning to @Legoktm so he can address the issues following the security review. Thank you so much for the review!

I was thinking that, maybe, we could do a deploy of this extension to the Beta Cluster, where we're having a continuous spamming problem, once @Bawolff issues in the security review are addressed, so we can test it there and prepare for deployment on production wikis? According to https://www.mediawiki.org/wiki/User:Legoktm/StopForumSpam_effectiveness, more than 20% of IPs blacklisted on SFS are also blocked on Wikimedia wikis so it seems like a good resource in anti spamming activities.

updateBlacklist.php should maybe added to an automatic cron job each 12/24 hours so our data is always updated.

I was thinking that, maybe, we could do a deploy of this extension to the Beta Cluster, where we're having a continuous spamming problem, once @Bawolff issues in the security review are addressed, so we can test it there and prepare for deployment on production wikis? According to https://www.mediawiki.org/wiki/User:Legoktm/StopForumSpam_effectiveness, more than 20% of IPs blacklisted on SFS are also blocked on Wikimedia wikis so it seems like a good resource in anti spamming activities.

The issues mentioned here are relatively minor. Well they should be fixed before deploying to production, i do not feel they neccesarily block deploying to beta (provided of course that the privacy settings i mentioned are used)

@Bawolff The extension has changed a bit since the security review. Some files you listed do not exist anymore. I think I should request another security review, thoughts?

@Bawolff The extension has changed a bit since the security review. Some files you listed do not exist anymore. I think I should request another security review, thoughts?

I don't think its a requirement before getting it on beta. I think it would be good to do another one before deploying to production (and after the issues mentioned in my previous review are fixed)

MarcoAurelio triaged this task as Lowest priority.Aug 20 2018, 7:43 PM

Per the lack of activity.

MarcoAurelio closed this task as Resolved.Sep 4 2018, 2:39 PM

Closing as resolved as the security review did happened. However, the issues mentioned were not addressed (or at least were not notified here). The extension has also been modified since this review happened and some files mentioned here no longer exists. It is my opinion that before deploying to the Beta cluster --and certainly to production-- another security review should be scheduled. Thank you.

Uhh, I had drafted a comment and never pressed "Submit". Better late than never!

Review of 094b110932df of StopForumSpam.
Overall, this is fairly good security wise. The only security issue I see is that api requests should use POST and HTTPS. In the context of deploying to Wikimedia, unless you get permission from legal, the following configuration parameters (or something like this) must be set:

$wgSFSEnableConfidenceVariable = false;
$wgExtensionFunctions[] = function() {
	global $wgAvailableRights, $wgGroupPermissions;
	unset( $wgGroupPermissions['sysop']['stopforumspam'] );
	$wgAvailableRights = array_filter( $wgAvailableRights, function ( $right ) { $right !== 'stopforumspam'; } );
};

in order to disable features that could threaten the privacy of our users.

I revamped the extension and removed all of the data submission features and API requests.

From a non-security prespective, I find the scheme for storing the blacklist in memcached to be a bit convoluted. I think it would be much simpler and probably more efficient to convert the blacklist to a php array saved to a file and import it via require.

The original goal of this extension was to work for every MediaWiki instance, regardless of access to a job queue, or low memory limits. We might want to rework this to make it more straightforward for Wikimedia usage.

As a future feature enhancement, it might be cool to make flagged ips that aren't flagged enough to be blacklisted, instead trigger captchas much more aggressively.

Security (Normal)

  • "StopForumSpam.body.php" line 80, "StopForumSpam.body.php" line 172 - This needs to be POST and https.

Code removed.

Security (Minor)

  • "StopForumSpam.body.php" line 217 - I would prefer we stop using md5 even in "safe" cases and instead use truncated sha256. That way one doesn't have to spend time checking that md5 is safe in the specific context.

Code removed.

Other non-security

  • Maybe onSpecialPageBeforeFormDisplay shoudl be merged into onSpecialBlockModifyFormFields?
  • "StopForumSpam.body.php" line 126 - Should probably use rc_user_text as the where condition, not rc_user as there is no index on rc_user.
  • "StopForumSpam.body.php" line 108 - I think it might be more logical to default to checkuser, if both checkuser and rc_ip are available. Or maybe it should do both and combine results?
  • "StopForumSpam.body.php" line 208 - should that be $params['email'] = $user->getEmail(); instead of $params['username'] = $user->getEmail();
  • "StopForumSpam.body.php" line 228 - Daily limit should probably be a class constant. Also, shouldn't that be 100,000 not 20,000?
  • "StopForumSpam.body.php" line 49 - Why don't we want to submit anons?
  • "StopForumSpam.body.php" line 186 - is averaging really the best here? Naively I would assume that taking the max of the set would be better.

Code removed.

  • Not sure what the output buffering is about in BlacklistUpdate.php (Why would there be any output? And if there was, why would it matter more than in any other place in MW where no output is supposed to be produced)

I...don't remember what that was for.

  • "BlacklistUpdate.php" line 62 - $ip === array( null ) doesn't seem like the right test. Under what circumstances would that happen?

Lines where the CSV couldn't be parsed I think.

  • The bit twiddling for the bucket&offset code could use some explanatory comments

Alright. But I think we're probably going to rewrite or remove that in favor of IPSet.

  • "StopForumSpam.body.php" line 305 - This function will not work with IPv6.

Yep, this is known: T173399: Support StopForumSpam's IPv6 blocklists. SFS IPv6 lists are pretty small though, the 7 day one only has 68 IPs in it.

  • Personally, the code to insert all of this in memcached, and also make it work via job queue seems a bit convoluted. Since for this to work anyways you have to fetch the blacklist file via a cron script, I wonder if it'd be simpler to as part of the fetch process just convert it to a php array, and load that (If the list really is too huge and there are concerns about loading the entire thing at once, the file could be converted to a cdb file instead). This seems like the type of thing that is similar to say interwikis in terms of what type of caching is appropriate.

The list is 17k IPs. Using an array and in_array checks would be too slow. CDB or maybe serialized IPSet might work.

  • I'm surprised that updateBlacklist.php doesn't fetch the blacklist file.

I tried that in https://gerrit.wikimedia.org/r/#/c/143572/ but never followed-up on it.

  • I'm not sure what SFS's confidence metric involves, but it'd be nice if that could be done via the downloaded data check, instead of via an api request. Google suggests its being calculated based on the last seen date and the number of hits using Wilson's score interval (Whatever that is) so in theory it should be possible [Although there is still privacy implications even if the check is done offline].

I got rid of that entirely, and replaced it with an AbuseFilter variable that is simply whether the IP is blacklisted or not.