Ban spam arriving to my tools email
Closed, Resolved · Public

Description

For the past month or so I have been receiving a moderate number of spam emails every day at my maurelio[at]tools.wmflabs.org address, which I use for Gerrit development. Most of them use Korean characters.

  • Is there any way that I can set some config file in my tools root directory to avoid them?
  • Is it possible to ban some addresses from the server directly?
  • Maybe implement some X-Spam-Score server-side and /dev/null those?

This also makes me wonder if gerrit.wikimedia, the only place where this address is publicly exposed, should have better robots.txt protection to prevent crawlers from grabbing all those email addresses and spamming developers. Maybe email addresses should not be fully displayed, or not be displayed at all (but that's for another task I guess).

Thank you.

Details of a sample spam email can be found at P7477 & P7478, which I'll selectively make visible to people able to fix this, as I'm not sure how much private information, if any, they contain.

Event Timeline

aborrero subscribed.

I'm not sure, I will ask my team and get back to you ASAP.

bd808 lowered the priority of this task from High to Medium. (Aug 28 2018, 7:32 PM)
bd808 subscribed.

We do not currently have SpamAssassin or a similar spam scoring tool available to process emails sent to *@tools.wmflabs.org addresses prior to .forward bounce forwarding. It would be possible to set up such a service, but it would require long-term maintenance that currently seems costly. I would personally suggest routing *@tools.wmflabs.org through gmail or another mail provider with good spam filtering functionality as a workaround for the lack of pre-processing on the Cloud Services side.

Thank you and yes, I already do that. But having to clear my spam folder twice a day is getting annoying, and overall it looks like spammers are abusing WMCS addresses to spam our fellow contributors, which is not okay. Could we perhaps reuse some of what is used in Puppet to filter spam on the Wikimedia mailing lists?

This also makes me wonder if gerrit.wikimedia, the only place where this address is publicly exposed, should have better robots.txt protection to prevent crawlers from grabbing all those email addresses and spamming developers. Maybe email addresses should not be fully displayed, or not be displayed at all (but that's for another task I guess).

That wouldn't provide much protection since your email is in the git commit history, which spammers are definitely scraping.

Bstorm subscribed.

Cleared my spam and am looking around in case there are any patterns or anything that can be done simply enough. By the look of things, not much is actually denied in our setup.

Related: T170601

This is also coming up in sysadmin forums all over the place. If these are Chinese, like the ones I am getting (from qq.com), and not actually Korean, we could do something similar to the solution above and block qq.com. Tencent is kind of a big deal in China when it comes to internet services, but I'm not sure it is typical for individuals to reach out with qq.com email addresses. Some have reported that the spam continues after you block qq from Yahoo and friends, but that doesn't mean we won't benefit from such a block.

We may not have spamassassin, but we do have a means of blocking things. :)

On further investigation, there are more qq.com emails that would be valid than would not be. The spam seems to come from entirely numeric email addresses, so maybe that can be filtered? I'm concerned that hanzi-based emails might translate into numerics.

It's very unfortunate that it is extremely common for a qq.com email address to be entirely numeric (they correspond to Tencent QQ IDs).

Well, that idea won't work too well, then. I'm not seeing a good way to differentiate valid and invalid emails without some kind of spam scoring framework or removing some services around the email forwarding.

Jumping in late with some ideas. For the specific batch of spam I received today, every message failed the SPF test for qq.com. Enabling SPF checks might be a cheap solution. In the qq.com case, they would increase the load on the DNS server by 7 requests per email. It might not make much of a difference, but it is something to keep in mind.
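To make the DNS cost a bit more concrete: RFC 7208 lists the SPF terms that trigger a DNS query (include, a, mx, ptr, exists, and the redirect modifier) and caps them at 10 per evaluation. Below is a minimal Python sketch that counts such terms in a single record. The record is made up for illustration and is not qq.com's actual policy, and nested includes are not followed, so a real evaluation can cost more lookups than this count suggests.

```python
# Terms that trigger a DNS query during SPF evaluation (RFC 7208, section 4.6.4).
DNS_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}


def count_dns_terms(spf_record):
    """Count DNS-querying terms in one SPF record (nested includes not followed)."""
    count = 0
    for term in spf_record.split()[1:]:  # skip the "v=spf1" version tag
        term = term.lstrip("+-~?")       # drop the optional qualifier
        if term.startswith("redirect="):
            count += 1                   # the redirect modifier also costs a lookup
            continue
        # Strip any ":domain" argument and "/cidr" suffix to get the mechanism name.
        name = term.split(":", 1)[0].split("/", 1)[0]
        if name in DNS_MECHANISMS:
            count += 1
    return count


# Hypothetical record, for illustration only.
EXAMPLE_RECORD = (
    "v=spf1 include:spf-a.example.net include:spf-b.example.net "
    "mx ip4:192.0.2.0/24 -all"
)
print(count_dns_terms(EXAMPLE_RECORD))
```

Here the two include terms and the mx term each need a lookup, while ip4 and all do not.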

I also took a sample of originating IP addresses (all from ChinaNet) and checked them against the Spamhaus RBL. All of them matched the PBL and XBL lists, so that might be another check we could add that doesn't require SpamAssassin. However, we may have to keep whitelists and point people to the Spamhaus FAQ on how to unblock themselves, in case of false positives.
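For anyone who wants to repeat that kind of check by hand, here is a rough Python sketch of how a DNSBL lookup name is built and how the answer can be read. It assumes the combined zen.spamhaus.org zone and the 127.0.0.x return codes as I understand them from the Spamhaus documentation (2, 3, 9 for SBL; 4 to 7 for XBL; 10, 11 for PBL). The IP address is a documentation address, not one of the sampled senders, and no DNS query is actually made.

```python
import ipaddress


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNS name to look up: IPv4 octets reversed, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def classify_zen_code(answer):
    """Map a 127.0.0.x answer from the ZEN zone to the sub-list it stands for."""
    if not answer.startswith("127.0.0."):
        return "unknown"  # e.g. error codes outside the listing range
    last = int(answer.rsplit(".", 1)[1])
    if last in (2, 3, 9):
        return "SBL"
    if 4 <= last <= 7:
        return "XBL"
    if last in (10, 11):
        return "PBL"
    return "unknown"


print(dnsbl_query_name("192.0.2.99"))
print(classify_zen_code("127.0.0.4"))
```

Resolving the generated name (for example with socket.gethostbyname) returns one or more 127.0.0.x addresses if the IP is listed, and fails with a "not found" error if it is not.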

FWIW the spam I'm receiving comes from @nate.com mainly. Thanks.

The spam I'm receiving originates externally, gets accepted by tools-mail.tools.eqiad.wmflabs (as it's the MX for tools.wmflabs.org), is forwarded to mx1001.wikimedia.org, and gets delivered to my inbox. Apparently, mx*.wikimedia.org put a lot of trust in email originating from Labs networks:

modules/role/templates/exim/exim4.conf.mx.erb:

# this explicitly allows all Wikimedia networks, including Labs, as we are the relays for them as well
hostlist relay_from_hosts = <; @[] ; 127.0.0.1 ; ::1 ; <%= scope.lookupvar('network::constants::all_networks').join(" ; ") %>

This seems to short-circuit a lot of checks at the edge mail relays, making tools-mail.tools.eqiad.wmflabs a safe entry point for spam: Labs accepts it and the production MXs trust Labs.

My understanding so far is that the production network shouldn't fully trust Labs services like that. So I'd like to propose we change "all_networks" to "production_networks" and let the Wikimedia MXs properly treat Labs email as external. I'm not ignoring the possible breakage from enforcing spam filters on apps/hosts that were getting safe passage before (e.g., spam-looking emails from tools could get blocked, or mail generation/delivery rates could exceed thresholds), but it seems like the benefits outweigh the costs. This should probably give us more visibility into the kind of checks we should have in the Labs MX.
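As a rough illustration of what the proposed change means for the relay decision, here is a small Python sketch of a "is this client in a trusted network?" check. The address ranges and the host address are hypothetical placeholders; the real lists come from network::constants in Puppet and are not reproduced here.

```python
import ipaddress

# Hypothetical ranges, for illustration only (not the real network constants).
PRODUCTION_NETWORKS = ["10.1.0.0/16"]
LABS_NETWORKS = ["10.2.0.0/16"]


def is_trusted_relay(client_ip, trusted_networks):
    """Return True if client_ip falls inside any of the trusted networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in trusted_networks)


labs_mx = "10.2.16.27"  # made-up address standing in for the tools mail host

# Today: all networks are trusted, so mail relayed by Labs skips the edge checks.
trusted_today = is_trusted_relay(labs_mx, PRODUCTION_NETWORKS + LABS_NETWORKS)

# Proposed: only production is trusted, so Labs mail is treated as external.
trusted_after = is_trusted_relay(labs_mx, PRODUCTION_NETWORKS)

print(trusted_today, trusted_after)
```

With only the production ranges in the trusted list, mail arriving from the Labs host would go through the same checks as any other external sender.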

I don't have a full understanding of all the implications. Would this be too radical? I understand it doesn't solve all the cases where spam is getting delivered to/from tools-mail but it seems like a start.

Change 462472 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] tools mail: add spamhaus rbl check

https://gerrit.wikimedia.org/r/462472

Adding an RBL check to the tools MX should be an effective near-term fix. We already perform these checks on the production MX, but they are ineffective in this case because the messages have already been accepted and relayed by the tools mail host, which is not listed in the RBL.

Above is a patch to enable a Spamhaus RBL check in warning mode. This would log RBL hits, but not drop mail. A follow-up change would be needed to change the action from warn to drop.

I don't have a full understanding of all the implications. Would this be too radical? I understand it doesn't solve all the cases where spam is getting delivered to/from tools-mail but it seems like a start.

This will likely be made obsolete soon by T41785, but I would be careful to ensure it only treats Labs mail as untrusted, rather than blocking it altogether.

Thanks @herron and @GTirloni! The spamhaus list certainly can't hurt. I don't think we get so much email as to run afoul of their policies?

Here are guidelines and volume thresholds from https://www.spamhaus.org/organization/dnsblusage/

Use of the Spamhaus DNSBLs via DNS queries to our public DNSBL servers is free of charge if you meet all three of the following criteria:

  1. Your use of the Spamhaus DNSBLs is non-commercial*, and
  2. Your email traffic is less than 100,000 SMTP connections per day, and
  3. Your DNSBL query volume is less than 300,000 queries per day.
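The three criteria above can be written as a simple check. This is only a sketch in Python: the limits are the ones quoted from the Spamhaus page, while the daily volumes passed in at the end are made-up numbers, not measurements from the tools mail host.

```python
# Limits quoted from the Spamhaus DNSBL usage terms above.
MAX_SMTP_CONNECTIONS_PER_DAY = 100_000
MAX_DNSBL_QUERIES_PER_DAY = 300_000


def qualifies_for_free_use(non_commercial, smtp_connections_per_day, dnsbl_queries_per_day):
    """Return True only if all three free-use criteria are met."""
    return (
        non_commercial
        and smtp_connections_per_day < MAX_SMTP_CONNECTIONS_PER_DAY
        and dnsbl_queries_per_day < MAX_DNSBL_QUERIES_PER_DAY
    )


# Hypothetical daily volumes, for illustration only.
print(qualifies_for_free_use(True, 5_000, 5_000))
```

Note that both limits are strict: a host at exactly 100,000 connections per day would no longer qualify.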

Change 462472 merged by Bstorm:
[operations/puppet@production] tools mail: add spamhaus rbl check

https://gerrit.wikimedia.org/r/462472

Noticed on deploying this that the mail queue was clogged up with frozen messages rejected from qq.com servers. I cleaned them with:
exim -bpu | grep '*** frozen ***' | awk '{print $3}' | xargs -i exim -Mrm {}
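For readers unfamiliar with the queue listing, the pipeline above keeps the lines marked as frozen and takes the third column, which is the message ID. The same selection step can be sketched in Python. The queue listing below is made up, in the usual layout (age, size, message ID, sender), and nothing is removed from any queue here.

```python
def frozen_message_ids(queue_listing):
    """Return the IDs of frozen messages from `exim -bpu`-style output."""
    ids = []
    for line in queue_listing.splitlines():
        if "*** frozen ***" in line:
            fields = line.split()
            ids.append(fields[2])  # third column, same as awk '{print $3}'
    return ids


# Made-up queue listing, for illustration only.
SAMPLE = """\
 3h  2.1K 1g5xYz-0001Ab-2C <> *** frozen ***
          someone@example.org

12m  1.4K 1g5xZa-0001Bc-3D <sender@example.com>
          tool@example.org
"""

print(frozen_message_ids(SAMPLE))
```

Each returned ID is what the original one-liner hands to exim -Mrm.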

We are now generating additional frozen messages due to the reply this sends, lol. It's interesting.

It's not a huge load, though, to be clear.

We are now generating additional frozen messages due to the reply this sends, lol. It's interesting.

That's odd, the warn should only write to the exim mainlog and not generate any mail at all.

Change 463144 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] tools mail: write RBL check warning to file

https://gerrit.wikimedia.org/r/463144

I couldn't find any warning messages in the log files. It seems that, in a warn statement, message actually returns the text to the SMTP client, while log_message records it in syslog.

Change 463144 merged by Bstorm:
[operations/puppet@production] tools mail: write RBL check warning to file

https://gerrit.wikimedia.org/r/463144

Change 463611 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] tools-mail - Add strict rules against spam

https://gerrit.wikimedia.org/r/463611

Change 463611 merged by GTirloni:
[operations/puppet@production] tools-mail - Add strict rules against spam

https://gerrit.wikimedia.org/r/463611

The amount of spam seems to have dropped significantly in the last couple of days.

Emails blocked:

2457 Blocked by DNSBL 
  10 Unrouteable address  
   7 Relay not permitted 
   3 Restricted characters in address
   2 Sender verify failed

Sender domain blocked by DNSBL:

2270 qq.com
  35 nate.com
  26 gmail.com
  16 confrontcrow.us
  13 barzx.com
  13 163.com
   8 fenestrate.likestudymarketing.world
   7 comparecloud.us
   6 truncation.thantravelmarketing.world
   6 kebabant6ask.accountant
   5 congratulationscongratulations.download
   4 unconditional.afterknowmarketing.world
   4 donjon.likestudymarketing.world
   4 congratulationscongratulations.stream
   4 126.com
   3 domain.com
   2 teniafuge.08laughja.us
   2 millerite.ja18javascriptjuly.bid
   2 methoxide.08thingj.info
   2 engines.best10signs.icu
   1 zhangweb.cn
   1 yahoo.com.cn
   1 woggle.09musicgreat.info
   1 vip.qq.com
   1 sina.com
   1 ripbestjune.bid
   1 printerbestjune.bid
   1 operatebestjune.bid
   1 mail.tantaell.us
   1 mail.gusteels.xyz
   1 lexievans.com
   1 lang.best10network.icu
   1 karmaradioaustin.com
   1 huiseo.cn
   1 hrs.com
   1 gutlessbarn.info
   1 dialbar.info
   1 cornerstone-fs.com
   1 congratulationsupdate.download
   1 cantabile.09ablegreat.us
   1 best10source.us
   1 best10sight.info
   1 best10pro.info
   1 best10ola.info
   1 21cn.com
   1 163.COM
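One detail in the list above: 163.com and 163.COM are counted separately (13 and 1), although domain names are case-insensitive, so together they account for 14 blocks. A small Python sketch of a tally that folds case before counting is shown below; the sender addresses are made up and this is not the script that produced the numbers above.

```python
from collections import Counter


def tally_sender_domains(senders):
    """Count sender domains case-insensitively, most frequent first."""
    domains = (addr.rsplit("@", 1)[1].lower() for addr in senders if "@" in addr)
    return Counter(domains).most_common()


# Made-up sender addresses, for illustration only.
SAMPLE_SENDERS = [
    "123456@qq.com",
    "a@163.com",
    "b@163.COM",
    "987654@qq.com",
    "c@qq.com",
]

for domain, count in tally_sender_domains(SAMPLE_SENDERS):
    print(f"{count:5d} {domain}")
```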

@MarcoAurelio Seem good enough? This certainly cleared up my spam folder a lot.

Hello @Bstorm. I continue to receive spam (5 emails today) from:

rbginfo@gubnews.ru
jungsooha@hotmail.com (To: terracodes@tools.wmflabs)
Gilbertmb@air.ocn.ne.jp (Reply-To: mbowenigilbert9@gmail.com)
pansyjli5bon@gmx.com (To: paul@bmg-led.tk)
uyungwoo84@nate.com
suyungj@outlook.kr (To: security@tools.wmflabs)

That said, I am currently away due to health issues so I'm not able to monitor this task closely for now.

That's better than my personal accounts get :) Let's close this for now and re-open if needed. I hope your health issues improve and all is well!