
Do not apply spam headers on email assessed NOT to be spam
Open, Normal, Public

Description

Polonium currently adds SpamAssassin headers to any message with a score >1, but only identifies messages as spam with a score >4 (or something close to this).

Google Support curiously suggested that our headers saying that a mail was NOT spam might be influencing their spam detection systems.

To quote them:

"""
After checking the headers you have sent me we could determined that the Spam detection software, running on the system "polonium.wikimedia.org", is trusting this emails and therefore Google is trusting them as well, [....] you could disable the spam filtering from your server because some how they are approving those messages and interfering with Google spam filters as shown on this headers: X-Spam-Report: Spam detection software, running on the system
"polonium.wikimedia.org",
has NOT identified this incoming email as spam. The original
message has been attached to this so you can view it or label
similar future email. If you have any questions, see
the administrator of that system for details.
"""

Would it be possible to tweak these settings to NOT apply the header unless it's believed to be spam?

  # Add spam headers if score >= 1
  warn spam = nonexistent:true
       condition = ${if >{$spam_score_int}{10}{1}{0}}
       set acl_m0 = $spam_score ($spam_bar)
       set acl_m1 = $spam_report
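
To make the request concrete, here is a rough Python sketch of what the condition above does and what the proposed change would do. Exim's `$spam_score_int` is the spam score multiplied by ten, so the `{10}` in the condition corresponds to a score of 1.0. The 4.0 threshold and the function names are assumptions for illustration only; the real spam cutoff on polonium may differ.

```python
# Sketch only: mirrors the ACL condition above in plain Python.
# SPAM_THRESHOLD is an assumption; the description only says "something close to" 4.
SPAM_THRESHOLD = 4.0


def spam_score_int(score: float) -> int:
    """Exim exposes $spam_score_int as the spam score multiplied by ten."""
    return int(round(score * 10))


def adds_headers_today(score: float) -> bool:
    # Current condition: ${if >{$spam_score_int}{10}{1}{0}}
    return spam_score_int(score) > 10


def adds_headers_proposed(score: float, threshold: float = SPAM_THRESHOLD) -> bool:
    # Proposed behaviour: only annotate mail that is also classified as spam.
    return spam_score_int(score) > spam_score_int(threshold)
```

With these assumptions, a borderline message scoring 2.5 gets headers today but would not under the proposal.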

See also: https://phabricator.wikimedia.org/T110761

Cheers,

Joel

Event Timeline

JKrauska raised the priority of this task to Needs Triage.
JKrauska updated the task description.
JKrauska added a project: acl*sre-team.
JKrauska added a subscriber: JKrauska.
Restricted Application added subscribers: Matanya, Aklapper. Sep 5 2015, 2:20 AM
jcrespo triaged this task as Normal priority. Sep 7 2015, 6:39 PM
jcrespo merged a task: Restricted Task.
jcrespo added subscribers: jcrespo, mark, faidon, Krenair.

From: T110761:

Thanks for the clarification.
This matters because there are a few cases where Google's spam filtering is not working very well (mostly on 'leaked' email addresses that have been used on webpages or in external communication).
Google DOES allow a regex extract on headers identifying a score to mark an email as spam.
I am proposing we tune/improve spamassassin scoring (which is already being done) to help us identify spam at ingress.
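
As an illustration of the kind of header match this refers to, the sketch below pulls the numeric score out of a value shaped like the `$spam_score ($spam_bar)` string set in the ACL, for example `2.5 (++)`. Which header actually carries that value is not stated in this task, and the 4.0 cutoff is an assumption; Google's own rule syntax is not shown here.

```python
import re
from typing import Optional

# Matches values shaped like "$spam_score ($spam_bar)", e.g. "2.5 (++)".
SCORE_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\(")


def parse_score(header_value: str) -> Optional[float]:
    """Return the numeric score from the header value, or None if it does not match."""
    match = SCORE_RE.match(header_value)
    return float(match.group(1)) if match else None


def looks_like_spam(header_value: str, threshold: float = 4.0) -> bool:
    # threshold is a hypothetical cutoff, not the value configured on polonium.
    score = parse_score(header_value)
    return score is not None and score > threshold
```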

Merging both requests into 1.

Would it be possible to tweak these settings to NOT apply the header unless it's believed to be spam?

This actually hurts debugging, though, which is the rationale for having this in the first place. There are borderline cases (e.g. having a score of 2.5-3.5) that we would like to be able to investigate (either to figure out why it's not lower if it's ham, or why it's not higher if it's spam).

Could you explain why that header matters for Google Apps?

Google has been unhelpful in explaining /WHY/ it is a problem, but it comes up each time I open a ticket about this.

'Your very own spam analysis is saying this email isn't spam. That's why we let it through.'

The flip side of this coin is to ask you guys to do some more aggressive filtering.

I'm getting around this now by blacklisting SourceIPs from known bad senders.

Just having SpamAssassin do Spamhaus DBL checking would help tremendously.
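
For reference, a DBL check is a DNS lookup of the domain under the `dbl.spamhaus.org` zone. The sketch below shows the idea in Python rather than as SpamAssassin configuration. It assumes that listings come back as 127.0.1.x addresses and that 127.0.1.255 is an error answer for IP-address queries; treat those codes and the helper names as assumptions to verify against Spamhaus's documentation.

```python
import socket

DBL_ZONE = "dbl.spamhaus.org"


def dbl_query_name(domain: str) -> str:
    """Build the DNS name to look up for a given domain."""
    return f"{domain.strip().rstrip('.').lower()}.{DBL_ZONE}"


def answer_means_listed(answer: str) -> bool:
    # Assumption: listings are returned in 127.0.1.x, and 127.0.1.255
    # signals an invalid (IP address) query rather than a listing.
    return answer.startswith("127.0.1.") and answer != "127.0.1.255"


def is_listed(domain: str) -> bool:
    """Query the DBL over DNS; needs network access and a permitted resolver."""
    try:
        answer = socket.gethostbyname(dbl_query_name(domain))
    except socket.gaierror:
        return False  # NXDOMAIN or lookup failure: treated as not listed
    return answer_means_listed(answer)
```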

Would you be open to doing some more SA processing on mx1001 instead?

Yet another (fragile) solution would be to have Ops pump email to yet another server that IT maintains so that I can do more detailed SA on our own.

--Joel

Nemo_bis added a subscriber: Nemo_bis.

Maybe changing the contents for X-Spam-Report would be enough to work around this Google problem.

Replace perhaps with:

X-Antispam-Host: polonium.wikimedia.org, see T111595
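
A minimal sketch of that replacement, using Python's standard `email` module on a single message purely to show the intended before/after. On polonium this would presumably be done in the Exim configuration instead, and the `relabel` helper is hypothetical.

```python
from email import message_from_string


def relabel(raw: str, host: str = "polonium.wikimedia.org, see T111595") -> str:
    """Drop X-Spam-Report and add a neutral X-Antispam-Host header instead."""
    msg = message_from_string(raw)
    del msg["X-Spam-Report"]  # no error is raised if the header is absent
    msg["X-Antispam-Host"] = host
    return msg.as_string()
```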