
Provision and test tools-mailrelay-02
Closed, Resolved · Public


  • tools-mailrelay-01 was eaten by the nova overlords
  • tools-mailrelay-02 will be a Precise host, to replace tools-mail which is suffering from bloat (T97437)
  • tools-mailrelay-03 will be a Trusty host so we can update the puppet manifests for trusty
  • After -03 is working correctly, we can switch over there
  • tools-mail should not be decommissioned until we clear the queue of stale emails
  • After -03 is working correctly, we can switch there, and keep -02 as hot-spare (T96967)
  • Afterwards, we can build a new Trusty hot spare -04

New host:

Provisioning checklist etherpad:

Event Timeline

valhallasw raised the priority of this task from to Medium.
valhallasw updated the task description.
valhallasw added a project: Toolforge.
valhallasw added subscribers: scfc, coren, valhallasw and 2 others.

Need to think of an actual test plan. This should include the following mail sources:

  • receiving mail from an external host -- SMTP in and check delivery
  • delivering local mail
  • delivering mail from other tools hosts -- not sure how to test? provision single instance with mailrelay-XX as mail relay?

and the following targets:

  • root
  • (should work/not work?)
  • others?

(If you find something usable for a "test suite", please report here. In a few weeks, I need to set up my own personal mail server, and I would really like to have some script(s) that I can run to be sure that the server is not a spam relay, that all mail addresses work that should be working, etc. Ideally, not something that tests exim4 on the command line, but that connects to the server via SMTP as a normal client/server would.)
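A minimal sketch of what such an SMTP-level check could look like, assuming Python's standard smtplib is acceptable. It connects like a normal client, issues MAIL FROM and RCPT TO, and stops before DATA, so nothing is actually delivered. The function names and the expectations table are hypothetical, not an existing test suite, and the addresses in the usage note are placeholders. For the open-relay case the probe has to run from a host outside the relay's trusted network, and servers that defer their checks until after DATA will not be caught by this.

```python
import smtplib


def rcpt_accepted(code):
    """True when an SMTP reply code signals acceptance (2xx)."""
    return 200 <= code < 300


def probe_recipient(host, sender, recipient, port=25, smtp_factory=smtplib.SMTP):
    """Ask the server whether it would accept mail for `recipient`.

    The session stops after RCPT TO, so no message is sent.
    `smtp_factory` is only a hook so the function can be exercised offline.
    """
    with smtp_factory(host, port, timeout=30) as smtp:
        smtp.ehlo_or_helo_if_needed()
        smtp.mail(sender)
        code, _ = smtp.rcpt(recipient)
        smtp.rset()
        return rcpt_accepted(code)


def check_expectations(host, sender, expectations, **kwargs):
    """Return (recipient, expected, actual) for every mismatch.

    `expectations` maps a recipient address to True (must be accepted)
    or False (must be refused, e.g. a relay attempt).
    """
    failures = []
    for recipient, expected in expectations.items():
        actual = probe_recipient(host, sender, recipient, **kwargs)
        if actual != expected:
            failures.append((recipient, expected, actual))
    return failures
```

Usage would be something like check_expectations("relay.example.org", "tester@example.net", {"root@example.org": True, "someone@elsewhere.example": False}), where an empty result means every address behaved as expected.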

tools-mail and tools-mailrelay-01 are now fighting over who's the current relay (/data/project/.system/store/mail-relay). I disabled puppet on tools-mailrelay-01 for now. We should probably merge before attempting to switch over.

This should not be an issue for testing though, as exim has been deployed, so that machinery should be operational.

Actually, I think that fighting was a good server test in itself ;-) A few excerpts from the log file:

2015-04-29 19:33:08 1YnXjI-0000C1-5i "root@tools-mailrelay-01.eqiad.wmflabs" from env-from rewritten as "" by rule 1

that doesn't seem right, but could just be an issue with the initial configuration (exim being started before the new config file is in place?).

2015-04-29 20:20:15 1YnYSt-0000ja-9B <= H=tools-submit.eqiad.wmflabs [] U=Debian-exim P=esmtp S=1116 id=E1YnYSq-0008LR-AY@tools-submit.eqiad.wmflabs
2015-04-29 20:20:15 1YnYSt-0000ja-9B [2607:f8b0:400d:c08::1a] Network is unreachable
2015-04-29 20:20:15 1YnYSt-0000ja-9B => R=dnslookup T=remote_smtp [] X=TLS1.0:RSA_ARCFOUR_SHA1:16
2015-04-29 20:20:15 1YnYSt-0000ja-9B Completed

Delivery is working, but it's trying IPv6 first, which fails with "Network is unreachable" before the delivery succeeds. That's not really an issue, but it could be cleaner.

valhallasw changed Security from None to Software security bug. May 11 2015, 6:44 PM
Restricted Application changed the visibility from "Public (No Login Required)" to "Custom Policy". May 11 2015, 6:44 PM
Restricted Application changed the edit policy from "All Users" to "Custom Policy".
Restricted Application added a project: acl*security.

Executed the following test plan:

receiving mail from an external host -- SMTP in and check delivery

  • delivering local mail to valhallasw
  • delivering local mail to root
  • delivering local mail to tools.admin
  • delivering local mail to
  • delivering remote mail to failed
  • NOT delivering remote mail to failed (tested by smtp'ing from bastion, though, so maybe that counts as 'internal'? tools-mail does the same)

Yes, the bastions are "inside" and thus allowed to relay.

valhallasw changed Security from Software security bug to None.

Assigned /

  • NOT delivering remote mail to
  • delivering remote mail to still failing

Time to dive into exim logs...

Legoktm changed the visibility from "Custom Policy" to "Public (No Login Required)". May 11 2015, 6:59 PM
Legoktm changed the edit policy from "Custom Policy" to "All Users".

  • delivering remote mail to still failing
2015-05-11 18:56:38 1YrssK-00070S-GK ** <>
R=dnslookup T=remote_smtp: SMTP error from remote mail server after end of data:
550-5.7.1 [      11] Our system has detected that this message is
550-5.7.1 not RFC 2822 compliant. To reduce the amount of spam sent to Gmail,
550-5.7.1 this message has been blocked. Please review
550 5.7.1 RFC 2822 specifications for more information. w140si7312939qha.15 - gsmtp

So exim did its best, but my telnetting skills were no match for Gmail. So:

  • delivering remote mail to
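Gmail's reply does not say which part of RFC 2822 was violated. A hand-typed telnet session often leaves out the Date or From header fields, both of which the RFC requires, so that is a plausible but unconfirmed cause. A minimal sketch of building the test message with Python's standard email package instead of typing it by hand follows. The function names, default subject, and body are made up for illustration.

```python
import smtplib
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_test_message(sender, recipient, subject="mail relay test",
                       body="This is a test message.\n"):
    """Build a message with the header fields RFC 2822 requires (Date, From),
    plus To, Subject and Message-ID, which receivers commonly expect."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["Date"] = formatdate(localtime=True)
    # Use the sender's domain for the Message-ID; an illustrative choice.
    msg["Message-ID"] = make_msgid(domain=sender.rpartition("@")[2])
    msg.set_content(body)
    return msg


def send_via(host, msg, port=25):
    """Hand the message to the relay under test over SMTP."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.send_message(msg)
```

The message can be inspected with print(build_test_message(...)) before it is sent, which makes it easy to compare against whatever was typed into telnet.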

I really should write this into a test script, as it should be relatively easy to automate. The next steps are:

  • Add MX record for
  • Stop puppet on tools-mail
  • Start puppet on tools-mailrelay-01

... wait for a week or so

  • Check exim queue on tools-mail, and empty/resolve manually
  • Shut down exim on tools-mail
  • Remove MX record *afterwards* (so that we first stop sending mail, and only then we remove the MX record, which is coupled to the SPF record)

NB: As shown by your test mail above, changing IPs will make other hosts reject mails as spam! (Exclamation marks used intentionally :-).) The current mail relay puppetry will announce itself to the world as, and there is an RDNS record that points the IP address back to that.

So only changing the MX record is not enough! In addition, DNS records are cached, so the switching must work under the assumption that all changes happen at random times!

The "easy" solution is to reassign the public IP associated with The more laid back approach is:

  • Add the individual hostnames of the mail relays to DNS including RDNS (something like
  • Make mail relays (apart from the current one) announce themselves with their individual hostnames.
  • Add those as MX records.
  • Remove the current one from the MX record.

The mail relay, whatever its name, should most certainly not be lying about it. Both names will be MXes, and the SPF record is set to "mx -all", so that'll just work. If nobody lies. :-)
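The "nobody lies" condition can be checked mechanically: the name a relay announces should match the PTR record of its IP address, and that name should resolve back to the same address. A rough sketch with Python's standard socket module follows, assuming the announced name and public IP are known. The function names are hypothetical, the resolver arguments exist only so the logic can be tried without DNS, and addresses are compared as plain strings, which is too strict for differently written IPv6 addresses. It does not evaluate the SPF record itself.

```python
import socket


def reverse_name(ip):
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_addresses(hostname):
    """A/AAAA lookup: hostname -> set of IP address strings."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def helo_name_is_honest(helo_name, ip,
                        reverse=reverse_name, forward=forward_addresses):
    """True when `ip` reverse-resolves to `helo_name` and `helo_name`
    resolves forward to `ip` again (forward-confirmed reverse DNS)."""
    helo = helo_name.rstrip(".").lower()
    try:
        ptr = reverse(ip).rstrip(".").lower()
        return ptr == helo and ip in forward(helo)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # a missing record counts as a failed check.
        return False
```

Because of DNS caching, a check like this is best run from several outside hosts, and repeated after each record change has had time to propagate.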

I've braindumped a checklist at assuming the new mail server manifest gets merged.

One thing I'm not sure of is how to test the server without actually putting it into service; I wouldn't want third parties to send mail to it yet, but without an MX record, mail could be marked as spam, if I understand correctly?

valhallasw renamed this task from Provision and test tools-mailrelay-01 to Provision and test tools-mailrelay-02. Jul 2 2015, 5:58 PM
valhallasw updated the task description.

Change 222358 had a related patch set uploaded (by Merlijn van Deen):
[tools] New host: tools-mailrelay-02

Change 222362 had a related patch set uploaded (by Merlijn van Deen):
Add PTR record for

Checklist for the new host:

  1. Instance creation
  2. Instance configuration, pt 1 - restricted_to=tools.admin
  3. External connectivity
  4. Set up Puppet classes: assign role role::labs::tools::mailrelay
  5. Track puppet application
  6. Add MX record (not in git, but LDAP? check with Coren)
  7. Run tests

Change 222362 merged by coren:
Add PTR record for

Change 222358 merged by coren:
[tools] New host: tools-mailrelay-02

coren claimed this task.

Made moot by the decision to skip a Precise host entirely.