Provision and test tools-mailrelay-02
Closed, Resolved · Public

Description

  • tools-mailrelay-01 was eaten by the nova overlords
  • tools-mailrelay-02 will be a Precise host, to replace tools-mail which is suffering from bloat (T97437)
  • tools-mailrelay-03 will be a Trusty host so we can update the puppet manifests for trusty
  • After -03 is working correctly, we can switch over there
  • tools-mail should not be decommissioned until we clear the queue of stale emails
  • After -03 is working correctly, we can switch there, and keep -02 as hot-spare (T96967)
  • Afterwards, we can build a new Trusty hot spare -04

New host: https://wikitech.wikimedia.org/wiki/Nova_Resource:I-00000bca.eqiad.wmflabs

Provisioning checklist etherpad: https://etherpad.wikimedia.org/p/T97574

Event Timeline

valhallasw raised the priority of this task from to Normal.
valhallasw updated the task description.
valhallasw added a project: Toolforge.
valhallasw added subscribers: scfc, coren, valhallasw and 2 others.

Need to think of an actual test plan. This should include the following mail sources:

  • receiving mail from an external host -- SMTP in and check delivery
  • delivering local mail
  • delivering mail from other tools hosts -- not sure how to test? provision single instance with mailrelay-XX as mail relay?

and the following targets:

  • root
  • user@tools.wmflabs.org (should work/not work?)
  • tools.xx@tools.wmflabs.org
  • others?
scfc added a comment. Apr 29 2015, 8:51 PM

(If you find something usable for a "test suite", please report here. In a few weeks, I need to set up my own personal mail server, and I would really like to have some script(s) that I can run to be sure that the server is not a spam relay, that all mail addresses work that should be working, etc. Ideally, not something that tests exim4 on the command line, but that connects to the server via SMTP as a normal client/server would.)
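
A minimal sketch of the kind of SMTP-level check asked for above, using Python's standard smtplib. It only probes the envelope (MAIL FROM / RCPT TO) and never sends a message, so it shows whether the server would accept or refuse a recipient, not whether delivery actually succeeds. The function names, the probe sender, and the host name in `main` are illustrative assumptions, not part of any existing tool.

```python
import smtplib


def rcpt_accepted(smtp, sender, recipient):
    """Return True if the server answers 2xx to RCPT TO for this envelope."""
    smtp.ehlo_or_helo_if_needed()
    code, _ = smtp.mail(sender)
    if not 200 <= code < 300:
        smtp.rset()
        return False
    code, _ = smtp.rcpt(recipient)
    smtp.rset()  # abandon the transaction; no message is ever sent
    return 200 <= code < 300


def run_checks(smtp, sender, expectations):
    """Compare server behaviour with expectations (recipient -> bool).

    Returns a list of (recipient, expected, actual) for every mismatch.
    """
    failures = []
    for recipient, expected in expectations.items():
        actual = rcpt_accepted(smtp, sender, recipient)
        if actual != expected:
            failures.append((recipient, expected, actual))
    return failures


def main(host="mailrelay.example.org"):  # hypothetical host name
    # Assumed expectations: local tool addresses are accepted,
    # relaying to a foreign domain is refused.
    expectations = {
        "tools.admin@tools.wmflabs.org": True,
        "someone@example.com": False,
    }
    with smtplib.SMTP(host, 25, timeout=30) as smtp:
        for failure in run_checks(smtp, "probe@example.org", expectations):
            print("unexpected result:", failure)
```

For the open-relay check to mean anything, `main` would have to run from a host that the relay does not treat as internal. A server can also accept RCPT TO and still reject the message after DATA, so this is a first filter rather than a full delivery test.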

tools-mail and tools-mailrelay-01 are now fighting over who's the current relay (/data/project/.system/store/mail-relay). I disabled puppet on tools-mailrelay-01 for now. We should probably merge https://gerrit.wikimedia.org/r/#/c/205914/ before attempting to switch over.

This should not be an issue for testing though, as exim has been deployed, so that machinery should be operational.

Actually, I think that fighting was a good server test in itself ;-) A few excerpts from the log file:

2015-04-29 19:33:08 1YnXjI-0000C1-5i "root@tools-mailrelay-01.eqiad.wmflabs" from env-from rewritten as "root@wikimedia.org" by rule 1

that doesn't seem right, but could just be an issue with the initial configuration (exim being started before the new config file is in place?).

2015-04-29 20:20:15 1YnYSt-0000ja-9B <= tools.XXXXXXX@tools.wmflabs.org H=tools-submit.eqiad.wmflabs [10.68.17.1] U=Debian-exim P=esmtp S=1116 id=E1YnYSq-0008LR-AY@tools-submit.eqiad.wmflabs
2015-04-29 20:20:15 1YnYSt-0000ja-9B gmail-smtp-in.l.google.com [2607:f8b0:400d:c08::1a] Network is unreachable
2015-04-29 20:20:15 1YnYSt-0000ja-9B => XXXXXXXXXXXXXXX@gmail.com R=dnslookup T=remote_smtp H=gmail-smtp-in.l.google.com [74.125.22.27] X=TLS1.0:RSA_ARCFOUR_SHA1:16
2015-04-29 20:20:15 1YnYSt-0000ja-9B Completed

Delivery is working, but it tries IPv6 first and only falls back to IPv4 after "Network is unreachable". That's not really an issue, but it could be cleaner.
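
Log excerpts like the ones above can be skimmed mechanically. The following is a rough sketch, not an existing tool: it assumes the main-log layout seen here (date, time, a 6-6-2 message ID, then either a two-character flag or free text) and uses the flag meanings from the Exim documentation (`<=` arrival, `=>` and `->` delivery, `**` failure, `==` deferral). The names `classify` and `summarize` are made up for illustration.

```python
import re

# Date, time, message ID in the 6-6-2 form seen in the excerpts, then the rest.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<msgid>[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}) (?P<rest>.*)$"
)

FLAGS = {
    "<=": "received",
    "=>": "delivered",
    "->": "delivered",  # additional address in the same delivery
    "**": "failed",
    "==": "deferred",
}


def classify(line):
    """Return (message_id, event) for a log line, or None if it does not parse."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    rest = match.group("rest")
    flag = rest.split(" ", 1)[0]
    if flag in FLAGS:
        event = FLAGS[flag]
    elif rest == "Completed":
        event = "completed"
    else:
        event = "other"  # e.g. a failed connection attempt or a rewrite note
    return match.group("msgid"), event


def summarize(lines):
    """Group events by message ID, preserving log order."""
    events = {}
    for line in lines:
        parsed = classify(line)
        if parsed is not None:
            msgid, event = parsed
            events.setdefault(msgid, []).append(event)
    return events
```

A message whose event list contains "failed" or "deferred", or never reaches "completed", is a candidate for a closer look in the full log.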

valhallasw updated the task description. Apr 29 2015, 9:11 PM
valhallasw set Security to None.
valhallasw moved this task from Triage to In Progress on the Toolforge board. May 10 2015, 8:44 PM
valhallasw changed Security from None to Software security bug. May 11 2015, 6:44 PM
Restricted Application changed the visibility from "Public (No Login Required)" to "Custom Policy". May 11 2015, 6:44 PM
Restricted Application changed the edit policy from "All Users" to "Custom Policy".
Restricted Application added a project: Security.
valhallasw added a comment. Edited May 11 2015, 6:45 PM

Executed the following test plan:

receiving mail from an external host -- SMTP in and check delivery

  • delivering local mail to valhallasw
  • delivering local mail to root
  • delivering local mail to tools.admin
  • delivering local mail to valhallasw@arctus.nl
  • delivering remote mail to valhallasw@tools.wmflabs.org failed
  • NOT delivering remote mail to valhallasw@arctus.nl failed (tested by smtp'ing from bastion, though, so maybe that counts as 'internal'? tools-mail does the same)
coren added a comment. May 11 2015, 6:52 PM

Yes, the bastions are "inside" and thus allowed to relay.

valhallasw changed Security from Software security bug to None.

Assigned 208.80.155.188 / mailrelay-01.tools.wmflabs.org.

  • NOT delivering remote mail to valhallasw@arctus.nl
  • delivering remote mail to tools.admin@tools.wmflabs.org still failing

Time to dive into exim logs...

Legoktm changed the visibility from "Custom Policy" to "Public (No Login Required)". May 11 2015, 6:59 PM
Legoktm changed the edit policy from "Custom Policy" to "All Users".
valhallasw added a comment. Edited May 11 2015, 7:13 PM
  • delivering remote mail to tools.admin@tools.wmflabs.org still failing
2015-05-11 18:56:38 1YrssK-00070S-GK ** valhallasw@arctus.nl <tools.admin@tools.wmflabs.org>
R=dnslookup T=remote_smtp: SMTP error from remote mail server after end of data:
host ASPMX.L.GOOGLE.COM [74.125.22.26]:
550-5.7.1 [208.80.155.188      11] Our system has detected that this message is
550-5.7.1 not RFC 2822 compliant. To reduce the amount of spam sent to Gmail,
550-5.7.1 this message has been blocked. Please review
550 5.7.1 RFC 2822 specifications for more information. w140si7312939qha.15 - gsmtp

so exim did its best, but my telnetting skills were no match for Gmail. So:

  • delivering remote mail to tools.admin@tools.wmflabs.org
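
A message typed by hand over telnet tends to lack headers that receivers expect. The thread does not say which part of the message Gmail objected to, so the following is only a sketch of one way to build a test message with the usual From, To, Subject, Date and Message-ID headers, using Python's standard email package. The function name and the body text are illustrative assumptions.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_test_message(sender, recipient, subject="mail relay test"):
    """Build a small plain-text message with the commonly expected headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["Date"] = formatdate(localtime=True)
    # Use the sender's domain for the right-hand side of the Message-ID.
    msg["Message-ID"] = make_msgid(domain=sender.rpartition("@")[2])
    msg.set_content("Test message for the mail relay; please ignore.")
    return msg
```

The result can be handed to `smtplib.SMTP.send_message`, which takes the envelope sender and recipients from the headers unless they are given explicitly.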

I really should write this into a test script, as it should be relatively easy to automate. The next steps are:

  • Add MX record for mailrelay-01.tools.wmflabs.org
  • Stop puppet on tools-mail
  • Start puppet on tools-mailrelay-01

... wait for a week or so

  • Check exim queue on tools-mail, and empty/resolve manually
  • Shut down exim on tools-mail
  • Remove MX record *afterwards* (so that we first stop sending mail, and only then we remove the MX record, which is coupled to the SPF record)
scfc added a comment. May 11 2015, 7:46 PM

NB: As shown by your test mail above, changing IPs will make other hosts reject mails as spam! (Exclamation marks used intentionally :-).) The current mail relay puppetry will announce itself to the world as mail.tools.wmflabs.org, and there is an RDNS record that points the IP address back to that.

So only changing the MX record is not enough! In addition, DNS records are cached, so the switching must work under the assumption that all changes happen at random times!

The "easy" solution is to reassign the public IP associated with mail.tools.wmflabs.org. The more laid back approach is:

  • Add the individual hostnames of the mail relays to DNS including RDNS (something like mail-relay-01.tools.wmflabs.org).
  • Make mail relays (apart from the current one) announce themselves with their individual hostnames.
  • Add those as MX records.
  • Remove the current one from the MX record.
coren added a comment. May 11 2015, 8:19 PM

The mail relay, whatever its name, should most certainly not be lying about it. Both names will be MXes, and the SPF record is set to mx -all so that'll just work. If nobody lies. :-)
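
One way to check that a relay is not "lying" about its name is a forward-confirmed reverse DNS lookup: the PTR record for the IP should give the announced name, and that name should resolve back to the same IP. The sketch below uses only Python's standard socket module, so it covers IPv4 A and PTR lookups and does not look at the MX or SPF records. The function name is an assumption, and the resolver arguments exist only so the logic can be exercised without network access.

```python
import socket


def fcrdns_ok(ip, announced_name,
              reverse=socket.gethostbyaddr,
              forward=socket.gethostbyname_ex):
    """Return True if ip -> PTR -> announced_name -> ip is consistent."""
    wanted = announced_name.rstrip(".").lower()
    try:
        ptr_name, aliases, _ = reverse(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    names = {name.rstrip(".").lower() for name in [ptr_name, *aliases]}
    if wanted not in names:
        return False  # the PTR record points at a different name
    try:
        _, _, addresses = forward(announced_name)
    except OSError:
        return False
    return ip in addresses  # the name must resolve back to the same IP
```

Run against each relay's public IP and the name it uses in HELO, this would flag a host that announces itself under a name whose reverse record belongs to another address.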

The relevant (r)DNS change for mail.tools.wmflabs.org is https://gerrit.wikimedia.org/r/#/c/121416/

valhallasw updated the task description. Jun 30 2015, 8:04 PM
Restricted Application added a project: Cloud-Services. Jun 30 2015, 8:04 PM

I've braindumped a checklist at https://etherpad.wikimedia.org/p/T97574 assuming the new mail server manifest gets merged.

One thing I'm not sure of is how to test the server without actually putting it into service; I wouldn't want 3rd parties to send mail to it yet, but without an MX record, mail it sends could be marked as spam, if I understand correctly?

valhallasw renamed this task from Provision and test tools-mailrelay-01 to Provision and test tools-mailrelay-02. Jul 2 2015, 5:58 PM
valhallasw updated the task description.

Change 222358 had a related patch set uploaded (by Merlijn van Deen):
[tools] New host: tools-mailrelay-02

https://gerrit.wikimedia.org/r/222358

Change 222362 had a related patch set uploaded (by Merlijn van Deen):
Add PTR record for mailrelay-02.tools.wmflabs.org

https://gerrit.wikimedia.org/r/222362

valhallasw added a comment. Edited Jul 2 2015, 6:15 PM

Checklist for the new host:

  1. Instance creation
  2. Instance configuration, pt 1 - restricted_to=tools.admin
  3. External connectivity
  4. Set up Puppet classes: assign role role::labs::tools::mailrelay
  5. Track puppet application
  6. Add MX record (not in git, but LDAP? check with Coren)
  7. Run tests

Change 222362 merged by coren:
Add PTR record for mailrelay-02.tools.wmflabs.org

https://gerrit.wikimedia.org/r/222362

Change 222358 merged by coren:
[tools] New host: tools-mailrelay-02

https://gerrit.wikimedia.org/r/222358

coren closed this task as Resolved. Dec 9 2015, 2:50 PM
coren claimed this task.

Made moot by the decision to skip a Precise host entirely.