Outgoing mail to wikimedia.org not working on new labs instance
Closed, Resolved, Public

Description

I created a new lab instance (jobs.security-tools.eqiad.wmflabs) to run some cron jobs. I'd like to be able to receive e-mail when those jobs fail. However, outgoing mail appears broken. When I run

$ echo -n "Test" | mail -s Test dpatrick@wikimedia.org

The message fails with an error in /var/log/exim4/mainlog:

2016-06-06 22:10:18 1bA2iw-0008Qs-CW "runner@jobs.security-tools.eqiad.wmflabs" from env-from rewritten as "root@wmflabs.org" by rule 1
2016-06-06 22:10:18 1bA2iw-0008Qs-CW <= root@wmflabs.org U=runner P=local S=562
2016-06-06 22:10:18 1bA2iw-0008Qs-CW mx1001.wikimedia.org [2620:0:861:3:208:80:154:76] Network is unreachable
2016-06-06 22:10:18 1bA2iw-0008Qs-CW ** dpatrick@wikimedia.org R=smart_route T=remote_smtp: SMTP error from remote mail server after RCPT TO:<dpatrick@wikimedia.org>: host mx1001.wikimedia.org [208.80.154.76]: 550-Verification failed for <root@wmflabs.org>\n550-Cannot route to remote domain wmflabs.org\n550 Sender verify failed
2016-06-06 22:10:18 1bA2iw-0008Qx-E8 <= <> R=1bA2iw-0008Qs-CW U=Debian-exim P=local S=1653
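
When triaging a mainlog like the one above, it can help to sort lines by the flag that follows the message ID (`<=` for an arrival, `=>` for a delivery, `**` for a failed delivery, `==` for a deferral). The following is a minimal sketch, assuming the default exim log line layout seen in this task; `classify_line` and the label strings are illustrative names, not part of exim.

```python
import re

# Exim mainlog flags that follow the message ID.
FLAGS = {
    "<=": "received",
    "=>": "delivered",
    "**": "failed",
    "==": "deferred",
}

# "YYYY-MM-DD HH:MM:SS <message-id> <rest of line>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")


def classify_line(line):
    """Return (message_id, label) for a mainlog line, or None if it does not parse."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    _timestamp, message_id, rest = match.groups()
    # The flag, if any, is the first whitespace-separated token after the ID.
    flag = rest.split(" ", 1)[0]
    return message_id, FLAGS.get(flag, "info")


if __name__ == "__main__":
    sample = (
        "2016-06-06 22:10:18 1bA2iw-0008Qs-CW ** dpatrick@wikimedia.org "
        "R=smart_route T=remote_smtp: SMTP error from remote mail server"
    )
    print(classify_line(sample))
```

Filtering for the `failed` label would surface the `550 Sender verify failed` line directly, without reading past the IPv6 "Network is unreachable" notice.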

Event Timeline

yuvipanda renamed this task from Outgoing mail not working on new labs instance to Outgoing mail to wikimedia.org not working on new labs instance. Jun 6 2016, 10:42 PM

It seems to work fine for non wikimedia.org mail addresses:

2016-06-06 22:26:49 1bA2yv-0000bk-RQ "root@jobs.security-tools.eqiad.wmflabs" from env-from rewritten as "root@wmflabs.org" by rule 1
2016-06-06 22:26:49 1bA2yv-0000bk-RQ <= root@wmflabs.org U=root P=local S=585
2016-06-06 22:26:49 1bA2yv-0000bk-RQ mx1001.wikimedia.org [2620:0:861:3:208:80:154:76] Network is unreachable
2016-06-06 22:26:49 1bA2yv-0000bk-RQ => yuvipanda@gmail.com R=smart_route T=remote_smtp S=603 H=mx1001.wikimedia.org [208.80.154.76] C="250 OK id=1bA2yv-0001qK-S1" DT=0s
2016-06-06 22:26:49 1bA2yv-0000bk-RQ Completed
mailx -r 'yuvipanda@gmail.com' yuvipanda@wikimedia.org

works well and delivers it fine.

On 0295b935034c685936f5a79c273696df6bb4521c I mentioned:

[…]
On a later step, Labs could get its own mail relays entirely; this split config would allow for this.

For mails in Labs to properly work after this change, wmflabs.org should become a mail domain (MX records + a simple alias file defining at least the root alias).
[…]

The former is T41785, open for almost 4 years now :(

The latter was never done and, as expected, Labs email is broken. Ideally we'd fix this properly, but we should probably continue with the interim hacks so that we don't wait another 4 years or so to unbreak Labs email…

In short, this behavior essentially happens because root@wmflabs.org does not exist and our (production) mailservers perform sender verification (= reject emails that originate from invalid addresses).
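
To make the failure mode concrete, here is a toy model of the check, assuming sender verification boils down to "can the receiving server route mail to the sender's domain?". It is only a sketch: `verify_sender` and the `routable_domains` set are hypothetical, and the real mail servers decide routability from DNS and their own configuration rather than from a hard-coded set.

```python
def verify_sender(sender, routable_domains):
    """Toy sender verification: return an (SMTP code, message) pair.

    `routable_domains` stands in for whatever the real server can route to;
    it is an assumption made for illustration only.
    """
    local_part, _, domain = sender.rpartition("@")
    if not local_part:
        return 550, "Sender verify failed: malformed address"
    if domain.lower() not in routable_domains:
        return 550, "Cannot route to remote domain " + domain
    return 250, "OK"


if __name__ == "__main__":
    # Before the fix: wmflabs.org is not a mail domain, so root@wmflabs.org is rejected.
    before = {"wikimedia.org", "gmail.com"}
    print(verify_sender("root@wmflabs.org", before))
    print(verify_sender("yuvipanda@gmail.com", before))

    # After wmflabs.org becomes a valid mail domain, the same sender is accepted.
    after = before | {"wmflabs.org"}
    print(verify_sender("root@wmflabs.org", after))
```

In this simplified model, overriding the sender with a gmail.com address (as with `mailx -r`) passes because gmail.com is routable, while the rewritten root@wmflabs.org sender is rejected.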

I just set up wmflabs.org as a valid email domain and pointed root@ and a few other aliases to the Cloud-Services team (with 5ac81c7150ee6dc0a4375b146adc1c961180513a + a few puppet-private commits). What's left here is to add a few records to the wmflabs.org zone, which is managed by Designate; I'm not entirely sure how to do that. Cc @Andrew.

The records that we need are:

wmflabs.org. 1H IN MX 10 mx1001.wikimedia.org.
wmflabs.org. 1H IN MX 50 mx2001.wikimedia.org.
wmflabs.org. 1H IN TXT "v=spf1 mx ?all"
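
For readers unfamiliar with SPF syntax, the TXT record above says: hosts listed in the domain's MX records pass, and everything else is neutral (`?all`). A minimal sketch of how such a record splits into mechanisms and qualifiers follows; `parse_spf` is an illustrative helper, not a full SPF evaluator, and it does not treat modifiers such as `redirect=` specially.

```python
# SPF qualifier prefixes and their results; a term without a prefix defaults to "+".
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_spf(record):
    """Split an SPF record into (mechanism, result) pairs.

    Sketch only: modifiers (e.g. redirect=...) are returned as if they
    were mechanisms, and no DNS lookups are performed.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    parsed = []
    for term in terms[1:]:
        if term[0] in QUALIFIERS:
            qualifier, mechanism = term[0], term[1:]
        else:
            qualifier, mechanism = "+", term
        parsed.append((mechanism, QUALIFIERS[qualifier]))
    return parsed


if __name__ == "__main__":
    print(parse_spf("v=spf1 mx ?all"))
```

The neutral `?all` means mail from hosts other than the MX hosts is neither endorsed nor rejected by policy, which is a cautious choice for a domain whose sending hosts are not fully enumerated.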

Mentioned in SAL [2016-06-07T00:39:37Z] <Krenair> Created MX and SPF records directly for wmflabs.org. for https://phabricator.wikimedia.org/T137160#2359786

Horizon was perfectly happy with those MX records (as an admin of the wmflabsdotorg project, just create the MX record with an empty 'name' (subdomain) field, leaving only the static '.wmflabs.org.' text).
However, because of this part of the designate-dashboard code: https://github.com/openstack/designate-dashboard/blame/master/designatedashboard/dashboards/project/dns_domains/forms.py#L396-L401
... it wasn't so happy to make the TXT record for us. I ended up having to use python-designateclient (the old v1, because silver) to do it. The code ended up being something like this:

# Keystone auth/session plumbing plus the legacy Designate v1 client
from keystoneclient.auth.identity import generic
from keystoneclient import session as keystone_session
from designateclient import v1
from designateclient.v1.records import Record

# Authenticate against Keystone as novaadmin, scoped to the project that owns the zone
auth = generic.Password(
    auth_url="http://labcontrol1001.wikimedia.org:35357/v2.0",
    username="novaadmin",
    password=redacted,  # placeholder: the real password is omitted here
    tenant_name="wmflabsdotorg",
)

session = keystone_session.Session(auth=auth)
client = v1.Client(session=session)

# Take the first domain listed for the project
domain_id = client.domains.list()[0]["id"]  # 553ef162-add7-4a5c-b115-9cabca662746

# The TXT data carries its own literal double quotes
record = Record(name="wmflabs.org.", type="TXT", data='"v=spf1 mx ?all"', ttl=3600)
result = client.records.create(domain_id, record)

(I did it once in the Python console, so I never actually tested this whole block of code together.)

This is working now. Thank you!