
Replace primary mail relays (polonium/lead)
Closed, Resolved, Public

Description

Our current e-mail relays, polonium and lead, need to be replaced:

  • They are both in eqiad, which was always meant to be temporary (at the time they were provisioned, codfw was not in production)
  • They need to be converted to jessie
  • They need to be moved into VMs
  • (Optional) They need to get cluster-based hostnames as we'll need to issue certificates for them

The systems are well-puppetized, so it should be relatively simple. The only tricky part is handling the migration, because polonium/lead's IP addresses are hardcoded in places such as the Google Apps panel that OIT manages.

Event Timeline

faidon claimed this task.
faidon raised the priority of this task to High.
faidon updated the task description.
faidon added projects: acl*sre-team, Mail.
faidon subscribed.

Change 239784 had a related patch set uploaded (by Faidon Liambotis):
Add mx1001/mx2001 as role mail::mx

https://gerrit.wikimedia.org/r/239784

Change 239784 merged by Faidon Liambotis:
Add mx1001/mx2001 as role mail::mx

https://gerrit.wikimedia.org/r/239784

The new hosts, mx1001/mx2001, are up and running. I've already asked WMF's Office IT team to update Google Apps with the new IPs (#8564 in their ticketing system).

Google Apps was updated by OIT. MXes for all domains except wikimedia.org and its subdomains have been switched. wiki-mail-eqiad was switched as well.

wikimedia.org and its subdomains will follow next, along with the generic wiki-mail CNAME and a new wiki-mail-codfw. ETA is tomorrow, Sep 22nd.

All of the above is done. polonium still receives a fair share of email (spammers don't really obey DNS TTLs), so I'll monitor it over the next few days to find any stray mail flows and switch those as well. After that, a ticket to properly decommission polonium/lead will follow.

This has essentially been done for a few days now. See T113962 for the decom task.