Before any migration to the eqiad1 deployment, the outbound email path for Toolforge seems to be (for @wikimedia.org recipients):
- a toolforge server (10.0.0.x) sends the email, using relay tools-mail-02.eqiad.wmflabs
- the tools relay server (10.0.0.x) receives the email and forwards it to mx1001.wikimedia.org, i.e. the prod relays
- the prod (public addr) relay handles the email correctly (follow-up steps are not of interest right now)
The problem after we move Toolforge to eqiad1 is the addressing change (from 10.0.x.x to 172.16.x.x):
- a toolforge server (172.16.x.x) sends the email, using relay tools-mail-02.eqiad.wmflabs
- the tools relay server (172.16.x.x) receives the email and tries to forward it to mx1001.wikimedia.org, i.e. the prod relays
- the prod relays don't allow this new addressing.
note: this description may only be valid for root email. I'm not sure what the policies are for non-root outbound emails.
note: non-wikimedia recipients will get emails delivered directly by tools-mail-02 without going through mx*.wikimedia.org
There are 2 main approaches to handle this situation:
- allow 172.16.x.x in prod relays
- use the intermediate cloud smarthosts mx-out0.wmflabs.org, to do something like:
toolforge server -> tools-mail-02 -> mx-out01.wmflabs.org -> mx1001.wikimedia.org
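
To make the failure and the two approaches concrete, here is a minimal Python sketch of a relay that accepts mail based on the client's source address. Everything specific in it is an assumption for illustration: the allowlist contents, the host addresses (10.0.0.25, 172.16.0.25), the 203.0.113.10 smarthost address (a documentation-only range), and the premise that the prod relays already accept the cloud smarthost. It is not the real prod relay configuration.

```python
import ipaddress


def relay_accepts(client_ip, allowed_networks):
    """Return True if the connecting client's address is in any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed_networks)


# Hypothetical prod relay allowlist: the old cloud range plus the cloud smarthost.
# Assumption: the real list on mx1001 is not given in this task.
SMARTHOST_IP = "203.0.113.10"  # placeholder for mx-out01.wmflabs.org
prod_allowed = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network(SMARTHOST_IP + "/32"),
]

old_tools_mail = "10.0.0.25"    # hypothetical tools-mail-02 address before the move
new_tools_mail = "172.16.0.25"  # hypothetical tools-mail-02 address in eqiad1

# Before the migration the prod relay accepts the tools relay.
print(relay_accepts(old_tools_mail, prod_allowed))  # True

# After the migration the same check fails: this is the reported problem.
print(relay_accepts(new_tools_mail, prod_allowed))  # False

# Approach 1: allow 172.16.x.x in the prod relays.
approach_1 = prod_allowed + [ipaddress.ip_network("172.16.0.0/16")]
print(relay_accepts(new_tools_mail, approach_1))  # True

# Approach 2: tools-mail-02 hands off to the cloud smarthost, so the prod
# relay only ever sees the smarthost's address and needs no allowlist change.
print(relay_accepts(SMARTHOST_IP, prod_allowed))  # True
```

The trade-off the sketch shows: the first approach changes the prod relay's allowlist, while the second leaves prod untouched and instead adds one hop inside the cloud.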
Any solution taken may solve T212327: Beta Cluster mailer not sending emails as well.