fr-tech to look into supporting mail fail over for civi
Closed, Resolved | Public | 3 Estimated Story Points

Event Timeline

DStrine set the point value for this task to 3. (Oct 13 2020, 8:14 PM)
DStrine moved this task from Triage to Current Sprint on the Fundraising-Backlog board.

@Jgreen @Dwisehaupt - I have in my head that we want to use smtp - but I realised I need to confirm that

  • it seems phpmailer supports failover between SMTP hosts fairly natively, at least at a simple level

https://stackoverflow.com/questions/24337686/fallback-smtp-servers-with-phpmailer

Yes, we definitely want to use smtp. That page looks good: we can just list all the hosts and it should fall through. It also looks like it tries them in order, which will let us set a preference for the primary host just by the order we place them in. That will be nice.
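To make the "list them all and fall through in order" behaviour concrete, here is a minimal sketch in Python rather than PHP. It is not phpmailer's implementation: `send_with_failover` and the `send` callable are hypothetical names, and the sketch assumes a failed delivery attempt raises an `OSError` (which covers socket and `smtplib` errors in Python).

```python
from typing import Callable, List, Sequence, Tuple


def send_with_failover(
    hosts: Sequence[str],
    send: Callable[[str], None],
) -> Tuple[str, List[Tuple[str, str]]]:
    """Try each host in list order and stop at the first one that accepts.

    Returns the host that worked plus the errors from hosts tried before it.
    The list order is the preference order: the primary host goes first.
    """
    errors: List[Tuple[str, str]] = []
    for host in hosts:
        try:
            send(host)  # hypothetical delivery callable, raises OSError on failure
            return host, errors
        except OSError as exc:
            errors.append((host, str(exc)))
    raise ConnectionError(f"all SMTP hosts failed: {errors}")
```

Because the loop stops at the first success, placing the preferred host first is all that is needed to express the preference.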

@Dwisehaupt - are these the values we would need for config?

Screen Shot 2020-10-20 at 8.06.51 PM.png (716×1 px, 95 KB)

@Eileenmcnaughton Yes. We would just be using SMTP server and SMTP port as we don't require auth on internal connections.

@Eileenmcnaughton I did some testing and the phpmailer functionality seems reasonable. I have a couple observations.

The default connection timeout is pretty long: the documentation says 300s, but it seems more like 30s. Maybe it's fine to dial that back to 5-10s, but if feasible it would be good to detect a fallback event and flag a down host for a minute so subsequent deliveries go quickly.
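The "flag a down host for a minute" idea could look roughly like the sketch below. It is an illustration in Python, not anything in the patches: `DownHostCache` and its methods are hypothetical names, and the state lives in memory, so it would only help within a single long-running job unless it were persisted somewhere shared.

```python
import time
from typing import Callable, Dict, List, Sequence


class DownHostCache:
    """Remember hosts that recently failed so later sends skip the timeout."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._down_until: Dict[str, float] = {}

    def mark_down(self, host: str) -> None:
        """Flag a host as down for ttl_seconds from now."""
        self._down_until[host] = self._clock() + self._ttl

    def is_down(self, host: str) -> bool:
        until = self._down_until.get(host)
        if until is None:
            return False
        if self._clock() >= until:
            # The flag has expired, so the host gets another chance.
            del self._down_until[host]
            return False
        return True

    def order(self, hosts: Sequence[str]) -> List[str]:
        """Healthy hosts first in their original order, flagged hosts last."""
        up = [h for h in hosts if not self.is_down(h)]
        down = [h for h in hosts if h not in up]
        return up + down
```

Flagged hosts are moved to the end rather than dropped, so a delivery can still reach them if every other host also fails.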

We should enable TLS. It's already supported on the mailserver side and I tested with phpmailer. Note that the cert name is for the external hostname (frmx1001.wikimedia.org) and the connection is to the internal hostname (frmx1001.frack.eqiad.wmnet) so we have to disable SSL option "verify_peer_name". See hacky attached test script.
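For readers unfamiliar with what disabling "verify_peer_name" means, here is a rough Python analogue (not the attached test script, and not PHP). The assumption is that Python's `check_hostname` plays the same role as PHP's `verify_peer_name`: the certificate chain is still validated, only the name comparison is skipped. `internal_tls_context` is a hypothetical helper name.

```python
import ssl
from typing import Optional


def internal_tls_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context that validates the certificate chain but not the name.

    This suits the case where the certificate is issued for the external
    hostname while the connection is made to the internal hostname.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    # Skip only the hostname comparison (assumed analogue of verify_peer_name).
    ctx.check_hostname = False
    # verify_mode is left at CERT_REQUIRED, so an untrusted cert still fails.
    return ctx
```

The context could then be passed to `smtplib.SMTP.starttls(context=...)`. Keeping chain verification on means a certificate from an untrusted issuer is still rejected.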

We might as well include localhost:25 at the end of the fallback list. We'll probably eventually make the civi hosts route mail out through the frmx's too, so this would allow local queuing if both frmx's are inaccessible.

Change 635396 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Start the process of moving our mailer class to Omnimail

https://gerrit.wikimedia.org/r/635396

Thanks @Jgreen - very helpful. As a bonus, I can see whether a mail went via SMTP or not, as I applied your XMailer suggestion to the SMTP mailer but it's not set on phpmail

Screen Shot 2020-10-21 at 4.19.24 PM.png (246×1 px, 68 KB)

Screen Shot 2020-10-21 at 4.19.08 PM.png (156×1 px, 45 KB)

I've loaded the series of patches ending in https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/crm/+/635410 on staging. I then set the SMTP host to more than one server using

define('CIVICRM_SMTP_HOST',  .....)

in civicrm.settings.php.

I've also set it up so it can be done on a per job basis - ie.

env CIVICRM_SMTP_HOST=blah drush ty

Which might allow us to warm up our servers & get more control over switching to them to prevent spam listings.
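The per-job override amounts to "environment variable wins, otherwise use the configured value". A minimal Python sketch of that precedence follows; it is not the code in the patches, and `resolve_smtp_host` and `DEFAULT_SMTP_HOST` are hypothetical names (only `CIVICRM_SMTP_HOST` comes from this thread).

```python
import os
from typing import Mapping, Optional

# Hypothetical stand-in for the value defined in the settings file.
DEFAULT_SMTP_HOST = "localhost:25"


def resolve_smtp_host(environ: Optional[Mapping[str, str]] = None,
                      configured: str = DEFAULT_SMTP_HOST) -> str:
    """Return the per-job CIVICRM_SMTP_HOST if set, else the configured value."""
    env = os.environ if environ is None else environ
    value = env.get("CIVICRM_SMTP_HOST", "").strip()
    # An empty or missing variable falls back to the site-wide setting.
    return value or configured
```

Each scheduled job can then carry its own host list without touching the shared settings file.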

I think this is the minimum requirement covered. Optional extras are

  1. allow us to load balance between the servers rather than just failover
  2. switch email from within CiviCRM to use the SMTP class

Neither is a huge task IMHO - but the work done so far covers the most important requirements, so how much I do of the last 2 is a prioritisation / review availability question
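For optional extra 1, one simple way to load balance on top of an ordered failover list is to rotate the starting host per send. The Python sketch below is only an illustration of that idea, not a proposal for the actual patch: `balanced_order` is a hypothetical name, and keeping a separate last-resort list (for something like a local queue) is an assumption.

```python
import random
from typing import List, Sequence


def balanced_order(primaries: Sequence[str],
                   last_resort: Sequence[str] = (),
                   rng=random) -> List[str]:
    """Rotate the primaries to a random starting point; keep last_resort last.

    Each primary is equally likely to be tried first, and the remaining
    primaries still act as fallbacks in their usual order.
    """
    ordered = list(primaries)
    if ordered:
        start = rng.randrange(len(ordered))
        ordered = ordered[start:] + ordered[:start]
    return ordered + list(last_resort)
```

The result can be fed to whatever tries hosts in order, so failover behaviour is unchanged and only the first choice varies.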

@Eileenmcnaughton I'm glad it works and given priorities I'm up for +2ing, but I still had some thoughts and questions on the patches. I also think there might be one typo? I'm so glad it's moving away from the modules!

Change 635396 merged by Eileen:
[wikimedia/fundraising/crm@master] Start the process of moving our mailer class to Omnimail

https://gerrit.wikimedia.org/r/635396

I did a burst of one thank you mail from live on SMTP - @Dwisehaupt @Jgreen - is it visible to you? Still on php mail now - I used env per https://phabricator.wikimedia.org/T264663#6566786

Nothing went out through frmx1001, although I'm not clear if you were testing smtp for local delivery or using the new mx? If you ping me next time I can watch logs while you're testing. Also the frmx mail log is available at frlog1001:/var/log/remote/frmx-mail.

Change 637893 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Move PhpMailer to Omnimail

https://gerrit.wikimedia.org/r/637893

Change 637894 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Move MailerBase to Omnimail

https://gerrit.wikimedia.org/r/637894

Change 637898 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] WIP on sending phpmailer via civi

https://gerrit.wikimedia.org/r/637898

Change 637893 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Move PhpMailer to Omnimail

https://gerrit.wikimedia.org/r/637893

Change 637894 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Move MailerBase to Omnimail

https://gerrit.wikimedia.org/r/637894

My testing command is

env CIVICRM_SMTP_HOST="tls://frmx1001.frack.eqiad.wmnet:25;tls://frmx2001.frack.codfw.wmnet:25;localhost:25"  drush cvapi thankyou.send contribution_id=51970459
  • However, at the moment it seems env is whitelisted only for CIVICRM_DEBUG_LOG_QUERY
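The host string in the testing command above is a semicolon-separated list of `[scheme://]host[:port]` entries. As an illustration of that format only (not how phpmailer parses it internally), here is a small Python sketch; `SmtpTarget` and `parse_smtp_hosts` are hypothetical names, and defaulting a missing port to 25 is an assumption.

```python
from typing import List, NamedTuple


class SmtpTarget(NamedTuple):
    scheme: str  # e.g. "tls", or "" when no scheme prefix was given
    host: str
    port: int


def parse_smtp_hosts(value: str, default_port: int = 25) -> List[SmtpTarget]:
    """Split a ';'-separated host list into targets, preserving list order."""
    targets: List[SmtpTarget] = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing or doubled semicolon
        scheme, sep, rest = entry.partition("://")
        if not sep:
            scheme, rest = "", entry  # no scheme prefix on this entry
        host, sep, port = rest.rpartition(":")
        if not sep:
            host, port = rest, str(default_port)  # no explicit port
        targets.append(SmtpTarget(scheme.lower(), host, int(port)))
    return targets
```

Order is preserved, so the first entry remains the preferred host and `localhost:25` stays at the end as the local fallback.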

CIVICRM_SMTP_HOST has been added to the preserve-env list.

[frack::puppet] 1d4491d9 Preserve the CIVICRM_SMTP_HOST env for drush

A run of exactly 1 email worked - I can try a thank you run...

It turns out that drush jobs were sending mail via a specific process user. The iptables rules were adjusted for that user and testing was successful.

We tried this via ENV & it's working - the next step is for @Dwisehaupt & @Jgreen to work through moving traffic over without getting spam listed. Probably having multiple scheduled jobs with different env CIVICRM_SMTP_HOST= .... in them would be the easiest way to tune.

Moving to DONE as the scope of this job is complete & the ongoing work probably has or should have its own task

In terms of fixing ALL CiviCRM mail to go via this: I have logged an upstream task around moving CiviCRM Mail forwards - https://lab.civicrm.org/dev/core/-/issues/2159 - I'm inclined to let that play out over the next month or 2 & revisit it next year as part of our 'move modules to extensions' metaproject

Change 637898 abandoned by Eileen:

[wikimedia/fundraising/crm@master] WIP on sending phpmailer via civi

Reason:

https://gerrit.wikimedia.org/r/637898