
Create new job to send Civi thank you emails over smtp through frmx hosts
Closed, Resolved, Public

Description

We have verified that we can send emails over smtp in T264663. Now we need to enable some jobs to do so.

  • create two more thank you email jobs with updated CIVICRM_SMTP_HOST= settings that add the frmx hosts; one job will list frmx1001 first and the other will list frmx2001 first.
  • start the new jobs running once an hour (at 2 past the hour for frmx1001 and 5 past for frmx2001) while reducing the frequency of the current job so they don't overlap
  • monitor mail logs for any bounce/reject issues on the frmx hosts
  • slowly increase the frequency of the new jobs while decreasing the current job

Definition of done:

  • All thank you emails being sent out through frmx hosts.
  • No substantial increase in delay, bounce, or rejection of messages

Event Timeline

Dwisehaupt renamed this task from Create new job to send Civi thank you emails over stmp through frmx hosts to Create new job to send Civi thank you emails over smtp through frmx hosts. Nov 4 2020, 7:07 PM

After some discussion, we have decided to split this into two new send mail scripts per Civi host. If we used just one script with the static ENV of frmx1001:frmx2001, frmx2001 would never see any traffic as long as frmx1001 is up. By using two scripts, we can order the host preference differently in each and have them take alternating slots of requests.
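A small Python model makes the ordering argument concrete. It assumes the host setting is a colon-separated preference list that is tried in order, and the function names (`pick_host`, `simulate`) are hypothetical; this is a sketch of the reasoning, not the actual Civi mailer code.

```python
from collections import Counter
from typing import Callable, List, Optional


def pick_host(host_setting: str, is_up: Callable[[str], bool]) -> Optional[str]:
    """Assumed behavior: try hosts in the listed order, return the first that is up."""
    for host in host_setting.split(":"):
        if is_up(host):
            return host
    return None


def simulate(job_settings: List[str], runs: int, is_up: Callable[[str], bool]) -> Counter:
    """Jobs take alternating cron slots; count which host each run ends up using."""
    counts: Counter = Counter()
    for i in range(runs):
        counts[pick_host(job_settings[i % len(job_settings)], is_up)] += 1
    return counts


def all_up(host: str) -> bool:
    return True


def only_frmx2001(host: str) -> bool:
    return host == "frmx2001"


single = simulate(["frmx1001:frmx2001"], 20, all_up)
paired = simulate(["frmx1001:frmx2001", "frmx2001:frmx1001"], 20, all_up)
failover = simulate(["frmx1001:frmx2001", "frmx2001:frmx1001"], 20, only_frmx2001)

print(dict(single))    # {'frmx1001': 20}
print(dict(paired))    # {'frmx1001': 10, 'frmx2001': 10}
print(dict(failover))  # {'frmx2001': 20}
```

With one static ordering, frmx2001 only ever acts as a cold standby; with two mirrored orderings both hosts stay warm, and either one can still absorb all runs if the other is down.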

As we roll this out, the new scripts will start with a single slot each in the cron rotation (frmx1001 at 2, frmx2001 at 5) and the standard job will take the remaining slots (8-59/3). To ramp up the volume, we will add more slots to each of the frmx scripts and remove them from the current script.
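The initial slot layout can be checked with a short sketch that expands cron minute fields. It handles only a subset of cron syntax (plain minutes, `A-B`, `A-B/S`, `*`, `*/S`, and commas), and the job labels are hypothetical. It confirms that the three jobs never share a minute and together cover every third minute from :02 through :59.

```python
from typing import List


def expand_minutes(field: str) -> List[int]:
    """Expand a cron minute field (subset: N, A-B, A-B/S, *, */S, comma lists)."""
    minutes = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = 0, 59
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


# Hypothetical labels; minute fields taken from the rollout plan.
schedule = {
    "thank_you via frmx1001 first": "2",
    "thank_you via frmx2001 first": "5",
    "thank_you current job": "8-59/3",
}

expanded = {job: expand_minutes(field) for job, field in schedule.items()}
used = [m for minutes in expanded.values() for m in minutes]

# No minute should be claimed by more than one job.
assert len(used) == len(set(used)), "two jobs share a minute"
print(sorted(used) == expand_minutes("2-59/3"))  # True
print(len(used))  # 20
```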

This adds some complexity in terms of the number of things to track, but it gives us greater flexibility and, hopefully, more resiliency with two warm frmx hosts.

Updating the task to reflect this.

Pushed these jobs this morning and seeing mails going through the frmxs. Leaving them running on this minimal setting for the weekend to gather more data before expanding the run times.

Jobs updated to handle 30% of the runs. Continuing to monitor the logs.

Jobs updated to handle 60% of the runs. Continuing to monitor the logs.
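Against the 20 runs per hour implied by the original layout (2, 5, and 8-59/3), 30% works out to 6 frmx slots and 60% to 12. The sketch below shows one hypothetical way to hand those slots out, alternating between the two frmx jobs so each gets an equal share; the exact minutes used during the actual rollout are not recorded in this task, and `allocate` is an illustrative name.

```python
from typing import Dict, List

SLOTS = list(range(2, 60, 3))  # :02, :05, ..., :59 -> 20 runs per hour


def allocate(slots: List[int], frmx_share: float) -> Dict[str, List[int]]:
    """Give the first share of slots to the frmx jobs, alternating between them."""
    n_frmx = round(len(slots) * frmx_share)
    n_frmx -= n_frmx % 2  # keep it even so both frmx jobs get the same count
    plan: Dict[str, List[int]] = {"frmx1001": [], "frmx2001": [], "current": []}
    for i, minute in enumerate(slots):
        if i < n_frmx:
            plan["frmx1001" if i % 2 == 0 else "frmx2001"].append(minute)
        else:
            plan["current"].append(minute)
    return plan


plan = allocate(SLOTS, 0.3)
print(",".join(map(str, plan["frmx1001"])))  # 2,8,14
print(",".join(map(str, plan["frmx2001"])))  # 5,11,17
print(len(plan["current"]))  # 14
```

A 10% share reproduces the starting point (frmx1001 at 2, frmx2001 at 5). Handing out the earliest slots first is only the simplest rule; spreading the frmx slots evenly across the hour would smooth the load on each host.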

So far, the only issue was a week ago, when Verizon/Yahoo/AOL slowed down our deliveries. That cleared on 20201112 and we have seen no further issues since.

Hit a new error case to track today. A domain we were checking stopped providing rDNS entries. In this case, we were getting valid IPs back, but the set to apply was empty, which caused repeated emails about an abandoned set. We should see if it's possible to catch this and skip applying any changes when the temp list has no IPs, instead of applying the empty list.
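The proposed guard could look roughly like this. All names are hypothetical, the rule "keep IPs that still have an rDNS name" is an assumption about how the temp list is built, and the tooling that actually applies the set is not shown in this task. The point is only the check: an empty candidate list leaves the current set in place instead of replacing it.

```python
import socket
from typing import Callable, Iterable, Optional, Set, Tuple


def rdns_name(ip: str) -> Optional[str]:
    """Return the PTR hostname for ip, or None when no rDNS entry exists."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


def build_candidate(ips: Iterable[str],
                    lookup: Callable[[str], Optional[str]] = rdns_name) -> Set[str]:
    """Hypothetical rule: keep only the resolved IPs that still have an rDNS entry."""
    return {ip for ip in ips if lookup(ip)}


def update_set(current: Set[str], candidate: Set[str]) -> Tuple[Set[str], bool]:
    """Return (set_to_keep, changed); never swap a populated set for an empty one."""
    if not candidate:
        # Valid IPs came back but none survived the filter: skip this update.
        return current, False
    return candidate, candidate != current


current = {"192.0.2.10", "192.0.2.11"}
resolved = ["192.0.2.10", "192.0.2.11"]


def no_rdns(ip: str) -> Optional[str]:
    return None  # stub: the domain stopped publishing rDNS entries


kept, changed = update_set(current, build_candidate(resolved, no_rdns))
print(kept == current, changed)  # True False
```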

Mentioned in SAL (#wikimedia-fundraising) [2020-11-20T18:14:34Z] <dwisehaupt> shifting 100% of thank_you mail through frmxs ahead of tomorrow's banner test - T267259

Dwisehaupt claimed this task.
Dwisehaupt updated the task description.
Dwisehaupt moved this task from In Progress to Done on the fundraising-tech-ops board.

Fully transitioned and logs are still looking clean through the latest test.