Add email queueing/failover to services currently using mail_smarthost[0]
Closed, Resolved (Public)

Description

Several services are configured by Puppet with a mail server of mail_smarthost[0]. This selects only the first mail server from a per-site ordered list of mail servers (defined in manifests/realm.pp), which makes that one server a single point of failure (SPOF) for outbound mail from these services.

To address this, I propose configuring these services to use the exim SMTP listener on localhost. This moves the SPOF to the local MTA, which has no external networking or service dependencies beyond the locally running daemon. The local MTA then handles relay to the appropriate smarthost, with more robust queueing and failover.
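
As a rough illustration of why handing mail to a local relay helps, the sketch below models the idea in Python. It is a toy model and not exim's actual behavior or configuration: the `LocalRelay` class, the `send` callback, and the host names are all hypothetical. It shows only the two properties the proposal relies on: mail is accepted into a local queue regardless of remote availability, and delivery tries each smarthost in order rather than only the first.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List

# Hypothetical host names, for illustration only (not the real smarthost list).
SMARTHOSTS: List[str] = ["mx-a.example.org", "mx-b.example.org"]


class LocalRelay:
    """Toy stand-in for a local MTA: queue first, then try each smarthost in order."""

    def __init__(self, smarthosts: Iterable[str],
                 send: Callable[[str, str], None]) -> None:
        self.smarthosts = list(smarthosts)
        # send(host, message) is assumed to raise OSError when a host is unreachable.
        self.send = send
        self.queue: Deque[str] = deque()

    def submit(self, message: str) -> None:
        # Accepting mail never depends on a remote host being reachable.
        self.queue.append(message)

    def flush(self) -> int:
        """Attempt delivery of every queued message once; return how many were delivered."""
        delivered = 0
        for _ in range(len(self.queue)):
            message = self.queue.popleft()
            if self._try_hosts(message):
                delivered += 1
            else:
                # All smarthosts failed: keep the message for a later retry.
                self.queue.append(message)
        return delivered

    def _try_hosts(self, message: str) -> bool:
        # Failover: fall through to the next smarthost when one is down.
        for host in self.smarthosts:
            try:
                self.send(host, message)
                return True
            except OSError:
                continue
        return False
```

By contrast, a service that talks directly to `SMARTHOSTS[0]` has neither the queue nor the fallback, so an outage of that one host drops or blocks its mail.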

This is a follow-up from T196598: Phab and Gerrit emails stopped at around 1900 UTC 6th June.

Event Timeline

Joe triaged this task as High priority. Jun 18 2018, 10:48 AM

Change 440970 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] gerrit: use localhost exim as smtp server

https://gerrit.wikimedia.org/r/440970

Change 441130 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] hue: change smtp_host to localhost

https://gerrit.wikimedia.org/r/441130

Change 441131 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] iegreview: change smtp_host to localhost

https://gerrit.wikimedia.org/r/441131

Change 441132 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] oozie: change smtp_host to localhost

https://gerrit.wikimedia.org/r/441132

Change 441133 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] wikimania_scholarships: change smtp_host to localhost

https://gerrit.wikimedia.org/r/441133

Change 441134 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] sentry: change EMAIL_HOST to localhost

https://gerrit.wikimedia.org/r/441134

Change 441135 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] wikidump: change smtpserver to localhost

https://gerrit.wikimedia.org/r/441135

ArielGlenn renamed this task from b9aaaaaaaa to Add email queueing/failover to services currently using mail_smarthost[0]. Jul 1 2018, 6:22 AM
ArielGlenn updated the task description.
ArielGlenn added subscribers: Aklapper, GerritBot.

Change 440970 merged by Herron:
[operations/puppet@production] gerrit: use localhost exim as smtp server

https://gerrit.wikimedia.org/r/440970

Change 441130 merged by Elukey:
[operations/puppet@production] hue: change smtp_host to localhost

https://gerrit.wikimedia.org/r/441130

Mentioned in SAL (#wikimedia-operations) [2018-07-09T06:25:00Z] <elukey> restart hue on thorium to pick up new smtp changes - T196920

Change 441135 merged by Herron:
[operations/puppet@production] wikidump: change smtpserver to localhost

https://gerrit.wikimedia.org/r/441135

Change 441134 merged by Herron:
[operations/puppet@production] sentry: change EMAIL_HOST to localhost

https://gerrit.wikimedia.org/r/441134

Change 441133 merged by Herron:
[operations/puppet@production] wikimania_scholarships: change smtp_host to localhost

https://gerrit.wikimedia.org/r/441133

Change 441131 merged by Herron:
[operations/puppet@production] iegreview: change smtp_host to localhost

https://gerrit.wikimedia.org/r/441131

Change 441132 merged by Elukey:
[operations/puppet@production] oozie: change smtp_host to localhost

https://gerrit.wikimedia.org/r/441132

herron claimed this task.

The last service using mail_smarthost[0] was migrated to the localhost SMTP listener today. Resolving.