
Phabricator needs to handle bounces/errors from non-existent email addresses
Open, Medium, Public

Description

Looking at /var/log/phd/daemons.log, I found the following:

[01-Apr-2015 17:30:55 UTC] [2015-04-01 17:30:55] EXCEPTION: (PhutilProxyException) Error while executing task ID 11483554 from queue. {>} (phpmailerException) SMTP Error: The following recipients failed: #REMOVED# at [<phabricator>/externals/phpmailer/class.phpmailer.php:738]
[01-Apr-2015 17:30:55 UTC]   #0 PHPMailer::SmtpSend(string, string) called at [<phabricator>/externals/phpmailer/class.phpmailer.php:576]
[01-Apr-2015 17:30:55 UTC]   #1 PHPMailer::Send() called at [<phabricator>/src/applications/metamta/adapter/PhabricatorMailImplementationPHPMailerAdapter.php:118]
[01-Apr-2015 17:30:55 UTC]   #2 PhabricatorMailImplementationPHPMailerAdapter::send() called at [<phabricator>/src/applications/metamta/storage/PhabricatorMetaMTAMail.php:676]
[01-Apr-2015 17:30:55 UTC]   #3 PhabricatorMetaMTAMail::sendNow() called at [<phabricator>/src/applications/metamta/PhabricatorMetaMTAWorker.php:26]
[01-Apr-2015 17:30:55 UTC]   #4 PhabricatorMetaMTAWorker::doWork() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorWorker.php:91]
[01-Apr-2015 17:30:55 UTC]   #5 PhabricatorWorker::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/storage/PhabricatorWorkerActiveTask.php:158]
[01-Apr-2015 17:30:55 UTC]   #6 PhabricatorWorkerActiveTask::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php:19]
[01-Apr-2015 17:30:55 UTC]   #7 PhabricatorTaskmasterDaemon::run() called at [<phutil>/src/daemon/PhutilDaemon.php:91]

The #REMOVED# part currently covers 4 distinct addresses, but I think there are more (reasoning below). It should be easy to find those 4 with the following command:

grep recipients /var/log/phd/daemons.log | egrep -o '[[:alnum:]]+@[[:alnum:]]+\.[[:alnum:]]+' | sort | uniq -c |sort -rn
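The same count can be sketched in Python, which makes it easier to widen the address pattern. The pipeline above matches only alphanumeric characters around the `@`, so an address such as `jane.doe@example.org` would be cut down to `doe@example.org`. This is a minimal sketch, not a tested tool: the function name `count_failed_recipients` and the broader regex are assumptions, and it assumes failed addresses appear in plain text on log lines containing the word "recipients".

```python
import re
from collections import Counter

# Assumption: a deliberately loose address pattern that also accepts dots,
# hyphens, and plus signs in the local part, and multi-label domains.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def count_failed_recipients(lines):
    """Count email-like tokens on log lines that mention 'recipients'.

    Returns (address, count) pairs sorted by count, highest first,
    mirroring `sort | uniq -c | sort -rn`.
    """
    counts = Counter()
    for line in lines:
        if "recipients" in line:
            counts.update(ADDRESS_RE.findall(line))
    return counts.most_common()
```

For example, `count_failed_recipients(open("/var/log/phd/daemons.log"))` would process the log file line by line.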

I see a couple of issues:

  • How do we handle nonexistent email addresses in Phabricator? Phab is trying to send emails to those addresses and is failing.

These addresses are all @wikimedia.org or @mediawiki.org, which are WMF-owned domains. The relay MTA is authoritative for them, so it answers Phabricator directly with a 550, refusing to accept the message. For all other domains, the relay MTA will accept the message from Phabricator and then be refused itself by the destination MTA, generating a DSN, which probably gets lost. So there are probably more failing addresses than the 4 reported by the command above.

  • Phab is not handling the failure very gracefully. To my Phab-unfamiliar eye, it seems like the Taskmaster is spewing out a stack trace (a well-formed one, by the way) and then dying when email sending fails. I think it should handle such a case more gracefully: log the failure and then terminate normally.

Event Timeline

akosiaris raised the priority of this task from to Needs Triage.
akosiaris updated the task description.
akosiaris added a project: Phabricator.
akosiaris added a subscriber: akosiaris.
Restricted Application added a subscriber: Aklapper. May 26 2015, 7:35 AM

I think the Phabricator behavior (not handling invalid e-mail addresses gracefully) needs to be filed over at https://secure.phabricator.com. It sounds like Phabricator (Upstream).

Qgil added a subscriber: Qgil.

Adding Phabricator (Upstream) while we figure out what needs to be done exactly.

akosiaris updated the task description. May 27 2015, 7:54 AM
akosiaris set Security to None.
Nemo_bis renamed this task from Phabricator is trying to send emails to non existent email addresses and then errors out, spewing a stack trace to Phabricator needs to handle bounces from non-existent email addresses. May 31 2015, 3:08 PM
Nemo_bis renamed this task from Phabricator needs to handle bounces from non-existent email addresses to Phabricator needs to handle bounces/errors from non-existent email addresses.
Nemo_bis triaged this task as Medium priority.
Nemo_bis awarded a token.

This does seem to be a problem, as there is constantly a backlog of metamta tasks in the queue and a large number of email records in phab's database.

Restricted Application added a subscriber: scfc. Jul 7 2015, 2:51 PM
mmodell moved this task from To Triage to Misc on the Phabricator board. Jul 7 2015, 2:52 PM

This does seem to be a problem, as there is constantly a backlog of metamta tasks in the queue and a large number of email records in phab's database.

I asked upstream on IRC:

<andre> Phabricator's taskmaster seems to spew out a stacktrace and die when trying to send email notifications to non-existent email addresses (bouncing). Do upstream developers agree that it should terminate normally and should I forward https://phabricator.wikimedia.org/T100400 into a task in secure.phabricator.com ?
<epriestley> andre: it should ideally just be a permanent failure; we may have some difficulty distinguishing between temporary and permanent failures from the SMTP mailer in the general case. Feel free to forward it, though.
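The temporary-versus-permanent distinction mentioned here follows the SMTP reply code classes: 4yz replies are transient failures and 5yz replies are permanent ones, so the 550 from the task description falls in the permanent class. Below is a minimal sketch of that rule, assuming the numeric reply code is available to the caller at all, which is exactly the difficulty epriestley points out for the general case. The function names `classify_smtp_reply` and `should_retry` are hypothetical and are not part of Phabricator or PHPMailer.

```python
def classify_smtp_reply(code):
    """Classify a numeric SMTP reply code by its first digit.

    4yz -> 'temporary' (transient negative completion; retrying may help)
    5yz -> 'permanent' (permanent negative completion; retrying will not help)
    anything else -> 'other' (not a failure class)
    """
    if 500 <= code <= 599:
        return "permanent"
    if 400 <= code <= 499:
        return "temporary"
    return "other"


def should_retry(code):
    """Hypothetical queue policy: requeue only on temporary failures."""
    return classify_smtp_reply(code) == "temporary"
```

Under this policy a 550 would be recorded as a permanent failure and the task would not be requeued, while a 451 would be retried later. Real servers do not always use the code classes consistently, so treat this as a starting point only.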

Anybody willing to create an upstream ticket (and link to it from the task description) who's more skilled/knowledgeable to discuss this specific problem?

Anybody willing to create an upstream ticket (about distinguishing between temporary and permanent failures from the SMTP mailer) who's more skilled/knowledgeable to discuss this specific problem?

Has anyone filed an upstream report about this yet since @Aklapper asked for help in July?

Restricted Application added a subscriber: Luke081515. Jan 9 2016, 12:31 AM
demon added a comment. Jan 9 2016, 12:35 AM

Has anyone filed an upstream report about this yet since @Aklapper asked for help in July?

I spoke with upstream and part of it was filed as https://secure.phabricator.com/T10105. That'll at least handle the "people trying to verify a bogus e-mail and bouncing endlessly" side. It doesn't handle the "used to be a valid e-mail and now isn't so it keeps bouncing" side though.

Qgil removed a subscriber: Qgil. Jan 10 2016, 8:13 PM
demon added a comment. Jan 11 2016, 6:07 PM

Has anyone filed an upstream report about this yet since @Aklapper asked for help in July?

I spoke with upstream and part of it was filed as https://secure.phabricator.com/T10105. That'll at least handle the "people trying to verify a bogus e-mail and bouncing endlessly" side. It doesn't handle the "used to be a valid e-mail and now isn't so it keeps bouncing" side though.

To rephrase: this only fixes it when an account has been disabled (in which case no e-mail should be sent at all). It doesn't handle the overall problem of bounce handling for a user who is not disabled.

Restricted Application added a subscriber: TerraCodes. May 23 2016, 6:02 PM

After https://secure.phabricator.com/D17344, you can bin/mail unverify <address> to unverify an address which has begun bouncing, and we no longer send (most) mail to unverified addresses.

This isn't as clean as automatically handling bounces, but might be 90% of a solution if the number of problem addresses is only a few dozen.
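For a few dozen addresses, the manual step could be scripted. The sketch below builds one `bin/mail unverify <address>` invocation per address and only prints the commands by default. The helper names `unverify_commands` and `run_unverify` are hypothetical, and the sketch assumes it is run from the Phabricator installation directory so that the relative `bin/mail` path resolves.

```python
import subprocess


def unverify_commands(addresses):
    """Build one `bin/mail unverify <address>` argument list per unique address."""
    return [["bin/mail", "unverify", address] for address in sorted(set(addresses))]


def run_unverify(addresses, dry_run=True):
    """Print the commands (dry run) or execute them one by one.

    Assumption: the current working directory is the Phabricator root.
    """
    for command in unverify_commands(addresses):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first command that exits non-zero.
            subprocess.run(command, check=True)
```

Keeping the dry run as the default makes it possible to review the list of addresses before any of them is actually unverified.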

demon added a comment. Feb 13 2017, 6:59 PM

After https://secure.phabricator.com/D17344, you can bin/mail unverify <address> to unverify an address which has begun bouncing, and we no longer send (most) mail to unverified addresses.
This isn't as clean as automatically handling bounces, but might be 90% of a solution if the number of problem addresses is only a few dozen.

Yes, that would actually solve our problem completely; it's always been just a few addresses causing the issue.

Aklapper moved this task from Backlog to Reported Upstream on the Upstream board.