
discourse-mediawiki.wmflabs.org and discourse.wmflabs.org send emails only to @wikimedia.org addresses
Closed, Resolved · Public

Description

Since 2018-11-13, https://discourse-mediawiki.wmflabs.org/ and discourse.wmflabs.org are sending emails only to @wikimedia.org addresses. Anything sent to any other domain is skipped with "550 Relay not permitted".

Can this be related to T208830? Or has anything changed in mx1001.wikimedia.org?

These are the current settings:

Delivery Method	
address	                 mx1001.wikimedia.org
port	                 25
authentication	         plain
enable_starttls_auto	 true

Event Timeline

Qgil created this task.Dec 30 2018, 9:14 PM
Restricted Application added a subscriber: Aklapper.Dec 30 2018, 9:14 PM
Qgil added a comment.Dec 30 2018, 9:18 PM

I just checked the logs of https://discourse.wmflabs.org/ and the problem is the same there. The last non-wikimedia.org email sent was on Nov 12 to @Samwilson's personal email address, and it was a test likely related to T208830...

Qgil added a comment.Dec 30 2018, 10:20 PM

OK @Framawiki, this helps a lot. T41785#4743974 shows a patch committed on Nov 13, and since then our Discourse instances have been out of sync. I have tried to understand what needs to be changed. After reading all the discussion and even the changes in Gerrit (because I couldn't find any info on Wikitech.w.o)... I have rebuilt https://discourse-mediawiki.wmflabs.org/ with this configuration:

Delivery Method	
address	                         mx-out01.wmflabs.org
port	                         25
authentication	                 plain
enable_starttls_auto	         true

Now the "improvement" is that we are not discriminating against anyone. If before only @wikimedia.org addresses would get email, now nobody gets email from discourse-mediawiki. ;)

Qgil edited projects, added cloud-services-team; removed Mail.Dec 30 2018, 10:30 PM
Qgil removed subscribers: Samwilson, EBernhardson.
bd808 added a subscriber: bd808.Dec 31 2018, 12:15 AM

@Qgil, mx-out01.wmflabs.org and/or mx-out02.wmflabs.org should work as smart relay hosts. I just double checked that from an arbitrary host (striker-deploy04.striker.eqiad.wmflabs):

$ telnet mx-out02.wmflabs.org 25
Trying 172.16.1.238...
Connected to mx-out02.wmflabs.org.
Escape character is '^]'.
220 mx-out02.wmflabs.org ESMTP Exim 4.89 Mon, 31 Dec 2018 00:10:53 +0000
mail from:bd808@striker-deploy04.striker.eqiad.wmflabs
250 OK
rcpt to:bdavis@wikimedia.org
250 Accepted
data
354 Enter message, ending with "." on a line by itself
To: bdavis@wikimedia.org
From: bd808@striker-deploy04.striker.eqiad.wmflabs
Subject: Testing outbound mail from eqiad1-r instance

This is a test email sent via the mx-out02.wmflabs.org smarthost.
.
250 OK id=1gdlAk-0005g2-4V
quit
221 mx-out02.wmflabs.org closing connection
Connection closed by foreign host.
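
The manual check above could also be scripted. What follows is a minimal sketch in Python using the standard-library smtplib, not something that was run for this task. The function names build_test_message and send_via_smarthost are made up for illustration. Unlike the telnet session, smtplib sends EHLO before MAIL FROM; like the telnet session, this sketch does not attempt STARTTLS.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient, smarthost):
    """Build a plain-text test message like the one typed into telnet."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Testing outbound mail from eqiad1-r instance"
    msg.set_content(f"This is a test email sent via the {smarthost} smarthost.")
    return msg


def send_via_smarthost(msg, smarthost, port=25, timeout=10):
    """Hand the message to the relay over a plain (non-TLS) SMTP session."""
    with smtplib.SMTP(smarthost, port, timeout=timeout) as smtp:
        # send_message returns a dict of refused recipients (empty on success)
        # and raises SMTPRecipientsRefused if every recipient is rejected.
        return smtp.send_message(msg)
```

Calling send_via_smarthost(build_test_message(...), "mx-out02.wmflabs.org") from a Cloud VPS instance would be the scripted equivalent; the telnet transcript above remains the actual test that was performed.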

The email was delivered to my Foundation gmail account as expected.

Received: from mx1001.wikimedia.org (mx1001.wikimedia.org. [208.80.154.76])
        by mx.google.com with ESMTPS id n66si1620059qkb.46.2018.12.30.16.11.22
        for <bdavis@wikimedia.org>
        (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
        Sun, 30 Dec 2018 16:11:22 -0800 (PST)
Received: from [172.16.1.238] (port=54398 helo=mx-out02.wmflabs.org) by mx1001.wikimedia.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from <root@wmflabs.org>) id 1gdlAw-00059l-4A for bdavis@wikimedia.org; Mon, 31 Dec 2018 00:11:22 +0000
Received: from striker-deploy04.striker.eqiad.wmflabs ([172.16.2.209]:56836) by mx-out02.wmflabs.org with smtp (Exim 4.89) (envelope-from <root@wmflabs.org>) id 1gdlAk-0005g2-4V for bdavis@wikimedia.org; Mon, 31 Dec 2018 00:11:22 +0000
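
The Received headers above read bottom to top, because each relay prepends its own line. A small sketch of pulling the relay path out of them with Python's standard email module follows. The RAW string is an abbreviated copy of the three headers (IDs and TLS details dropped), and relay_path is an illustrative helper name, not an existing tool.

```python
import re
from email import message_from_string

# Abbreviated copies of the three Received headers quoted above.
RAW = """\
Received: from mx1001.wikimedia.org (mx1001.wikimedia.org. [208.80.154.76])
        by mx.google.com with ESMTPS; Sun, 30 Dec 2018 16:11:22 -0800 (PST)
Received: from [172.16.1.238] (helo=mx-out02.wmflabs.org) by mx1001.wikimedia.org with esmtps; Mon, 31 Dec 2018 00:11:22 +0000
Received: from striker-deploy04.striker.eqiad.wmflabs ([172.16.2.209]:56836) by mx-out02.wmflabs.org with smtp; Mon, 31 Dec 2018 00:11:22 +0000

"""


def relay_path(raw_headers):
    """Return the receiving hosts in the order the message travelled."""
    received = message_from_string(raw_headers).get_all("Received", [])
    hops = []
    # Each relay prepends its header, so the oldest hop is listed last.
    for header in reversed(received):
        match = re.search(r"\bby\s+(\S+)", header)
        if match:
            hops.append(match.group(1))
    return hops


print(" -> ".join(relay_path(RAW)))
# prints: mx-out02.wmflabs.org -> mx1001.wikimedia.org -> mx.google.com
```

Read this way, the test message went from the instance to the mx-out02 smarthost, then to mx1001, then to Google.
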
Qgil added a comment.Dec 31 2018, 8:22 AM

Meanwhile in Discourse... There is a long queue of pending email, all of them reporting

Jobs::HandledExceptionWrapper: Wrapped Net::SMTPServerBusy: 454 TLS currently unavailable

While waiting for an informed opinion, I will just blindly test mx-out02.wmflabs.org to see whether it makes any difference (knowing that it probably won't).

Qgil added a comment.EditedDec 31 2018, 8:46 AM

I will just blindly test mx-out02.wmflabs.org

Same results. Same error message in the logs.

Looking in the internets, there seems to be a correlation between "454 TLS currently unavailable" and SSL problems. No idea what to do with this.

I am reverting to mx1001.wikimedia.org for now. At least some email is delivered and the rest is skipped without making the error log queue grow.
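
The different queue behaviour is consistent with how SMTP groups reply codes: 4yz replies such as 454 are transient failures that a sender may retry, while 5yz replies such as 550 are permanent. A minimal Python sketch of that classification follows. The function name classify_smtp_reply is illustrative, and how Discourse itself decides to retry or skip is an assumption here, not something confirmed in this task.

```python
def classify_smtp_reply(code):
    """Map a three-digit SMTP reply code to its RFC 5321 class."""
    if 200 <= code < 400:
        return "positive"            # 2yz completed, 3yz intermediate
    if 400 <= code < 500:
        return "transient failure"   # the sender may retry later
    if 500 <= code < 600:
        return "permanent failure"   # retrying unchanged will not help
    raise ValueError(f"not an SMTP reply code: {code}")


for code in (250, 454, 550):
    print(code, classify_smtp_reply(code))
```

Under that reading, "454 TLS currently unavailable" keeps jobs in the retry queue, whereas "550 Relay not permitted" is final for that recipient.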

454 TLS currently unavailable

Explicitly asking: When failing to send an email to a non-wmf address, is there anything in /var/log/exim4/mainlog (or whatever it is in Debian)?

Qgil added a comment.EditedJan 1 2019, 12:04 AM

(Removed, irrelevant)

Qgil added a comment.Jan 1 2019, 12:18 AM

Never mind, Discourse is running in Docker containers, and the mail action is happening somewhere else. This is the backtrace provided by the Discourse logs for this error 454 TLS currently unavailable:

/usr/local/lib/ruby/2.5.0/net/smtp.rb:969:in `check_response'
/usr/local/lib/ruby/2.5.0/net/smtp.rb:937:in `getok'
/usr/local/lib/ruby/2.5.0/net/smtp.rb:822:in `starttls'
/usr/local/lib/ruby/2.5.0/net/smtp.rb:560:in `do_start'
/usr/local/lib/ruby/2.5.0/net/smtp.rb:518:in `start'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/mail-2.7.1.rc1/lib/mail/network/delivery_methods/smtp.rb:109:in `start_smtp_session'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/mail-2.7.1.rc1/lib/mail/network/delivery_methods/smtp.rb:100:in `deliver!'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/mail-2.7.1.rc1/lib/mail/message.rb:2159:in `do_delivery'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/mail-2.7.1.rc1/lib/mail/message.rb:260:in `block in deliver'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/actionmailer-5.2.2/lib/action_mailer/base.rb:560:in `block in deliver_mail'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2/lib/active_support/notifications.rb:168:in `block in instrument'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2/lib/active_support/notifications/instrumenter.rb:23:in `instrument'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2/lib/active_support/notifications.rb:168:in `instrument'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/actionmailer-5.2.2/lib/action_mailer/base.rb:558:in `deliver_mail'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/mail-2.7.1.rc1/lib/mail/message.rb:260:in `deliver'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/actionmailer-5.2.2/lib/action_mailer/message_delivery.rb:114:in `block in deliver_now'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/actionmailer-5.2.2/lib/action_mailer/rescuable.rb:17:in `handle_exceptions'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/actionmailer-5.2.2/lib/action_mailer/message_delivery.rb:113:in `deliver_now'
/var/www/discourse/lib/email/sender.rb:197:in `send'
/var/www/discourse/app/jobs/regular/user_email.rb:57:in `execute'
/var/www/discourse/app/jobs/base.rb:137:in `block (2 levels) in perform'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/rails_multisite-2.0.4/lib/rails_multisite/connection_management.rb:63:in `with_connection'
/var/www/discourse/app/jobs/base.rb:127:in `block in perform'
/var/www/discourse/app/jobs/base.rb:123:in `each'
/var/www/discourse/app/jobs/base.rb:123:in `perform'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:187:in `execute_job'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:169:in `block (2 levels) in process'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/middleware/chain.rb:128:in `block in invoke'
/var/www/discourse/lib/sidekiq/pausable.rb:81:in `call'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/middleware/chain.rb:130:in `block in invoke'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/middleware/chain.rb:133:in `invoke'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:168:in `block in process'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:139:in `block (6 levels) in dispatch'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/job_retry.rb:98:in `local'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:138:in `block (5 levels) in dispatch'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq.rb:36:in `block in <module:Sidekiq>'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:134:in `block (4 levels) in dispatch'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:199:in `stats'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:129:in `block (3 levels) in dispatch'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/job_logger.rb:8:in `call'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:128:in `block (2 levels) in dispatch'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/job_retry.rb:73:in `global'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:127:in `block in dispatch'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/logging.rb:48:in `with_context'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/logging.rb:42:in `with_job_hash_context'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:126:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:167:in `process'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:85:in `process_one'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/processor.rb:73:in `run'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/util.rb:16:in `watchdog'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.1.3/lib/sidekiq/util.rb:25:in `block in safe_thread'
Qgil added a comment.Jan 1 2019, 12:43 AM

Thanks to @bd808 I have learned that TLS setup is currently broken for mx-out0[12] servers. I have set enable_starttls_auto to false, sent a test email to my private email address, and received it successfully.
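
For context, an "auto" STARTTLS setting generally means: upgrade to TLS only if the server advertises STARTTLS in its EHLO response. If the server advertises it but then answers the STARTTLS command with 454, the session fails, which matches the backtrace above. The sketch below models that decision with Python's smtplib. It is an illustration only: negotiate is a made-up helper, and Ruby's Net::SMTP (which the backtrace shows Discourse using) is assumed to behave similarly.

```python
import smtplib


def negotiate(smtp, starttls_auto=True):
    """Send EHLO and upgrade to TLS only if allowed and advertised.

    Returns True when the session was upgraded, False when it stays plain.
    """
    smtp.ehlo()
    if starttls_auto and smtp.has_extn("starttls"):
        # smtplib raises SMTPResponseException when the server answers
        # STARTTLS with anything other than 220, for example 454.
        smtp.starttls()
        smtp.ehlo()  # capabilities are re-read after the upgrade
        return True
    return False
```

With starttls_auto=False the STARTTLS command is never sent, so a relay with broken TLS can still accept mail in plain text. That is the trade-off behind the workaround described here.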

I will wait and see whether the dozens of users who haven't got any email start receiving them again. When that queue is clean and working, I will apply the same change in https://discourse.wmflabs.org

Qgil claimed this task.Jan 1 2019, 12:43 AM
Qgil triaged this task as Normal priority.
herron added a subscriber: herron.Jan 3 2019, 7:11 PM

Is there any improvement here now that T212736 is resolved?

Qgil renamed this task from discourse-mediawiki.wmflabs.org sends emails only to @wikimedia.org addresses to discourse-mediawiki.wmflabs.org and discourse.wmflabs.org send emails only to @wikimedia.org addresses.Jan 4 2019, 10:09 PM
Qgil updated the task description.
Qgil closed this task as Resolved.Jan 4 2019, 11:49 PM

Thanks to T212736 and T212760 being fixed, now both Discourse instances are updated and sending proper email notifications to all users asking for them.