
Beta Cluster mailer not sending emails
Open, Normal, Public

Description

I performed a password reset earlier today on Beta Cluster. The account has an active, verified email address set. However, the password reset email still hasn't arrived after a long wait. We should check whether the mailer is working as expected.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Dec 19 2018, 6:23 PM

@herron - not sure if related to T41785; also @aborrero due to being in wmflabs realm.

Which outbound SMTP server is the beta cluster currently using?

I think I set beta's wikimail stuff to go via deployment-mx02, will look into it

From deployment-mx02:/var/log/exim4/mainlog:
2018-12-19 19:45:50 H=deployment-mediawiki-07.deployment-prep.eqiad.wmflabs [172.16.4.119]:45122 I=[172.16.4.120]:25 F=<wiki-enwiki-1t-pk01kd-IfxAC4l3++aDeOz1@beta.wmflabs.org> rejected RCPT <krenair@gmail.com>: Relay not permitted

Let's fix root@deployment-puppetmaster03:/var/lib/git/operations/puppet(production u+16-110) first

Well that's weird, dunno why puppet's autoupdater was stuck while rebasing seemed to work fine:

root@deployment-puppetmaster03:/var/lib/git/operations/puppet(production u+16-110)# git pull --rebase origin production
From https://gerrit.wikimedia.org/r/p/operations/puppet
 * branch                  production -> FETCH_HEAD
First, rewinding head to replay your work on top of it...
Applying: [WIP] logstash: send errors to sentry
Applying: swift: lower replication interval for beta
Applying: prometheus: make ferm DNS record type configurable
Applying: Hack profile::base::firewall to prevent dupe definition
Applying: Add account for phabricator_files to swift::params::accounts
Applying: Scap: scap_source correct gid
Applying: swift: use implicit /dev/swift prefix for swift devices
Applying: Puppetise simple no-CA class for deployment-dumps-puppetmaster02
Applying: Attempt to secure Puppet DB better
Applying: [LOCAL HACK] tls certs for deployment-elastic*
Applying: Move declaration of diamond package out of diamond class
Applying: cumin: Allow Puppet DB backend to be used within Labs projects that use it
Applying: Beta: maintenance: no openldap management
Applying: Re-combine labs and production exim minimal config
Applying: varnish: move $all_networks to $trusted_networks
Applying: logstash: add new logging kafka consumer
root@deployment-puppetmaster03:/var/lib/git/operations/puppet(production u+16)#

(None of that seemed to have any effect on -mx02.)

Krenair added a comment. Edited. Dec 19 2018, 8:49 PM

So I think this broke during the eqiad1-r migration.
/etc/exim4/exim4.conf:
hostlist wikimedia_nets = <; 91.198.174.0/24 ; 208.80.152.0/22 ; 2620:0:860::/46 ; 198.35.26.0/23 ; 185.15.56.0/22 ; 2a02:ec80::/32 ; 2001:df2:e500::/48 ; 103.102.166.0/24 ; 10.0.0.0/8
hostlist relay_from_hosts = <; @[] ; 127.0.0.1 ; ::1 ; 91.198.174.0/24 ; 208.80.152.0/22 ; 2620:0:860::/46 ; 198.35.26.0/23 ; 185.15.56.0/22 ; 2a02:ec80::/32 ; 2001:df2:e500::/48 ; 103.102.166.0/24 ; 10.0.0.0/8
Contains 10/8 but not the new range.
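The mismatch can be checked mechanically. The following is a minimal sketch in Python (not part of the exim or Puppet setup) that tests the client address from the rejected log line against the ranges copied from the relay_from_hosts line above. The function name relay_permitted and the 10.68.16.1 test address are illustrative assumptions, and the "@[]" item is left out because it is not a CIDR range.

```python
import ipaddress

# Ranges copied from the relay_from_hosts line in /etc/exim4/exim4.conf above.
# The "@[]" (local interfaces) item is omitted: it is not a CIDR range.
RELAY_FROM_HOSTS = [
    "127.0.0.1", "::1",
    "91.198.174.0/24", "208.80.152.0/22", "2620:0:860::/46",
    "198.35.26.0/23", "185.15.56.0/22", "2a02:ec80::/32",
    "2001:df2:e500::/48", "103.102.166.0/24", "10.0.0.0/8",
]


def relay_permitted(client_ip, hostlist=RELAY_FROM_HOSTS):
    """Return True if client_ip falls inside any range in hostlist.

    Hypothetical helper: it only approximates exim's host list matching
    for plain IP/CIDR items.
    """
    addr = ipaddress.ip_address(client_ip)
    for item in hostlist:
        net = ipaddress.ip_network(item)
        # Compare only like with like (IPv4 vs IPv6).
        if addr.version == net.version and addr in net:
            return True
    return False


# 172.16.4.119 is deployment-mediawiki-07 in the rejected log line.
print(relay_permitted("172.16.4.119"))  # False
# 10.68.16.1 is an arbitrary address inside 10/8, used only for contrast.
print(relay_permitted("10.68.16.1"))    # True
```

This agrees with the "Relay not permitted" rejection: the sending host's 172.16.4.x address is outside every listed range, while anything in 10/8 would still be accepted.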

hostlist wikimedia_nets = <; <%= scope.lookupvar('network::constants::all_networks').join(" ; ") %>
hostlist relay_from_hosts = <; @[] ; 127.0.0.1 ; ::1 ; <%= scope.lookupvar('network::constants::all_networks').join(" ; ") %>

$external_networks = $network_data['network::external']
$all_networks = flatten([$external_networks, '10.0.0.0/8'])

I think all_networks should add the new range if $realm == 'labs'.
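One way to read that suggestion, sketched in Python rather than Puppet: build the list as before and append an extra range only when the realm is labs, then render it the way the template joins it. The function names and the 172.16.0.0/21 value are assumptions for illustration; the thread does not state the new range, so treat it as a placeholder.

```python
# Placeholder only: the thread does not name the new range. This value is an
# assumption chosen so that it covers the 172.16.4.x hosts seen in the log.
ASSUMED_NEW_LABS_RANGE = "172.16.0.0/21"


def all_networks(external_networks, realm, labs_extra=()):
    """Mimic flatten([$external_networks, '10.0.0.0/8']), plus labs extras."""
    networks = list(external_networks) + ["10.0.0.0/8"]
    if realm == "labs":
        networks += list(labs_extra)
    return networks


def render_hostlist(name, networks, prefix=()):
    """Render an exim hostlist line using the same ' ; ' join as the template."""
    items = list(prefix) + list(networks)
    return "hostlist {} = <; {}".format(name, " ; ".join(items))


# Only two of the external ranges are used here to keep the output short.
external = ["91.198.174.0/24", "208.80.152.0/22"]
nets = all_networks(external, "labs", labs_extra=[ASSUMED_NEW_LABS_RANGE])
print(render_hostlist("relay_from_hosts", nets, prefix=["@[]", "127.0.0.1", "::1"]))
```

With a realm other than labs the extra range is not added, so the rendered line would match the current behaviour.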

MarcoAurelio renamed this task from Beta Cluster mailer not sending emails apparently to Beta Cluster mailer not sending emails. Dec 20 2018, 9:15 PM

At https://horizon.wikimedia.org/project/instances/41fe8dce-d0bb-424d-9c9a-9dec6dc68362/ (deployment-mx02) I've found:

Applied: True
Name: role::mail::mx
Params.: verp_bounce_post_url: 'api-rw.discovery.wmnet/w/api.php'; verp_domains: [ 'wikimedia.org' ]; prometheus_nodes: hiera('prometheus_nodes', []); verp_post_connect_server: 'meta.wikimedia.org'

Not sure if that's related to this, but verp_post_connect_server: 'meta.wikimedia.org' doesn't look right?

I don't see what that has to do with it

Change 481215 had a related patch set uploaded (by Alex Monk; owner: Bstorm):
[operations/puppet@production] toolforge: add the new cloud region to all_networks

https://gerrit.wikimedia.org/r/481215

Just wanted to chime in and say that the Growth team (@Catrope) would benefit from this fix, since we're trying to test our Help Panel feature in beta cluster (which involves sending and confirming emails).

herron added a comment. Jan 3 2019, 3:31 PM

How about configuring the beta cluster to relay via the cloud/labs smarthosts mx-out0[12].wmflabs.org ?

Change 475714 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Introduce $aggregate_networks, deprecate $all_networks

https://gerrit.wikimedia.org/r/475714

Change 481215 abandoned by Bstorm:
network: Add the new cloud region to all_networks

Reason:
This one is a non-starter in the middle of other refactors

https://gerrit.wikimedia.org/r/481215

Change 475714 merged by Alexandros Kosiaris:
[operations/puppet@production] Introduce $aggregate_networks, deprecate $all_networks

https://gerrit.wikimedia.org/r/475714

Does the patch above resolve this incident, or do we need further changes so that Beta can send mail again? Thanks.

Re-checked: betalabs is still not sending emails.

MarcoAurelio triaged this task as High priority. Mar 1 2019, 9:59 AM

The purpose of Beta (testing software) is being affected.

Krenair lowered the priority of this task from High to Normal. Mar 1 2019, 10:09 AM

Email from beta works, it just mishandles @wikimedia.org addresses. Dropping priority

2019-03-01 10:08:37 1gzf5p-0004SQ-Bm <= wiki-enwiki-1t-pnomud-dWAVN5KAegIR6rAh@beta.wmflabs.org H=deployment-mediawiki-07.deployment-prep.eqiad.wmflabs [172.16.4.119]:39624 I=[172.16.4.120]:25 P=esmtp S=1575 id=enwiki.5c7904a500d375.37089512@en.wikipedia.beta.wmflabs.org
2019-03-01 10:08:37 1gzf5p-0004SQ-Bm => krenair@gmail.com R=dnslookup T=remote_smtp_signed S=2279 H=gmail-smtp-in.l.google.com [173.194.68.27] I=[172.16.4.120] X=TLS1.2:ECDHE_RSA_CHACHA20_POLY1305:256 CV=yes DN="C=US,ST=California,L=Mountain View,O=Google LLC,CN=mx.google.com" C="250 2.0.0 OK  1551434917 y28si461207qvf.34 - gsmtp" DT=0s
2019-03-01 10:08:37 1gzf5p-0004SQ-Bm Completed

vs.:

2019-03-01 10:05:46 H=deployment-mediawiki-07.deployment-prep.eqiad.wmflabs [172.16.4.119]:35632 I=[172.16.4.120]:25 F=<wiki-enwiki-50n-pnnvxh-ujky0FUwA22N6N9J@beta.wmflabs.org> temporarily rejected RCPT <etonkovidova@wikimedia.org>: failed to bind the LDAP connection to server ldap-corp.codfw.wikimedia.org:389 - ldap_bind() returned -1

Basically our MX is trying to do the special routing for @wikimedia.org addresses (e.g., looking up against the mirror of the foundation's corp LDAP system to see if the user has a google inbox) that should only be done by a prod MX. Our one should just be sending it on to prod.
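For anyone repeating this comparison on a larger mainlog, the three outcomes seen so far (delivered, rejected outright, temporarily rejected) can be separated with a small script. This is a rough Python sketch: the regular expressions are assumptions fitted to the lines quoted in this task, not a full exim log parser, and the delivery line is abridged.

```python
import re

# Matches RCPT rejections as they appear in the mainlog lines quoted above.
RCPT_REJECT = re.compile(
    r"H=(?P<host>\S+) \[(?P<ip>[^\]]+)\]:\d+ .*?"
    r"F=<(?P<sender>[^>]*)> "
    r"(?P<kind>temporarily rejected|rejected) RCPT <(?P<rcpt>[^>]+)>: "
    r"(?P<reason>.*)$"
)
# "=>" marks a delivery in exim's mainlog ("<=" is message arrival).
DELIVERY = re.compile(r"^\S+ \S+ \S+ => (?P<rcpt>\S+) ")


def classify(line):
    """Return (outcome, recipient, reason) for one mainlog line."""
    m = RCPT_REJECT.search(line)
    if m:
        temporary = m.group("kind").startswith("temporarily")
        return ("deferred" if temporary else "rejected",
                m.group("rcpt"), m.group("reason"))
    m = DELIVERY.match(line)
    if m:
        return ("delivered", m.group("rcpt"), "")
    return ("other", "", "")


# Lines copied from this task (the delivery line is abridged).
LINES = [
    "2018-12-19 19:45:50 H=deployment-mediawiki-07.deployment-prep.eqiad.wmflabs "
    "[172.16.4.119]:45122 I=[172.16.4.120]:25 "
    "F=<wiki-enwiki-1t-pk01kd-IfxAC4l3++aDeOz1@beta.wmflabs.org> "
    "rejected RCPT <krenair@gmail.com>: Relay not permitted",
    "2019-03-01 10:08:37 1gzf5p-0004SQ-Bm => krenair@gmail.com R=dnslookup "
    "T=remote_smtp_signed S=2279 H=gmail-smtp-in.l.google.com [173.194.68.27]",
    "2019-03-01 10:05:46 H=deployment-mediawiki-07.deployment-prep.eqiad.wmflabs "
    "[172.16.4.119]:35632 I=[172.16.4.120]:25 "
    "F=<wiki-enwiki-50n-pnnvxh-ujky0FUwA22N6N9J@beta.wmflabs.org> "
    "temporarily rejected RCPT <etonkovidova@wikimedia.org>: failed to bind the "
    "LDAP connection to server ldap-corp.codfw.wikimedia.org:389 - "
    "ldap_bind() returned -1",
]

for line in LINES:
    print(classify(line))
```

Grouping the results by recipient domain should make it easy to confirm whether only @wikimedia.org recipients end up deferred.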

herron added a comment. Mar 1 2019, 2:29 PM
2019-03-01 10:05:46 H=deployment-mediawiki-07.deployment-prep.eqiad.wmflabs [172.16.4.119]:35632 I=[172.16.4.120]:25 F=<wiki-enwiki-50n-pnnvxh-ujky0FUwA22N6N9J@beta.wmflabs.org> temporarily rejected RCPT <etonkovidova@wikimedia.org>: failed to bind the LDAP connection to server ldap-corp.codfw.wikimedia.org:389 - ldap_bind() returned -1

Basically our MX is trying to do the special routing for @wikimedia.org addresses (e.g., looking up against the mirror of the foundation's corp LDAP system to see if the user has a google inbox) that should only be done by a prod MX. Our one should just be sending it on to prod.

Makes sense, and this describes in a nutshell the behavior of the labs smarthosts (just sending mail on to prod)

How about configuring the beta cluster to relay via the cloud/labs smarthosts mx-out0[12].wmflabs.org ?

Bumping this question

I feel like there was some reason there is a separate MX host in beta but don't remember it right now.

Email from beta works, it just mishandles @wikimedia.org addresses. Dropping priority

Thanks! It does work for non-@wikimedia.org addresses - it's really great to have it working just in time to check GrowthExperiments on improving user emailability.

DannyS712 moved this task from Unsorted to Others on the User-DannyS712 board. Jul 22 2019, 5:35 AM