I performed a password reset earlier today on Beta Cluster. The account has an active, verified email address set. However, the password reset email has not arrived after a long wait. We should check whether the mailer is working as expected.
From deployment-mx02:/var/log/exim4/mainlog:
2018-12-19 19:45:50 H=deployment-mediawiki-07.deployment-prep.eqiad.wmflabs [172.16.4.119]:45122 I=[172.16.4.120]:25 F=<wiki-enwiki-1t-pk01kd-IfxAC4l3++aDeOz1@beta.wmflabs.org> rejected RCPT <krenair@gmail.com>: Relay not permitted
Let's fix root@deployment-puppetmaster03:/var/lib/git/operations/puppet(production u+16-110) first
Well that's weird, dunno why puppet's autoupdater was stuck while rebasing seemed to work fine:
root@deployment-puppetmaster03:/var/lib/git/operations/puppet(production u+16-110)# git pull --rebase origin production
From https://gerrit.wikimedia.org/r/p/operations/puppet
 * branch production -> FETCH_HEAD
First, rewinding head to replay your work on top of it...
Applying: [WIP] logstash: send errors to sentry
Applying: swift: lower replication interval for beta
Applying: prometheus: make ferm DNS record type configurable
Applying: Hack profile::base::firewall to prevent dupe definition
Applying: Add account for phabricator_files to swift::params::accounts
Applying: Scap: scap_source correct gid
Applying: swift: use implicit /dev/swift prefix for swift devices
Applying: Puppetise simple no-CA class for deployment-dumps-puppetmaster02
Applying: Attempt to secure Puppet DB better
Applying: [LOCAL HACK] tls certs for deployment-elastic*
Applying: Move declaration of diamond package out of diamond class
Applying: cumin: Allow Puppet DB backend to be used within Labs projects that use it
Applying: Beta: maintenance: no openldap management
Applying: Re-combine labs and production exim minimal config
Applying: varnish: move $all_networks to $trusted_networks
Applying: logstash: add new logging kafka consumer
root@deployment-puppetmaster03:/var/lib/git/operations/puppet(production u+16)#
So I think this broke during the eqiad1-r migration.
/etc/exim4/exim4.conf:
hostlist wikimedia_nets = <; 91.198.174.0/24 ; 208.80.152.0/22 ; 2620:0:860::/46 ; 198.35.26.0/23 ; 185.15.56.0/22 ; 2a02:ec80::/32 ; 2001:df2:e500::/48 ; 103.102.166.0/24 ; 10.0.0.0/8
hostlist relay_from_hosts = <; @[] ; 127.0.0.1 ; ::1 ; 91.198.174.0/24 ; 208.80.152.0/22 ; 2620:0:860::/46 ; 198.35.26.0/23 ; 185.15.56.0/22 ; 2a02:ec80::/32 ; 2001:df2:e500::/48 ; 103.102.166.0/24 ; 10.0.0.0/8
Contains 10/8 but not the new range.
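A quick way to confirm this is to test the rejected client address from the exim log (172.16.4.119) against the ranges in the rendered relay_from_hosts line. The following is only a minimal sketch in Python using the standard ipaddress module; the function name is_relay_allowed is made up for illustration, and the "@[]" entry (the MX's own interfaces) is left out because it is not a CIDR range.

```python
import ipaddress

# Ranges copied from the rendered relay_from_hosts line in /etc/exim4/exim4.conf.
# "@[]" (exim's shorthand for the local interfaces) is omitted on purpose.
RELAY_FROM_HOSTS = [
    "127.0.0.1", "::1",
    "91.198.174.0/24", "208.80.152.0/22", "2620:0:860::/46",
    "198.35.26.0/23", "185.15.56.0/22", "2a02:ec80::/32",
    "2001:df2:e500::/48", "103.102.166.0/24", "10.0.0.0/8",
]


def is_relay_allowed(client_ip, networks=RELAY_FROM_HOSTS):
    """Return True if client_ip falls inside any of the listed ranges.

    Hypothetical helper: it only approximates exim's hostlist matching
    for plain IP/CIDR entries.
    """
    addr = ipaddress.ip_address(client_ip)
    for entry in networks:
        network = ipaddress.ip_network(entry)
        # Compare only within the same IP family (v4 vs v6).
        if addr.version == network.version and addr in network:
            return True
    return False


if __name__ == "__main__":
    # Address of deployment-mediawiki-07 as seen in the mainlog line.
    print(is_relay_allowed("172.16.4.119"))
```

With the list as rendered, the mediawiki host's address matches none of the ranges, which is consistent with the "Relay not permitted" rejection.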
hostlist wikimedia_nets = <; <%= scope.lookupvar('network::constants::all_networks').join(" ; ") %>
hostlist relay_from_hosts = <; @[] ; 127.0.0.1 ; ::1 ; <%= scope.lookupvar('network::constants::all_networks').join(" ; ") %>
$external_networks = $network_data['network::external']
$all_networks = flatten([$external_networks, '10.0.0.0/8'])
I think all_networks should add the new range if $realm == 'labs'.
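To make the suggestion concrete, here is a rough Python model of the logic, not actual Puppet code. The function names are hypothetical, and the cloud range used in the usage example (172.16.0.0/12) is only a placeholder assumption chosen because it contains the rejected address; the real new range is not stated in this task.

```python
def all_networks(external_networks, realm, cloud_ranges=()):
    """Model of $all_networks with a realm-specific addition.

    external_networks: ranges from network::external
    realm: e.g. "production" or "labs"
    cloud_ranges: extra ranges to add only in the labs realm (assumption)
    """
    networks = list(external_networks) + ["10.0.0.0/8"]
    if realm == "labs":
        networks.extend(cloud_ranges)
    return networks


def render_relay_from_hosts(networks):
    """Mimic the ERB template line that joins the ranges with ' ; '."""
    return ("hostlist relay_from_hosts = <; @[] ; 127.0.0.1 ; ::1 ; "
            + " ; ".join(networks))


if __name__ == "__main__":
    external = ["91.198.174.0/24", "208.80.152.0/22"]
    # "172.16.0.0/12" is a placeholder, not the confirmed new range.
    print(render_relay_from_hosts(
        all_networks(external, "labs", cloud_ranges=["172.16.0.0/12"])))
```

The point of the sketch is only that production output stays unchanged while the labs realm gains the extra range.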
At https://horizon.wikimedia.org/project/instances/41fe8dce-d0bb-424d-9c9a-9dec6dc68362/ (deployment-mx02) I've found:
Applied | Name | Params. | Actions |
---|---|---|---|
True | role::mail::mx | verp_bounce_post_url: 'api-rw.discovery.wmnet/w/api.php'; verp_domains: [ 'wikimedia.org' ]; prometheus_nodes: hiera('prometheus_nodes', []); verp_post_connect_server: 'meta.wikimedia.org' | |
Not sure if that's related to this, but verp_post_connect_server: 'meta.wikimedia.org' doesn't look right?
Change 481215 had a related patch set uploaded (by Alex Monk; owner: Bstorm):
[operations/puppet@production] toolforge: add the new cloud region to all_networks
Just wanted to chime in and say that the Growth team (@Catrope) would benefit from this fix, since we're trying to test our Help Panel feature in beta cluster (which involves sending and confirming emails).
How about configuring the beta cluster to relay via the cloud/labs smarthosts mx-out0[12].wmflabs.org ?
Change 475714 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Introduce $aggregate_networks, deprecate $all_networks
Change 481215 abandoned by Bstorm:
network: Add the new cloud region to all_networks
Reason:
This one is a non-starter in the middle of other refactors
Change 475714 merged by Alexandros Kosiaris:
[operations/puppet@production] Introduce $aggregate_networks, deprecate $all_networks
Is the patch above solving this incident or do we need further changes so that Beta can send mail again? Thanks.
Email from beta works, it just mishandles @wikimedia.org addresses. Dropping priority
2019-03-01 10:08:37 1gzf5p-0004SQ-Bm <= wiki-enwiki-1t-pnomud-dWAVN5KAegIR6rAh@beta.wmflabs.org H=deployment-mediawiki-07.deployment-prep.eqiad.wmflabs [172.16.4.119]:39624 I=[172.16.4.120]:25 P=esmtp S=1575 id=enwiki.5c7904a500d375.37089512@en.wikipedia.beta.wmflabs.org
2019-03-01 10:08:37 1gzf5p-0004SQ-Bm => krenair@gmail.com R=dnslookup T=remote_smtp_signed S=2279 H=gmail-smtp-in.l.google.com [173.194.68.27] I=[172.16.4.120] X=TLS1.2:ECDHE_RSA_CHACHA20_POLY1305:256 CV=yes DN="C=US,ST=California,L=Mountain View,O=Google LLC,CN=mx.google.com" C="250 2.0.0 OK 1551434917 y28si461207qvf.34 - gsmtp" DT=0s
2019-03-01 10:08:37 1gzf5p-0004SQ-Bm Completed
vs.:
2019-03-01 10:05:46 H=deployment-mediawiki-07.deployment-prep.eqiad.wmflabs [172.16.4.119]:35632 I=[172.16.4.120]:25 F=<wiki-enwiki-50n-pnnvxh-ujky0FUwA22N6N9J@beta.wmflabs.org> temporarily rejected RCPT <etonkovidova@wikimedia.org>: failed to bind the LDAP connection to server ldap-corp.codfw.wikimedia.org:389 - ldap_bind() returned -1
Basically our MX is trying to do the special routing for @wikimedia.org addresses (e.g., looking up against the mirror of the foundation's corp LDAP system to see if the user has a Google inbox) that should only be done by a prod MX. Ours should just be sending it on to prod.
Makes sense, and this describes in a nutshell the behavior of the labs smarthosts (just sending mail on to prod)
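The intended behaviour described in the two comments above could be modelled roughly as follows. This is purely an illustrative Python sketch of the decision, not exim router configuration; the function name and the labels "corp_ldap_lookup" and "forward_to_prod" are invented for the example, while "dnslookup" is the router name that appears in the successful delivery log line.

```python
def choose_route(recipient, is_prod_mx):
    """Pick a (hypothetical) route label for an outgoing message.

    Only a prod MX should do the corp LDAP lookup for @wikimedia.org
    recipients; any other MX should simply hand the mail on to prod.
    """
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain == "wikimedia.org":
        return "corp_ldap_lookup" if is_prod_mx else "forward_to_prod"
    # Everything else is delivered via normal DNS/MX lookup.
    return "dnslookup"


if __name__ == "__main__":
    print(choose_route("someone@wikimedia.org", is_prod_mx=False))
    print(choose_route("someone@gmail.com", is_prod_mx=False))
```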
Bumping this question
I feel like there was some reason there is a separate MX host in beta but don't remember it right now.
Thanks! It does work for non-@wikimedia.org addresses. It's really great to have it working just in time to check GrowthExperiments on improving user emailability.