Ensure Jenkins mail configuration supports outbound smtp server failover
Closed, Resolved, Public

Description

In preparation for MX upgrades I'd like to ensure that Jenkins will not be negatively affected by mx1001 downtime.

What does the current outbound smtp config look like in Jenkins?

If smarthost redundancy is not already in place we have a straightforward (and imo preferred) option to use the localhost exim listener as the outbound smtp server. This listener was deployed to provide local queueing and failover between mx1001 and mx2001 for services that do not support this type of failover natively. FWIW many services (Gerrit, Phabricator, etc.) are configured to relay mail in this way.
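For services that cannot fail over natively, the behaviour the local listener provides can be sketched roughly as follows. This is only an illustration of ordered smarthost failover, not how Exim implements it, and it does no local queueing. The function name `send_with_failover`, the `connect` parameter, the port, and the timeout are assumptions made for the example; only the two hostnames come from this task.

```python
import smtplib
from email.message import EmailMessage

# Relay order follows the task description; port 25 is an assumption.
SMARTHOSTS = ["mx1001.wikimedia.org", "mx2001.wikimedia.org"]


def send_with_failover(message, hosts=SMARTHOSTS, port=25, connect=smtplib.SMTP):
    """Try each smarthost in order and return the host that accepted the message.

    `connect` is a hypothetical injection point so the function can be
    exercised without a real SMTP server.
    """
    errors = []
    for host in hosts:
        try:
            with connect(host, port, timeout=10) as smtp:
                smtp.send_message(message)
            return host
        except (OSError, smtplib.SMTPException) as exc:
            # Remember why this relay failed, then move on to the next one.
            errors.append(f"{host}: {exc}")
    raise RuntimeError("all smarthosts failed: " + "; ".join(errors))
```

Pointing a service at the localhost listener removes the need for this kind of logic in the service itself, which is the point of the proposal above.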

Event Timeline

What does the current outbound smtp config look like in Jenkins?

I believe it's just stored in the Jenkins settings: https://integration.wikimedia.org/ci/configure?

(Attached screenshot: Screenshot_2018-09-05 Configure System [Jenkins](1).png, 73 KB)

https://wiki.jenkins.io/display/JENKINS/Mailer has some more details about the plugin.

Thanks! Looks like it would indeed be affected by mx1001 downtime. We should set this to use localhost and let local exim handle smtp failover. Who can review and make that change?

IIRC, a while ago (like in 2012) it was configured to use localhost for relay but eventually we moved to mail.wikimedia.org and then to mx1001.wikimedia.org.

The mail host is not maintained in puppet, which effectively hides it from SRE operations. It is indeed entered manually at https://integration.wikimedia.org/ci/configure

I am definitely a fan of having Jenkins point at localhost and letting the base system / puppet handle the mail configuration for us. That also seems easier from an operations point of view.

@herron, I think you are west coast based? You can sync with one of the Release-Engineering-Team US members :]

Sorry I forgot, @herron is there a smarthost on all of our servers or does that need to be added via a puppet profile? We have Jenkins instances on the following hosts:

contint1001     CI Jenkins master
contint2001     Spare for CI
releases1001    Release Jenkins
releases2001    Hot spare for release Jenkins

I am definitely a fan of having Jenkins point at localhost and letting the base system / puppet handle the mail configuration for us. That also seems easier from an operations point of view.

Excellent! When do you think we could make the config change?

@herron, I think you are west coast based? You can sync with one of the Release-Engineering-Team US members :]

I'm on the east coast, so I think our working hours overlap a bit. Happy to sync with US members as well, of course, if that's preferred.

@herron is there a smarthost on all of our servers or does that need to be added via a puppet profile?

Yes, there is now a local smarthost running on all servers from patch https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/429456/

hashar@contint1001:~$ nc 127.0.0.1 25
220 contint1001.wikimedia.org ESMTP Exim 4.84_2 Fri, 07 Sep 2018 16:14:11 +0000
MAIL FROM: jenkins-bot@wikimedia.org
250 OK
RCPT TO: hashar@free.fr
250 Accepted
DATA
354 Enter message, ending with "." on a line by itself
Subject: does it work?

That seems all good. 

Antoine

.
250 OK id=1fyJOx-00005o-TE
quit
221 contint1001.wikimedia.org closing connection
2018-09-07 16:14:45 1fyJOx-00005o-TE <= jenkins-bot@wikimedia.org H=[127.0.0.1]:44498 I=[127.0.0.1]:25 P=smtp S=268
2018-09-07 16:14:45 1fyJOx-00005o-TE => hashar@free.fr R=smart_route T=remote_smtp S=282 H=mx1001.wikimedia.org [2620:0:861:3:208:80:154:76] C="250 OK id=1fyJPB-0003lI-FM" DT=0s
2018-09-07 16:14:45 1fyJOx-00005o-TE Completed
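The three log lines above can be checked mechanically. The following is a minimal sketch, assuming the Exim main log format shown here (date, time, message id, then `<=`, `=>`, or `Completed`); the function name `summarize` and the shape of the returned dictionary are made up for the example, and other log line types are simply ignored.

```python
import re

# Matches "YYYY-MM-DD HH:MM:SS <message-id> <rest>" as seen in the lines above.
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<msgid>[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}) (?P<rest>.*)$"
)


def summarize(lines):
    """Group arrival, delivery, and completion entries by Exim message id."""
    summary = {}
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # not a per-message line in the expected format
        entry = summary.setdefault(
            match["msgid"], {"sender": None, "deliveries": [], "completed": False}
        )
        rest = match["rest"]
        if rest.startswith("<= "):
            # Message arrival: the second field is the envelope sender.
            entry["sender"] = rest.split()[1]
        elif rest.startswith("=> "):
            # Delivery: pick out the recipient, router (R=), and remote host (H=).
            fields = rest.split()
            entry["deliveries"].append({
                "recipient": fields[1],
                "router": next((f[2:] for f in fields if f.startswith("R=")), None),
                "host": next((f[2:] for f in fields if f.startswith("H=")), None),
            })
        elif rest == "Completed":
            entry["completed"] = True
    return summary
```

Run against the lines above, the entry for `1fyJOx-00005o-TE` should show the `smart_route` router handing the message to `mx1001.wikimedia.org`.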

And I got an email:

Return-Path: jenkins-bot@wikimedia.org
...
Received: from mx1001.wikimedia.org (mx26-g26.priv.proxad.net [172.20.243.96])
    by zimbra31-e6.priv.proxad.net (Postfix) with ESMTP id 04B4869028C
    for <hashar@free.fr>; Fri,  7 Sep 2018 18:14:46 +0200 (CEST)
Received: from mx1001.wikimedia.org ([208.80.154.76])
    by mx1-g20.free.fr (MXproxy) with ESMTPS for hashar@free.fr
    (version=TLSv1/SSLv3 cipher=AES128-GCM-SHA256 bits=128);
    Fri,  7 Sep 2018 18:14:48 +0200 (CEST) 
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=wikimedia.org; s=wikimedia;
    h=Date:From:Message-Id:Subject; bh=zROnx6K8MGhfzMeSWd0o0Si3omJfOL01ifiI41RHwuo=;
    b=G77JCUNLBY7WkHzcN/tRTnDeLKMcDbq8G64NvV58a7XbjYznGN1CcesyxVW/ls2cGquCUmcUuNdnq3UsPHnEQxPj8Zp3m1iAQqPdb3JWnGOs68s8Sln4wOXSr6wx4BUme3eyQWh4e3lJsH0VuRC2nLAsMMzU+UNV2IS9DSmfnn4=;
Received: from contint1001.wikimedia.org ([2620:0:861:1:208:80:154:17]:32896)
    by mx1001.wikimedia.org with esmtp (Exim 4.84_2)
    (envelope-from <jenkins-bot@wikimedia.org>)
    id 1fyJPB-0003lI-FM
    for hashar@free.fr; Fri, 07 Sep 2018 16:14:45 +0000
Received: from [127.0.0.1] (port=44498)
    by contint1001.wikimedia.org with smtp (Exim 4.84_2)
    (envelope-from <jenkins-bot@wikimedia.org>)
    id 1fyJOx-00005o-TE
    for hashar@free.fr; Fri, 07 Sep 2018 16:14:45 +0000
Subject: does it work?
Message-Id: <E1fyJPB-0003lI-FM@mx1001.wikimedia.org>
From: jenkins-bot@wikimedia.org
Date: Fri, 07 Sep 2018 16:14:45 +0000
To: undisclosed-recipients:;

That seems all good.

Antoine
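The relay path can also be read out of the `Received:` headers of the delivered message. This is a rough sketch using Python's standard `email` package; the helper name `relay_path` and the simple `from ... by ...` pattern are assumptions for illustration and will not cover every `Received:` header variant.

```python
import re
from email import message_from_string

# Loose pattern: first "from <host>" followed later by "by <host>".
HOP = re.compile(r"from\s+(?P<src>\S+).*?\bby\s+(?P<dst>\S+)", re.DOTALL)


def relay_path(raw_message):
    """Return (from, by) pairs in the order the message actually travelled."""
    msg = message_from_string(raw_message)
    hops = []
    # Each relay prepends its Received header, so the oldest hop is last.
    for value in reversed(msg.get_all("Received", [])):
        match = HOP.search(value)
        if match:
            hops.append((match["src"], match["dst"]))
    return hops
```

For the message above, the first two hops should be `[127.0.0.1]` to `contint1001.wikimedia.org`, then `contint1001.wikimedia.org` to `mx1001.wikimedia.org`, which matches the Exim log.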

So sounds good. Can you poke me on Monday? I will do the configuration update then validate Jenkins is still able to send emails :]

Sounds like a plan! Will ping you Monday

Mentioned in SAL (#wikimedia-operations) [2018-09-10T14:26:03Z] <hashar> Switching CI Jenkins mail server from mx1001 to localhost | T203607

Turns out the Jenkins instances on releases1001 / releases2001 do not have email configured.

Jenkins on contint1001 sends email through localhost just fine.
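For anyone who wants to repeat the check without typing SMTP commands into `nc`, a small probe along these lines should work. It is a sketch, not what Jenkins does internally: the function names `build_probe` and `send_probe` are hypothetical, the defaults of `127.0.0.1` and port 25 mirror the manual session earlier in this task, and the message body is made up.

```python
import smtplib
from email.message import EmailMessage


def build_probe(sender, recipient, subject="does it work?"):
    """Build a small test message like the one sent by hand earlier."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("Test message relayed through the local Exim listener.\n")
    return msg


def send_probe(msg, host="127.0.0.1", port=25):
    """Hand the message to the local listener; returns refused recipients, if any."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        return smtp.send_message(msg)
```

Calling `send_probe(build_probe("jenkins-bot@wikimedia.org", "you@example.org"))` on a host with the local listener hands the message to Exim, which then takes care of relaying and failover. The recipient address here is a placeholder.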

contint2001 is a spare server, Jenkins is not configured there.