Page MenuHomePhabricator

Determine why Exim BDAT error messages are not counted as bounces by mtail
Closed, ResolvedPublicBUG REPORT

Description

The following bounces were bounced with a 503 error code, but our mtail monitoring looks for log lines matching **. Is there a way to monitor these types of bounces?

mainlog.16.gz:2022-05-09 11:13:00 SMTP protocol error in "BDAT 18962 LAST" H=mail-lj1-x229.google.com [2a00:1450:4864:20::229]:45959 I=[2620:0:861:3:208:80:154:76]:25 BDAT command used when CHUNKING not advertised
mainlog.16.gz-2022-05-09 11:13:00 SMTP syntax error in "Received: by mail-lj1-x229.google.com with SMTP id bx33so5704826ljb.12" H=mail-lj1-x229.google.com [2a00:1450:4864:20::229]:45959 I=[2620:0:861:3:208:80:154:76]:25 unrecognized command
mainlog.16.gz-2022-05-09 11:13:00 SMTP syntax error in "        for <jsoby@wikimedia.org>; Mon, 09 May 2022 04:13:00 -0700 (PDT)" H=mail-lj1-x229.google.com [2a00:1450:4864:20::229]:45959 I=[2620:0:861:3:208:80:154:76]:25 unrecognized command
mainlog.16.gz-2022-05-09 11:13:00 SMTP syntax error in "X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;" H=mail-lj1-x229.google.com [2a00:1450:4864:20::229]:45959 I=[2620:0:861:3:208:80:154:76]:25 unrecognized command
mainlog.16.gz-2022-05-09 11:13:00 SMTP call from mail-lj1-x229.google.com [2a00:1450:4864:20::229]:45959 I=[2620:0:861:3:208:80:154:76]:25 dropped: too many syntax or protocol errors (last command was "X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;",  C=EHLO,STARTTLS,EHLO,MAIL,RCPT,BDAT,RSET,NOOP,MAIL,RCPT,BDAT)

Related Objects

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Bounces relating to these 503s should/would originate from the sending mail system as opposed to the wikimedia mx since these errored out midway through the incoming SMTP session, and the connection was closed before the message was accepted into queue of the mx.

FWIW mtail captures a generic SMTP error metric using the below pattern, which should match some of these log lines, although at a glance I don't see the increase that I'd expect to see on the dates of this incident

# ./modules/mtail/files/programs/exim.mtail
@tstamp {
  / (SMTP protocol (synchronization )?error|unexpected disconnection|SMTP command timeout|SMTP syntax error)/ {
    exim_smtp_errors_total++
  }

Bounces relating to these 503s should/would originate from the sending mail system as opposed to the wikimedia mx since these errored out midway through the incoming SMTP session, and the connection was closed before the message was accepted into queue of the mx.

Thanks @herron for the explanation that makes much more sense and explains the difference in Exim's logging.

@herron I did a little log analysis and from what I can tell the errors from this incident were drowned out by the continual errors we get that match the mtail regex. I am not sure we can improve our detection of this type of event by tweaking these rules, so I am going to close this ticket for now, as we at least understand the cause.