Today I looked into a shinken email alert: ** PROBLEM alert - tools-mail/exim queue length is WARNING **.
On tools-mail.eqiad.wmflabs, the exim queue is rather large and full of frozen bounce messages for mail sent by prometheus@tools.wmflabs.org:
aborrero@tools-mail:~$ sudo exim -bpc
1128
aborrero@tools-mail:~$ sudo exim -bp
[...]
48h  2.3K 1fNuq2-0002j2-3q <> *** frozen ***
          prometheus@tools.wmflabs.org
48h  2.3K 1fNur1-0002jm-7u <> *** frozen ***
          prometheus@tools.wmflabs.org
48h  2.3K 1fNurx-0002kx-TG <> *** frozen ***
          prometheus@tools.wmflabs.org
48h  2.3K 1fNusw-0002lW-JG <> *** frozen ***
          prometheus@tools.wmflabs.org
[....]
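To confirm that the queue is dominated by one recipient, the `exim -bp` listing can be summarised. The following is only a rough sketch: the function name `frozen_bounces_by_recipient` and the regular expression are my own assumptions about the listing layout (a header line with age, size, message ID and sender, followed by indented recipient lines), not an existing tool.

```python
import re
from collections import Counter

# Header line of one queue entry: age, size, message ID, envelope sender,
# and an optional "*** frozen ***" marker. Assumed layout, see lead-in.
HEADER = re.compile(
    r"^\s*(?P<age>\d+[mhd])\s+(?P<size>\S+)\s+"
    r"(?P<msgid>[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+)\s+"
    r"(?P<sender><[^>]*>)(?P<frozen>\s+\*\*\* frozen \*\*\*)?\s*$"
)


def frozen_bounces_by_recipient(listing):
    """Count frozen bounce messages (sender "<>") per pending recipient."""
    counts = Counter()
    in_frozen_bounce = False
    for line in listing.splitlines():
        header = HEADER.match(line)
        if header:
            in_frozen_bounce = (
                header.group("sender") == "<>"
                and header.group("frozen") is not None
            )
            continue
        recipient = line.strip()
        # Skip blanks, elision markers, and already-delivered ("D ") recipients.
        if not in_frozen_bounce or "@" not in recipient:
            continue
        if recipient.startswith("D "):
            continue
        counts[recipient] += 1
    return counts


SAMPLE = """\
48h  2.3K 1fNuq2-0002j2-3q <> *** frozen ***
          prometheus@tools.wmflabs.org

48h  2.3K 1fNur1-0002jm-7u <> *** frozen ***
          prometheus@tools.wmflabs.org
"""

if __name__ == "__main__":
    for recipient, count in frozen_bounces_by_recipient(SAMPLE).most_common():
        print(count, recipient)
```

In practice the `SAMPLE` string would be replaced by the saved output of `sudo exim -bp`.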
One of these messages, as an example:
aborrero@tools-mail:~$ sudo exim -Mvb 1fOdvq-00079D-R9
1fOdvq-00079D-R9-D
This message was created automatically by mail delivery software.

A message that you sent could not be delivered to one or more of its
recipients. This is a permanent error. The following address(es) failed:

  prometheus@tools.wmflabs.org
    Unrouteable address

------ This is a copy of the message, including all the headers. ------

Return-path: <prometheus@tools.wmflabs.org>
Received: from tools-bastion-03.tools.eqiad.wmflabs ([10.68.23.58] ident=Debian-exim)
    by mail.tools.wmflabs.org with esmtp (Exim 4.82)
    (envelope-from <prometheus@tools.wmflabs.org>)
    id 1fOdvq-000797-9C
    for prometheus@tools.wmflabs.org; Fri, 01 Jun 2018 06:53:02 +0000
Received: from prometheus by tools-bastion-03.tools.eqiad.wmflabs with local (Exim 4.82)
    (envelope-from <prometheus@tools.wmflabs.org>)
    id 1fOdvq-0007WV-6v
    for prometheus@tools.wmflabs.org; Fri, 01 Jun 2018 06:53:02 +0000
From: root@tools.wmflabs.org (Cron Daemon)
To: prometheus@tools.wmflabs.org
Subject: Cron <prometheus@tools-bastion-03> /usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom
Content-Type: text/plain; charset=ANSI_X3.4-1968
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/var/lib/prometheus>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=prometheus>
Message-Id: <E1fOdvq-0007WV-6v@tools-bastion-03.tools.eqiad.wmflabs>
Date: Fri, 01 Jun 2018 06:53:02 +0000

Traceback (most recent call last):
  File "/usr/local/bin/prometheus-puppet-agent-stats", line 117, in <module>
    sys.exit(main())
  File "/usr/local/bin/prometheus-puppet-agent-stats", line 111, in main
    write_to_textfile(args.outfile, registry)
  File "/usr/lib/python2.7/dist-packages/prometheus_client/exposition.py", line 115, in write_to_textfile
    with open(tmppath, 'wb') as f:
IOError: [Errno 13] Permission denied: u'/var/lib/prometheus/node.d/puppet_agent.prom.28919.139863252813632'
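The traceback shows `open()` failing on a temporary file created next to the target (`puppet_agent.prom.<pid>.<thread id>`), which suggests the cron user cannot create files in `/var/lib/prometheus/node.d`. This is not confirmed. A minimal sketch for checking it on tools-bastion-03, where the helper name `can_create_files_in` is hypothetical:

```python
import os


def can_create_files_in(directory):
    """Return True if the current user may create new files in `directory`.

    Creating a file needs both write and search (execute) permission on
    the directory itself.
    """
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Directory taken from the --outfile argument in the cron subject above.
    target = "/var/lib/prometheus/node.d"
    print(target, "writable by current user:", can_create_files_in(target))
```

The check only means something when run as the same user as the cron job, for example via `sudo -u prometheus`.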
Also, not sure why this is happening:
From: root@tools.wmflabs.org (Cron Daemon)
To: prometheus@tools.wmflabs.org