We should collect some basic Postfix metrics to help us spot trouble: at a minimum, queue size and message delivery/failure rates.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Jgreen | T91508 [Epic] overhaul fundraising cluster monitoring
Resolved | | cwdent | T152562 Port fundraising stats off Ganglia
Resolved | | cwdent | T176495 prometheus collector or exporter for postfix metrics
Event Timeline
For posterity, we now have delivery rates added by log-scraping:
class { 'prometheus::collector::syslog':
  jobs => {
    '/var/log/mail.log' => {
      'mail_bounced'  => 'postfix\/smtp.*, status=bounced',
      'mail_deferred' => 'postfix\/smtp.*, status=deferred',
      'mail_expired'  => 'postfix\/qmgr.*, status=expired',
      'mail_sent'     => 'postfix\/smtp.*, status=sent',
    },
  },
}
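To illustrate what those four patterns match, here is a minimal Python sketch that counts matching lines per metric name. It is not the collector's actual implementation, which isn't shown in this task. The function name `count_statuses` and the sample log lines (hosts, queue IDs, addresses) are made up for illustration; only the metric names and regexes come from the config above.

```python
import re

# Metric names and regexes copied from the Puppet config above.
PATTERNS = {
    "mail_bounced": re.compile(r"postfix\/smtp.*, status=bounced"),
    "mail_deferred": re.compile(r"postfix\/smtp.*, status=deferred"),
    "mail_expired": re.compile(r"postfix\/qmgr.*, status=expired"),
    "mail_sent": re.compile(r"postfix\/smtp.*, status=sent"),
}


def count_statuses(lines):
    """Return {metric_name: number of lines matching that metric's regex}."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Hypothetical mail.log lines, shaped like typical Postfix output.
SAMPLE_LINES = [
    "Jan  1 00:00:01 mx1 postfix/smtp[101]: A1: to=<a@example.org>, dsn=2.0.0, status=sent (250 OK)",
    "Jan  1 00:00:02 mx1 postfix/smtp[102]: A2: to=<b@example.org>, dsn=4.4.1, status=deferred (timed out)",
    "Jan  1 00:00:03 mx1 postfix/smtp[103]: A3: to=<c@example.org>, dsn=5.1.1, status=bounced (unknown user)",
    "Jan  1 00:00:04 mx1 postfix/qmgr[104]: A4: from=<d@example.org>, status=expired, returned to sender",
    "Jan  1 00:00:05 mx1 postfix/qmgr[104]: A5: removed",
]

if __name__ == "__main__":
    for name, value in sorted(count_statuses(SAMPLE_LINES).items()):
        print(name, value)
```

Note that each pattern requires a literal `, status=` preceded by the `postfix/smtp` or `postfix/qmgr` tag on the same line, so deliveries logged by other Postfix services (for example `postfix/local`) would not be counted by these jobs.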
Timing-wise, it looks like this broke on Jul 12, 2017 between 15:59:01 and 16:02:29 UTC.