|Resolved||Dzahn||T297017 MX record issue on mx2001.wikimedia.org|
|Resolved||herron||T297127 Incident: 2021-12-03 mx2001->Gmail delivery issues|
|Resolved||herron||T297144 large MX queues should page|
|Open||None||T275867 Add exim queue size to grafana graph|
|Open||None||T294166 Alert that should have paged via VictorOps was delayed because of partial networking outage|
Mentioned Here:
- T294166: Alert that should have paged via VictorOps was delayed because of partial networking outage
- T275867: Add exim queue size to grafana graph
- T253733: Icinga stopped sending emails
- T297017: MX record issue on mx2001.wikimedia.org
- T297127: Incident: 2021-12-03 mx2001->Gmail delivery issues
deep link to existing Icinga check:
As you can see there, the current alerting threshold is 2000, per the check output "OK: Less than 2000 mails in exim queue.".
But here "alerting" means only IRC output. (Should it mean an email is sent? T253733?) Either way, it does not currently page: "critical => true" is not set, regardless of what we set as the threshold.
I know the task description says "threshold to be determined" but calling more attention to the current check would have helped in the related incident case. So that check is now paging, and we can continue to tune/adjust/improve the monitoring and thresholds via the related tasks.
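For anyone tuning the threshold later, here is a rough sketch of the logic a queue-size check like this performs. It is not the actual plugin we run: the function names, the CRITICAL message wording, and the use of the `exim` binary name (it may be `exim4` on some hosts) are assumptions for illustration. Only the OK message and the 2000 default come from the check output quoted above.

```python
import subprocess

# Standard Nagios/Icinga plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def evaluate_queue(count, threshold=2000):
    """Map a queue size to an (exit_code, message) pair.

    The OK wording mirrors the existing check output; the CRITICAL
    wording is made up for this sketch.
    """
    if count < threshold:
        return OK, f"OK: Less than {threshold} mails in exim queue."
    return CRITICAL, f"CRITICAL: {count} mails in exim queue."


def read_queue_size(exim_binary="exim"):
    """Return the number of queued messages as reported by `exim -bpc`.

    The binary name is an assumption; adjust it for the host.
    """
    result = subprocess.run(
        [exim_binary, "-bpc"], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())
```

A caller would run `evaluate_queue(read_queue_size())`, print the message and exit with the returned code. Whether a CRITICAL result actually pages is decided in the monitoring configuration (the "critical => true" setting mentioned above), not in the check itself.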