Monitor mailman outbound mail queue
Closed, Resolved (Public)

Description

We currently monitor the "in", "bounces", and "virgin" Mailman queues. We should also monitor the "out" queue.

Event Timeline

colewhite triaged this task as Medium priority.

Change 546260 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] prometheus, profile: add file count feature and enable lists queue tracking

https://gerrit.wikimedia.org/r/546260
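Here is a minimal sketch of what a per-queue file count could look like. It assumes a Mailman 2 style layout with one directory per queue under /var/lib/mailman/qfiles. The path, the queue list, and the function name are assumptions, not taken from the patch. The sketch counts the regular files in each queue and reports a missing directory as 0:

```python
import os

# Assumption: Mailman 2 style layout, one subdirectory per queue.
# Hypothetical values; adjust for the actual install.
QUEUE_ROOT = "/var/lib/mailman/qfiles"
QUEUES = ("in", "bounces", "virgin", "out")


def count_queue_files(root, queues):
    """Return {queue_name: number of regular files} for each queue directory.

    Subdirectories are not counted. A missing directory is reported as 0
    so the series stays continuous.
    """
    counts = {}
    for name in queues:
        path = os.path.join(root, name)
        try:
            with os.scandir(path) as entries:
                counts[name] = sum(1 for entry in entries if entry.is_file())
        except FileNotFoundError:
            counts[name] = 0
    return counts
```

Calling count_queue_files(QUEUE_ROOT, QUEUES) yields one integer per queue name, which can be exported as a gauge. Reporting 0 for a missing directory keeps the series continuous. The cost is that a misnamed or absent queue directory goes unnoticed.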

Historically, monitoring the out queue has been noisy. One way to make outbound monitoring less noisy is to combine the queue depth with the average send time and estimate how long the current queue will take to drain.
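The idea reduces to simple arithmetic. The estimated drain time is the queue depth multiplied by the average send time, and the alert fires on that estimate rather than on raw depth. This is a minimal sketch under two assumptions: messages go out one at a time at the recent average rate, and the 600-second threshold is a hypothetical value:

```python
def estimated_drain_seconds(queue_depth, avg_send_seconds):
    """Estimate how long the queue takes to drain: depth x average send time.

    Assumes a single sender working at the recent average rate; with N
    parallel senders the estimate would be divided by N.
    """
    if queue_depth < 0 or avg_send_seconds < 0:
        raise ValueError("queue depth and send time must be non-negative")
    return queue_depth * avg_send_seconds


def should_alert(queue_depth, avg_send_seconds, max_drain_seconds=600):
    """Alert only when the backlog would take longer than max_drain_seconds to clear.

    max_drain_seconds is a hypothetical threshold.
    """
    return estimated_drain_seconds(queue_depth, avg_send_seconds) > max_drain_seconds
```

Take an average send time of 0.5 s. A 400-message queue is estimated to drain in 200 s and does not alert, while a 2000-message queue (1000 s) does. A large queue that drains quickly stays quiet, but a small queue stuck behind slow sends still alerts.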

Change 546290 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] mtail,profile: add smtp metrics collection with mtail

https://gerrit.wikimedia.org/r/546290
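mtail programs are written in mtail's own language. The following is only a Python illustration of the counting such a program might do, assuming the SMTP metrics come from the Exim main log. The sketch also assumes the default line layout: date, time, message ID, then a flag. It uses Exim's standard flags: "<=" arrival, "=>" delivery, "->" additional address in the same delivery, "**" permanent failure, and "==" deferral. The event names are hypothetical:

```python
import re
from collections import Counter

# Assumption: default Exim main-log layout "DATE TIME MSGID FLAG ..."
# (no +pid or other log_selector additions before the message ID).
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+ (<=|=>|->|\*\*|==) "
)

# Hypothetical event names for the exported counters.
FLAG_NAMES = {
    "<=": "received",
    "=>": "delivered",
    "->": "delivered",  # extra recipient delivered in the same transaction
    "**": "failed",
    "==": "deferred",
}


def count_smtp_events(lines):
    """Count Exim log events by type, similar to an mtail counter per flag."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[FLAG_NAMES[match.group(1)]] += 1
    return counts
```

Lines without a delivery flag, such as queue-run notices, fail the match and are ignored.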

Change 546992 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] profile: get exim metrics from lists

https://gerrit.wikimedia.org/r/546992

Change 546260 merged by Cwhite:
[operations/puppet@production] prometheus, profile: add file count feature and enable lists queue tracking

https://gerrit.wikimedia.org/r/546260

Change 547028 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] prometheus: update file count script to use single metric instance

https://gerrit.wikimedia.org/r/547028
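One reading of "single metric instance" is a single metric name with a label per queue, rather than a separate metric for each queue. Below is a minimal sketch of that shape in the Prometheus text exposition format, as read by the node_exporter textfile collector. The metric name mailman_queue_files, the queue label, and the helper names are hypothetical:

```python
import os

METRIC = "mailman_queue_files"  # hypothetical metric name


def _escape_label(value):
    """Escape a label value per the text exposition format (\\, ", newline)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_queue_metric(counts):
    """Render {queue: count} as one gauge with a queue label per series."""
    lines = [
        f"# HELP {METRIC} Number of files in each Mailman queue directory.",
        f"# TYPE {METRIC} gauge",
    ]
    for queue in sorted(counts):
        lines.append(f'{METRIC}{{queue="{_escape_label(queue)}"}} {counts[queue]}')
    return "\n".join(lines) + "\n"


def write_textfile(path, text):
    """Write atomically: temp file, then rename over the target.

    The textfile collector only reads *.prom files, so the ".tmp" file
    is ignored until os.replace swaps it in (same filesystem assumed).
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        handle.write(text)
    os.replace(tmp_path, path)
```

With one metric name, a single query or alert rule covers all queues, and each queue is selected by its label.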

Change 547028 merged by Cwhite:
[operations/puppet@production] prometheus: update file count script to use single metric instance

https://gerrit.wikimedia.org/r/547028

Change 546290 merged by Cwhite:
[operations/puppet@production] mtail,profile: add smtp metrics collection with mtail

https://gerrit.wikimedia.org/r/546290

Change 549179 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] prometheus: add lists server mtail scrape to mtail jobs

https://gerrit.wikimedia.org/r/549179

Change 546992 merged by Cwhite:
[operations/puppet@production] profile: get exim metrics from lists

https://gerrit.wikimedia.org/r/546992

Change 549179 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: add lists server mtail scrape to mtail jobs

https://gerrit.wikimedia.org/r/549179

Change 553147 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] hiera: set mtail to run as group adm on lists

https://gerrit.wikimedia.org/r/553147

Change 553147 merged by Cwhite:
[operations/puppet@production] hiera: set mtail to run as group adm on lists

https://gerrit.wikimedia.org/r/553147

Change 564129 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] mtail: track new subscription requests in prometheus

https://gerrit.wikimedia.org/r/564129

Change 564129 merged by Cwhite:
[operations/puppet@production] mtail: track new subscription requests in prometheus

https://gerrit.wikimedia.org/r/564129

Change 596471 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] profile: add mailman outbound queue monitoring

https://gerrit.wikimedia.org/r/596471

Change 596471 merged by Cwhite:
[operations/puppet@production] profile: add mailman outbound queue monitoring

https://gerrit.wikimedia.org/r/596471

Change 596507 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] profile: check_prometheus expects int

https://gerrit.wikimedia.org/r/596507

Change 596507 merged by Cwhite:
[operations/puppet@production] profile: check_prometheus expects int

https://gerrit.wikimedia.org/r/596507
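Here is a Nagios/Icinga-style threshold check for the outbound queue. It is sketched under the assumption, suggested by the commit title but not confirmed by it, that thresholds must be integers. It uses the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). check_queue and its messages are hypothetical and are not the actual check_prometheus interface:

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_queue(value, warning, critical):
    """Return (exit_code, message) for an outbound queue reading.

    Thresholds are coerced to int (assumption: the checker expects ints);
    the reading itself may be a float, as Prometheus query results are.
    """
    warning, critical = int(warning), int(critical)
    if value is None:
        return UNKNOWN, "UNKNOWN: no data returned"
    if value >= critical:
        return CRITICAL, f"CRITICAL: out queue {value:g} >= {critical}"
    if value >= warning:
        return WARNING, f"WARNING: out queue {value:g} >= {warning}"
    return OK, f"OK: out queue {value:g}"
```

A wrapper script would print the message and pass the returned code to sys.exit, which is how the monitoring system reads the check state.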

Change 596517 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] profile: add anchor to mailman monitoring section

https://gerrit.wikimedia.org/r/596517

Change 596517 merged by Cwhite:
[operations/puppet@production] profile: add anchor to mailman monitoring section

https://gerrit.wikimedia.org/r/596517

Monitoring is deployed, and some docs were updated as well.