To better visualize utilization and spot potential issues, let's create graphs of mail volume at a per-service level. By "service" I mean Phabricator, Gerrit, etc.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T197171 Graph outbound mail volume on per-service or hostgroup level
Open | | None | T197173 Ship MX logs to ELK
Event Timeline
Implementation-wise, parsing MX logs seems like a good bet. For starters, we could approximate services through a combination of host classes/groups and envelope-from address patterns.
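As a rough illustration of that approximation, here is a minimal Python sketch that maps a message to a service, first by envelope-from pattern and then by host group. Everything specific in it is an assumption made for the example: the `example.org` addresses, the host-name prefixes, and the names `SERVICE_PATTERNS`, `HOSTGROUP_SERVICES` and `classify` are hypothetical, not taken from the real MX configuration.

```python
import re
from typing import Optional

# Hypothetical envelope-from patterns; real sender addresses will differ.
SERVICE_PATTERNS = [
    ("phabricator", re.compile(r"@phabricator\.example\.org$")),
    ("gerrit", re.compile(r"^gerrit@")),
]

# Hypothetical mapping from host-name prefix (host class/group) to service.
HOSTGROUP_SERVICES = {"phab": "phabricator", "gerrit": "gerrit"}


def classify(envelope_from: str, host: Optional[str] = None) -> str:
    """Guess the originating service of a message.

    The envelope-from address is checked first; the sending host's
    alphabetic prefix (e.g. "phab" in "phab1001") is the fallback.
    """
    sender = envelope_from.strip("<>").lower()
    for service, pattern in SERVICE_PATTERNS:
        if pattern.search(sender):
            return service
    if host:
        prefix = re.match(r"[a-z]+", host.lower())
        if prefix and prefix.group(0) in HOSTGROUP_SERVICES:
            return HOSTGROUP_SERVICES[prefix.group(0)]
    return "other"
```

Checking the address first and the host second is only one possible ordering; if several services share a sender address, the host group would probably have to take precedence.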
@fgiunchedi I linked this to T197173 (which I think is a worthwhile thing to do anyway), but do you think there is a better approach for this use case?
The more immediate action would be to deploy mtail to the MX servers and write a few rules to munge interesting log lines into metrics, then alert and dashboard on those.
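To make the log-to-metrics idea concrete, here is a small Python sketch of the kind of counting such a rule would do. It is not mtail code, and it rests on assumptions: the logs are taken to be Exim-style main log lines in which `<=` marks an accepted message followed by the envelope sender, and the metric name `mx_messages_total` and its `sender_domain` label are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed Exim-style arrival line: "<date> <time> <message-id> <= <sender> ..."
ARRIVAL = re.compile(r"^\S+ \S+ \S+ <= (?P<sender>\S+)")


def count_by_sender_domain(lines: Iterable[str]) -> Counter:
    """Count accepted messages per envelope-sender domain."""
    counts: Counter = Counter()
    for line in lines:
        match = ARRIVAL.match(line)
        if not match:
            continue  # delivery lines ("=>") and other events are ignored
        sender = match.group("sender")
        # A sender without "@" (e.g. "<>") is treated as a bounce here.
        domain = sender.rsplit("@", 1)[-1].lower() if "@" in sender else "bounce"
        counts[domain] += 1
    return counts


def render_metrics(counts: Counter) -> str:
    """Render the counts as Prometheus-style text lines (hypothetical metric name)."""
    return "\n".join(
        f'mx_messages_total{{sender_domain="{domain}"}} {total}'
        for domain, total in sorted(counts.items())
    )
```

Grouping by sender domain is just the simplest label to extract; a per-service label would need a mapping from sender or host to service on top of this.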
Removing SRE; apparently Observability has been working on it. That said, the task hasn't seen an update in almost 5 years, so I'd suggest just resolving it.