In support of T310598.
Let's audit all existing alerts and on-call schedules with the following goals in mind:
- Improve paging by prioritizing team members who are currently awake.
- Prioritize alerts so that pages occur only for critical events.
- Remove unneeded alerts, and move informational alerts to automated tickets via https://phabricator.wikimedia.org/p/phaultfinder/ rather than pages.
- Make alertmanager the single source of truth, and the single interface, for visualizing and responding to alerts.
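As a rough illustration of the paging goals above, here is a minimal Python sketch: only critical alerts page, everything else becomes a ticket, and responders who are awake are tried first. The severity labels, the 08:00 to 20:00 waking window, fixed UTC offsets (no DST handling), and all names are hypothetical; in practice this logic would be expressed in alertmanager routing and the on-call schedules, not in standalone code.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone

# Hypothetical waking-hours window; the task does not define one.
WAKING_START = time(8, 0)
WAKING_END = time(20, 0)


@dataclass
class Responder:
    name: str
    utc_offset_hours: int  # simplification: fixed offset, ignores DST


@dataclass
class Alert:
    name: str
    severity: str  # hypothetical labels, e.g. "critical", "warning", "info"


def route(alert: Alert) -> str:
    """Page only for critical alerts; everything else becomes a ticket."""
    return "page" if alert.severity == "critical" else "ticket"


def is_awake(responder: Responder, now_utc: datetime) -> bool:
    """True if the responder's local time falls inside the waking window."""
    local_tz = timezone(timedelta(hours=responder.utc_offset_hours))
    local = now_utc.astimezone(local_tz).time()
    return WAKING_START <= local < WAKING_END


def page_order(responders: list[Responder], now_utc: datetime) -> list[Responder]:
    """Awake responders first; sorted() is stable, so schedule order is kept within each group."""
    return sorted(responders, key=lambda r: not is_awake(r, now_utc))


if __name__ == "__main__":
    now = datetime(2022, 6, 1, 14, 0, tzinfo=timezone.utc)
    team = [Responder("la", -7), Responder("tokyo", 9), Responder("berlin", 2)]
    print(route(Alert("HostDown", "critical")))
    print([r.name for r in page_order(team, now)])
```

At 14:00 UTC the "berlin" responder (16:00 local) is awake while "la" (07:00) and "tokyo" (23:00) are not, so "berlin" moves to the front and the other two keep their original order.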