This should include:
- Retrieving the metrics on the Prometheus side (if anything is missing)
- Adding alerts for "down" events, with runbooks
- Adding a basic Grafana board showing the "up/down" metric, to link as the 'dashboard' on the alerts
- jobs-api
  - gather stats
  - add alert
- jobs-emailer
  - gather stats
  - add alerts
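
As a rough illustration of the "up/down" check behind these alerts, the sketch below parses a Prometheus instant-query response for the built-in `up` metric and lists the targets that are down. It assumes both services are scraped under job names matching the list above (`jobs-api`, `jobs-emailer`) and that "up/down" refers to `up`; the instance names and the sample payload are hypothetical, not taken from a real deployment.

```python
from typing import Any

# Assumed PromQL for the instant query whose response is parsed below.
# The job names mirror the task list; the real scrape config may differ.
QUERY = 'up{job=~"jobs-api|jobs-emailer"}'


def down_targets(response: dict[str, Any]) -> list[str]:
    """Return "job/instance" for every series whose `up` value is 0."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    down = []
    for series in response["data"]["result"]:
        metric = series["metric"]
        # Instant vectors carry [timestamp, "value"], with the value as a string.
        _timestamp, value = series["value"]
        if float(value) == 0.0:
            down.append(f'{metric.get("job", "?")}/{metric.get("instance", "?")}')
    return sorted(down)


# Hypothetical payload in the shape returned by GET /api/v1/query.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "jobs-api",
                        "instance": "jobs-api:8000"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "jobs-emailer",
                        "instance": "jobs-emailer:9000"},
             "value": [1700000000.0, "0"]},
        ],
    },
}

print(down_targets(sample))
```

The actual alerts would more likely be Prometheus alerting rules on `up == 0` with a `for:` duration and a runbook link in the annotations; this script is only meant for sanity-checking that the metric is being retrieved before the rules are written.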