What didn't work well:
- 6 minutes of downtime caused by a mistake in the message group configuration
What worked well:
- Alerting worked
Where we got lucky:
- Abijeet was around and was able to quickly determine the cause and fix it
Action items:
- Check that all deployers are subscribed to uptime alerts
- Config updates are applied immediately – consider requiring them to go through a deployment as well
- Add automated CI testing for message group configuration, which we currently lack
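
A CI check for message group configuration could start as small as the sketch below. This is an illustration under assumptions, not the project's actual tooling: it assumes each group's configuration has already been parsed into a dictionary, and the section and key names in REQUIRED_KEYS (BASIC, FILES, id, label, sourcePattern) are placeholders to be replaced with the real schema. The report does not say what the mistake was, so the checks here (missing required values, duplicate group ids) are generic examples.

```python
from collections import Counter
from typing import Any

# Assumed schema: section name -> keys that must be non-empty strings.
# These names are placeholders; replace them with the real requirements.
REQUIRED_KEYS = {
    "BASIC": ("id", "label"),
    "FILES": ("sourcePattern",),
}


def validate_group(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one parsed group config."""
    errors = []
    for section, keys in REQUIRED_KEYS.items():
        block = config.get(section)
        if not isinstance(block, dict):
            errors.append(f"missing or malformed section: {section}")
            continue
        for key in keys:
            value = block.get(key)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"{section}.{key} must be a non-empty string")
    return errors


def find_duplicate_ids(configs: list[dict[str, Any]]) -> list[str]:
    """Return group ids that appear in more than one config, sorted."""
    ids = []
    for config in configs:
        basic = config.get("BASIC")
        if isinstance(basic, dict) and isinstance(basic.get("id"), str):
            ids.append(basic["id"])
    counts = Counter(ids)
    return sorted(group_id for group_id, n in counts.items() if n > 1)
```

In CI, a job could parse every configuration file, run both functions, and fail the build if either returns a non-empty list.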