This has been on the ops to-do list for a while, but we haven't quite gotten to it.
We want something with the following characteristics:
- Lightweight, easy to load
- Probably a static file that's updated periodically
- Shows current and recent past state, with automatic downtime detection for key services
- Ability for admins to easily add annotations for particular events
We can then link this prominently from our error messages (replacing the old "go check in IRC" links), from the tech blog, etc.
We might want to host it separately from the primary sites, but if so we need to make sure it can handle the traffic during an outage. ;) Hosting it on a standalone server in Tampa, with a backup in Amsterdam, would probably be an acceptable compromise for now.
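
As a rough illustration of the characteristics listed above (a periodically regenerated static file, automatic downtime detection, admin annotations), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the service names and URLs are placeholders, the "three failed probes in a row means down" rule is just one possible heuristic, and the function names are made up. It is not a proposal for the actual implementation.

```python
import html
import urllib.request
from datetime import datetime, timezone

# Hypothetical service list; names and URLs are placeholders, not real endpoints.
SERVICES = {
    "wiki": "https://wiki.example.org/",
    "api": "https://api.example.org/health",
}


def probe(url, timeout=5.0):
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def is_down(history, threshold=3):
    """Assumed heuristic: down once the last `threshold` probes all failed."""
    recent = history[-threshold:]
    return len(recent) == threshold and not any(recent)


def render_page(histories, annotations, generated_at):
    """Build the static HTML from probe histories and admin annotations.

    histories:   dict of service name -> list of bools, oldest first
    annotations: list of (when, text) string pairs written by admins
    """
    parts = [
        "<!DOCTYPE html><title>Service status</title>",
        f"<p>Generated {html.escape(generated_at.isoformat())}</p>",
        "<table>",
    ]
    for name, history in sorted(histories.items()):
        state = "DOWN" if is_down(history) else "up"
        # Recent past as a compact strip: '+' for a good probe, '-' for a failed one.
        strip = "".join("+" if ok else "-" for ok in history[-20:])
        parts.append(
            f"<tr><td>{html.escape(name)}</td><td>{state}</td><td>{strip}</td></tr>"
        )
    parts.append("</table><ul>")
    for when, text in annotations:
        parts.append(f"<li>{html.escape(when)}: {html.escape(text)}</li>")
    parts.append("</ul>")
    return "\n".join(parts)


def update(histories, path="status.html", annotations=()):
    """One periodic run: probe every service, then rewrite the static file."""
    for name, url in SERVICES.items():
        histories.setdefault(name, []).append(probe(url))
    page = render_page(histories, list(annotations), datetime.now(timezone.utc))
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
```

A cron job (or similar) could call `update()` every minute or so and copy the resulting file to wherever the page is hosted. Because the output is a single static file, serving it during an outage needs nothing more than a plain web server.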