Follow-up from the unmarshalling crash incident: T144990: [CRASH] Content Service shouldn't send empty objects
We should find a way to monitor crash spikes so that we don't have a situation like T144940 where the outage goes on for hours before we hear about it. Unless I'm missing something, HockeyApp doesn't offer anything like this; the most it will do is email crash reports as they come in.
I can imagine standing up a simple service, maybe in Tool Labs, that would receive crash events at the same time as they're sent to HockeyApp and alert us, through whatever channel we choose, if the crash rate rises above a certain threshold.
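To make the threshold idea concrete, here is a rough sketch of the core logic such a service might use: a sliding-window counter that fires an alert once when the number of crashes in the window exceeds a limit. Everything here is an assumption for illustration: the class name `CrashRateMonitor`, the `threshold` and `window_seconds` parameters, and the `alert` callback are hypothetical, and how crash events would actually reach the service (and how alerts would be delivered) is left open.

```python
import time
from collections import deque


class CrashRateMonitor:
    """Hypothetical sliding-window crash counter (illustrative only)."""

    def __init__(self, threshold, window_seconds, alert=print):
        self.threshold = threshold            # max crashes tolerated per window
        self.window_seconds = window_seconds  # length of the sliding window
        self.alert = alert                    # callback: email, IRC, etc. (assumed)
        self._timestamps = deque()            # crash times still inside the window
        self._alerting = False                # True while a spike is ongoing

    def record_crash(self, timestamp=None):
        """Record one crash event; return True if the rate is over threshold."""
        now = time.time() if timestamp is None else timestamp
        self._timestamps.append(now)

        # Discard events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

        count = len(self._timestamps)
        if count > self.threshold:
            # Alert only on the transition into a spike, not on every crash.
            if not self._alerting:
                self._alerting = True
                self.alert(
                    f"Crash spike: {count} crashes in the last "
                    f"{self.window_seconds}s (threshold {self.threshold})"
                )
            return True

        self._alerting = False
        return False
```

For example, `CrashRateMonitor(threshold=50, window_seconds=600)` would alert once when more than 50 crashes arrive within any ten-minute span, and re-arm once the rate drops back under the limit. A fixed threshold is the simplest choice; comparing against a rolling baseline would probably be more robust as the user base grows, but that is beyond this sketch.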
Other ideas?