
Have a way to monitor the crash rate and alert for sharp increases
Closed, Declined, Public

Description

Follow-up from the unmarshalling crash incident: T144990: [CRASH] Content Service shouldn't send empty objects

We should find a way to monitor crash spikes so that we don't have a situation like T144940 where the outage goes on for hours before we hear about it. Unless I'm missing something, HockeyApp doesn't offer anything like this; the most it will do is email crash reports as they come in.

I can imagine standing up a simple service, maybe in Tool Labs, that would receive crash events at the same time as they're sent to HockeyApp and alert us however we want if the crash rate rises above a certain threshold.
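As a rough illustration, the core of such a service could be a sliding-window counter like the one below. This is a minimal sketch, not an existing tool: the class name `CrashRateMonitor`, the window size, the threshold, and the `alert` callback are all hypothetical, and it assumes each crash event arrives with a timestamp in seconds, in non-decreasing order. In practice the callback would send an email or other notification.

```python
from collections import deque
from typing import Callable, Deque


class CrashRateMonitor:
    """Hypothetical sliding-window crash counter that alerts on spikes.

    Window size, threshold and alert callback are illustrative assumptions,
    not part of HockeyApp or any existing Wikimedia service.
    """

    def __init__(self, window_seconds: float, threshold: int,
                 alert: Callable[[int], None]) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold      # max crashes tolerated per window
        self.alert = alert              # called with the current in-window count
        self._times: Deque[float] = deque()
        self._alerting = False          # suppresses repeat alerts during one spike

    def record(self, timestamp: float) -> int:
        """Record one crash event and return the number of crashes in the window."""
        self._times.append(timestamp)
        # Evict events that are older than the window.
        cutoff = timestamp - self.window_seconds
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        count = len(self._times)
        if count > self.threshold and not self._alerting:
            self._alerting = True
            self.alert(count)
        elif count <= self.threshold:
            # A new event found the rate back under the threshold: re-arm.
            self._alerting = False
        return count


if __name__ == "__main__":
    alerts = []
    monitor = CrashRateMonitor(window_seconds=60, threshold=3, alert=alerts.append)
    for t in [0, 10, 20, 30, 35, 40]:
        monitor.record(t)
    print(alerts)  # [4]: one alert when the 4th crash lands inside 60 seconds
```

A fixed per-window threshold is the simplest possible rule; catching "sharp increases" relative to normal traffic would instead compare the current window against a longer-term baseline.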

Other ideas?

Event Timeline

I think Grafana provides email alerts. If T117378 is implemented, this would be easy.

Hi @Mholloway, @Dbrant, @Niedzielski,

Have you tried exploring Crashlytics yet?

We've been using it in our codebase (more than 50 modules and around 130,000 users) for more than two years now, and it has been really helpful. It comes at zero cost and provides many features, such as alerts, attaching a good amount of useful information to each crash (or even to non-fatal errors), and various other crash insights.

If you guys like it, I am ready to volunteer to integrate it into the current Android app :)

Thanks, but HockeyApp has been perfectly adequate so far (aside from spike alerts). Switching to another crash-logging provider would require a legal/privacy review, for which we currently have neither the bandwidth nor the need. As for general analytics, we prefer (and are basically required) to keep them in-house.