
Email maintainers about critical errors in production
Closed, Resolved | Public | 1 Estimated Story Points

Description

Monitoring production error logs is tedious. It would be awesome if we could get an email about critical PHP errors, rather than having to wait for someone to report a bug and then comb the logs manually.

Sam already got a start on this at https://github.com/wikimedia/grantmetrics/pull/117 but ran into some issues.

I think this should be prioritized, especially as the code complexity and our user base are growing.

Event Timeline

This is a good idea. Let's flesh this out and get it ready for estimation next week.

Are there other options besides email?

I've used tools that collect errors that are either emitted from code (over UDP or HTTP) or parsed from logs. Do we have anything like that available in this environment?
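For anyone unfamiliar with the "emitted via code over UDP" approach, here is a minimal sketch of the idea. It is written in Python purely for illustration (the application itself is PHP), and the `UdpJsonHandler` class, the JSON field names, and the collector address are all hypothetical, not something that exists in this environment. Python's standard library also ships `logging.handlers.DatagramHandler`, which sends pickled records rather than JSON.

```python
import json
import logging
import socket


class UdpJsonHandler(logging.Handler):
    """Hypothetical handler: send each qualifying record as one JSON datagram."""

    def __init__(self, host, port, level=logging.CRITICAL):
        # Only records at `level` or above are emitted (CRITICAL by default).
        super().__init__(level)
        self.address = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def emit(self, record):
        try:
            payload = json.dumps({
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }).encode("utf-8")
            # UDP is fire-and-forget: a missing collector does not block the app.
            self.sock.sendto(payload, self.address)
        except Exception:
            self.handleError(record)

    def close(self):
        self.sock.close()
        super().close()
```

The trade-off is that UDP delivery is not guaranteed, so a collector outage silently drops reports; an HTTP-based emitter would notice failures but adds latency to the failing request.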

Ops folks use IRC bots to alert about critical errors. That's another option. I don't know about the rest of the team but I do watch our IRC feed pretty frequently.

Let's just do email for now. Getting these reports is critical. We can investigate other outlets at a later date.

If T205813 doesn't eventuate, then we can resort to having a cronjob email the log file at some interval (and rotate it in the process, so we don't get duplicate log entries in subsequent emails). It wouldn't be as cool as the nicely formatted HTML error reports that Symfony spits out, though (e.g. F26262849).
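A rough sketch of that cron fallback, to make the rotate-then-email ordering concrete. This is Python for illustration only, not what the PR implements; the log path, addresses, and SMTP host are placeholders, and the function names are made up for this example.

```python
import os
import smtplib
import time
from email.message import EmailMessage
from pathlib import Path
from typing import List, Optional


def rotate_log(log_path: Path) -> Optional[Path]:
    """Move the log aside so its entries are never emailed twice.

    Returns the rotated file, or None if there is nothing to report.
    """
    if not log_path.exists() or log_path.stat().st_size == 0:
        return None
    stamp = time.strftime("%Y%m%d%H%M%S")
    rotated = log_path.with_name(f"{log_path.name}.{stamp}")
    os.replace(log_path, rotated)  # atomic rename on the same filesystem
    return rotated


def build_report(rotated: Path, sender: str, recipients: List[str]) -> EmailMessage:
    """Wrap the rotated log's contents in a plain-text email."""
    msg = EmailMessage()
    msg["Subject"] = f"Error log: {rotated.name}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(rotated.read_text(encoding="utf-8", errors="replace"))
    return msg


def main() -> None:
    # All values below are placeholders (assumptions), not real configuration.
    rotated = rotate_log(Path("var/log/prod.log"))
    if rotated is None:
        return  # empty or missing log: send nothing
    msg = build_report(rotated, "app@example.org", ["maintainers@example.org"])
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    main()
```

Rotating before sending means a failed send leaves the rotated file on disk for a later retry instead of losing or duplicating entries. A crontab line such as `*/15 * * * * python3 /path/to/script.py` would run it every 15 minutes; the path is a placeholder.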

Samwilson set the point value for this task to 1.

https://github.com/wikimedia/grantmetrics/pull/117 is ready for review, and maintainers of grantmetrics-test should have received an error email a few minutes ago, which shows that it's working. Of course, it's hard to test this in real life, because we hope never to raise critical errors.

This is done and merged. We got an error email from the staging site at 2018-10-03 15:24:52.

The prod site will be updated in due course. I guess QA for this will be when we get an error from production? So hopefully it'll be a while.

Things should be fine without config changes for the move to VPS as well — T206180: Create Event Metrics VPS and move Grant Metrics to it.

Niharika moved this task from QA to Q2 2018-19 on the Community-Tech-Sprint board.

Works on staging. If we discover that it doesn't work on prod, we can reopen this.