Alarm on errors on /var/log/upstart/eventlogging* files
Open, High, Public

Description

We have alarms for throughput and ingestion but not for exceptions that might make the consumer restart.

Errors in the MySQL consumer in particular indicate a problem, so we should start there.

Nuria created this task. Jul 13 2017, 7:03 PM
Restricted Application added a subscriber: Aklapper. Jul 13 2017, 7:03 PM
Nuria triaged this task as High priority. Jul 17 2017, 3:56 PM
Nuria moved this task from Incoming to Dashiki on the Analytics board.

Or rather, alarm on process flapping.

elukey added a subscriber: elukey. Aug 4 2017, 10:32 AM

Can we use logster for this task? (reading logs in cron and reporting metrics like we do for Varnishkafka)
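
To make the log-reading idea concrete, here is a minimal Python sketch of the counting step only. It is not logster's API, and the error patterns are assumptions, since the actual format of the eventlogging log lines is not shown in this task. `count_error_lines` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed patterns: the real eventlogging log format is not shown in this task.
ERROR_RE = re.compile(
    r"\b(ERROR|CRITICAL)\b|^Traceback \(most recent call last\):"
)


def count_error_lines(lines):
    """Count how many of the given log lines look like errors.

    Returns a Counter with two keys: 'total' and 'errors'.
    """
    counts = Counter(total=0, errors=0)
    for line in lines:
        counts["total"] += 1
        if ERROR_RE.search(line):
            counts["errors"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2017-08-04 10:32:00 INFO consumer started",
        "2017-08-04 10:32:05 ERROR could not insert batch",
        "Traceback (most recent call last):",
    ]
    print(count_error_lines(sample))
```

A cron-driven version would also need to remember the last read offset between runs so lines are not counted twice; that part is left out here.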

Nuria added a comment. Aug 14 2017, 3:58 PM

Instrument the code to send errors to Graphite? That would work for errors but not for process flapping.

Agree, we shouldn't do weird logster stuff for this, but instead instrument eventlogging to emit errors somewhere nicely. Parsing the logs sounds a little hacky.
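
A rough Python sketch of what emitting an error count from the code could look like, using Graphite's plaintext protocol (one `path value timestamp` line per metric). The metric path, host name, and function names are all hypothetical, and port 2003 is only the usual carbon default, not something confirmed for this setup.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric as a Graphite plaintext line: path, value, timestamp."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def send_metric(line, host, port=2003, timeout=2.0):
    """Send one plaintext line to a carbon listener.

    Returns False instead of raising, so that a metrics outage
    cannot itself take the consumer down.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(line.encode("ascii"))
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical metric path and host, for illustration only.
    line = graphite_line("eventlogging.consumer.mysql.errors", 1)
    send_metric(line, "graphite.example.org")
```

The consumer would call something like this from its exception handler, and the alarm would then be a threshold on that metric rather than on log contents.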

For process flapping: we might want to just wait until (if?) we upgrade eventlogging to run on stretch and use systemd.

fdans moved this task from Dashiki to Backlog (Later) on the Analytics board. Oct 2 2017, 4:05 PM