
Monitor bigbrother
Closed, Declined, Public

Description

I only noticed the syntax error in bigbrother when I logged into tools-submit and looked at /var/log/syslog:

Feb 26 06:17:57 tools-submit kernel: [128497.321239] init: bigbrother main process (29839) terminated with status 255
Feb 26 06:17:57 tools-submit kernel: [128497.321265] init: bigbrother respawning too fast, stopped

So it would be nice to have monitoring that checks whether the service is actually up and running.

Event Timeline

scfc raised the priority of this task to Needs Triage.
scfc updated the task description.
scfc added projects: Toolforge, Cloud-Services.
scfc added subscribers: yuvipanda, Aklapper, Pietrodn and 9 others.

Hmm, I wonder how exactly we would do this, since we don't really have active checks atm.

scfc triaged this task as Lowest priority. (Apr 6 2015, 11:13 AM)
scfc moved this task from Triage to Backlog on the Toolforge board.

The timestamp on the scoreboard file should change on each pass through bigbrother.py's run loop. Can we set up an Icinga alert that fires if that timestamp is more than N minutes old?
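A minimal sketch of such a staleness check, written as a Nagios/Icinga-style plugin (exit codes 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The scoreboard path and the 5/10 minute thresholds are hypothetical placeholders, since the task gives neither the file's location nor a value for N. The check relies only on the file's modification time, assuming bigbrother.py really does touch the file on every loop iteration.

```python
import os
import sys
import time

# Hypothetical default path: the task does not say where the scoreboard file lives.
DEFAULT_SCOREBOARD = "/path/to/bigbrother/scoreboard"

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_scoreboard_age(path, warn_minutes, crit_minutes, now=None):
    """Return (exit_code, message) based on how long ago `path` was modified."""
    if now is None:
        now = time.time()
    try:
        mtime = os.stat(path).st_mtime
    except OSError as exc:
        # Missing or unreadable file: we cannot tell whether bigbrother is alive.
        return UNKNOWN, f"UNKNOWN: cannot stat {path}: {exc.strerror}"
    age_minutes = (now - mtime) / 60.0
    if age_minutes >= crit_minutes:
        status, label = CRITICAL, "CRITICAL"
    elif age_minutes >= warn_minutes:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    return status, f"{label}: {path} last updated {age_minutes:.1f} minutes ago"


def main(argv):
    # Usage: check_scoreboard.py [PATH] [WARN_MINUTES] [CRIT_MINUTES]
    path = argv[1] if len(argv) > 1 else DEFAULT_SCOREBOARD
    warn = float(argv[2]) if len(argv) > 2 else 5.0
    crit = float(argv[3]) if len(argv) > 3 else 10.0
    status, message = check_scoreboard_age(path, warn, crit)
    print(message)  # Icinga shows the first line of output as the status text.
    return status


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Because this looks only at the mtime, it would also catch the case from the description: once init stops respawning bigbrother, the scoreboard file simply stops being updated and the check goes CRITICAL. That holds regardless of whether the process crashed or hung.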