
Comprehensive monitoring / alerting for labstore* instances
Closed, Resolved · Public

Description

Should have alerts for any issues with labstore*:

  • High load
  • Network saturation
  • IO saturation
  • NFS daemon running properly

Event Timeline

yuvipanda set the priority of this task to Needs Triage.
yuvipanda updated the task description.
yuvipanda added a project: Cloud-Services.
yuvipanda added subscribers: yuvipanda, coren, Andrew.
Restricted Application added a subscriber: Aklapper. · Mar 31 2015, 7:24 PM

Change 201591 had a related patch set uploaded (by Yuvipanda):
labs: Add monitoring for high iowait on labstore instances

https://gerrit.wikimedia.org/r/201591
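
For readers unfamiliar with the metric, here is a minimal sketch of how an iowait percentage can be derived from two samples of the aggregate `cpu` line in Linux's `/proc/stat`. It is only an illustration and not what the patch above does; the function names `parse_cpu_line` and `iowait_percent` are hypothetical.

```python
def parse_cpu_line(line):
    """Return (iowait_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted inside user/nice, so only the first 8 are summed.
    ticks = [int(value) for value in fields[1:9]]
    if len(ticks) < 5:
        raise ValueError("cpu line has no iowait field")
    return ticks[4], sum(ticks)


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples of the cpu line."""
    iowait_before, total_before = parse_cpu_line(before)
    iowait_after, total_after = parse_cpu_line(after)
    elapsed = total_after - total_before
    if elapsed <= 0:
        # No time passed between samples (or counters reset): report nothing.
        return 0.0
    return 100.0 * (iowait_after - iowait_before) / elapsed
```

In practice the two samples would be read from `/proc/stat` a few seconds apart; the alerting threshold to compare the result against is a separate decision.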

We already have checks for network saturation as well.

Change 201591 merged by Yuvipanda:
labs: Add monitoring for high iowait on labstore instances

https://gerrit.wikimedia.org/r/201591

Change 201618 had a related patch set uploaded (by Yuvipanda):
labs: Alert on high load in labstore*

https://gerrit.wikimedia.org/r/201618

There are already network saturation alerts.

We should probably make these paging as well, though.

Also, I'm wondering whether these should *all* be graphite alerts, or whether we should have active checks as well. Hmm.
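
To make the graphite-versus-active question concrete, here is a rough sketch of how a graphite-style threshold alert could evaluate a stored series. It assumes datapoints shaped like the Graphite render API's JSON output (`[value, timestamp]` pairs, with `None` for missing values); the function name `evaluate_series` and the `min_fraction` parameter are hypothetical and do not describe the checks actually deployed.

```python
def evaluate_series(datapoints, warning, critical, min_fraction=0.5):
    """Classify a metric series as OK / WARNING / CRITICAL / UNKNOWN.

    datapoints: list of [value, timestamp] pairs; value may be None.
    An alert level fires when at least min_fraction of the non-null
    values exceed that level's threshold.
    """
    values = [value for value, _timestamp in datapoints if value is not None]
    if not values:
        # A check over stored metrics cannot tell "healthy" from "no data".
        return "UNKNOWN"

    def fraction_above(limit):
        return sum(1 for value in values if value > limit) / len(values)

    if fraction_above(critical) >= min_fraction:
        return "CRITICAL"
    if fraction_above(warning) >= min_fraction:
        return "WARNING"
    return "OK"
```

The `UNKNOWN` branch is the relevant design point: if the host stops reporting metrics, a check like this has nothing to evaluate, whereas an active check that probes the host directly would fail visibly.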

yuvipanda moved this task from Backlog to Doing on the Labs-Q4-Sprint-1 board. · Apr 2 2015, 11:50 PM

Change 201618 merged by Yuvipanda:
labs: Alert on high load in labstore*

https://gerrit.wikimedia.org/r/201618

yuvipanda closed this task as Resolved. · Apr 4 2015, 11:16 PM
yuvipanda claimed this task.

Alright, so I'm going to consider this 'done' for now. More checks as warranted.

yuvipanda moved this task from Doing to Done on the Labs-Q4-Sprint-1 board. · Apr 6 2015, 11:21 PM