Page MenuHomePhabricator

Review Burrow alarms in order to avoid false positives when restarting it
Closed, ResolvedPublic1 Estimated Story Points

Description

When we restart Burrow instances there is a brief amount of time in which the metrics go awol causing alarms to fire, especially the Mirror Maker ones. Ideally Burrow should be able to restart without creating any impact to metrics.

Event Timeline

fdans triaged this task as Medium priority.Jun 4 2018, 3:53 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Change 438243 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kafka::mirror::alerts: tune retries and its interval

https://gerrit.wikimedia.org/r/438243

Change 438243 merged by Elukey:
[operations/puppet@production] profile::kafka::mirror::alerts: tune retries and its interval

https://gerrit.wikimedia.org/r/438243

Tested one burrow restart manually, alarms showed up in icinga but cleared up before firing an alarm. This is a good compromise for the moment, we'll revise the alarm when a more powerful and flexible prometheus check metrics script will be available.

Vvjjkkii renamed this task from Review Burrow alarms in order to avoid false positives when restarting it to hubaaaaaaa.Jul 1 2018, 1:06 AM
Vvjjkkii removed elukey as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot lowered the priority of this task from High to Medium.Jul 3 2018, 3:25 AM
Nuria set the point value for this task to 1.