
mixin: Fix alert about unhealthy sidecar (#2929)

Authored by Markos Chandras <hwoarang@users.noreply.github.com> on Aug 12 2020, 3:20 PM.

Description

The alert was giving the wrong information because $value contained
the number of pods that were failing to send a heartbeat instead of the
actual number of seconds that each sidecar had been unhealthy.

Also, the 5-minute interval is probably too short: on large deployments
Prometheus can take much longer to come online and for the sidecar to
become actually useful.

As such, we can simply subtract the timestamp of the last heartbeat from
the current time and fire if we are lagging for more than 10 minutes.
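
The check described above can be sketched in a few lines of Python. This is
only an illustration of the arithmetic, not the mixin's actual rule: the
function name, the pod names, and the dictionary of heartbeat timestamps are
hypothetical, and only the 10-minute threshold comes from the commit message.

```python
import time

# Threshold from the commit message: fire after 10 minutes of lag.
UNHEALTHY_AFTER_SECONDS = 10 * 60


def unhealthy_sidecars(last_heartbeat, now=None, threshold=UNHEALTHY_AFTER_SECONDS):
    """Return {pod: seconds since last heartbeat} for pods lagging by more than threshold.

    last_heartbeat maps a (hypothetical) pod name to the Unix timestamp of its
    last successful heartbeat.
    """
    if now is None:
        now = time.time()
    # Subtract each pod's last heartbeat timestamp from the current time.
    lag = {pod: now - ts for pod, ts in last_heartbeat.items()}
    # Keep only the pods whose lag exceeds the threshold; the value is the
    # lag in seconds, which is what the alert should report.
    return {pod: seconds for pod, seconds in lag.items() if seconds > threshold}


if __name__ == "__main__":
    now = 1_000_000.0
    heartbeats = {"sidecar-0": now - 30, "sidecar-1": now - 900}
    print(unhealthy_sidecars(heartbeats, now=now))
```

With this shape, the reported value is the per-sidecar lag in seconds rather
than a count of failing pods, which is the distinction the fix is about.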

Signed-off-by: Markos Chandras <markos@chandras.me>

Details

Committed
GitHub <noreply@github.com> on Aug 12 2020, 3:20 PM
Parents
rODTH7f0364db2333: Initialize forgotten label (#3025)

Event Timeline

GitHub <noreply@github.com> committed rODTHd6305f5920c4: mixin: Fix alert about unhealthy sidecar (#2929) (authored by Markos Chandras <hwoarang@users.noreply.github.com>). Aug 12 2020, 3:20 PM