
improve redis master/slave monitoring
Closed, Resolved. Public.

Description

The incident documented at https://wikitech.wikimedia.org/w/index.php?title=Incident_documentation/20150605-redis showed that Redis has no alerts for a master/slave pair falling out of sync for extended periods of time. Icinga alerts should fire in that case, and possibly other anomalous Redis conditions should be monitored as well.
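
For illustration only, here is a minimal sketch of what such a replication check could look like as an Icinga/Nagios-style plugin. It is not the check that was eventually deployed. It assumes the plugin is handed the text output of the Redis `INFO replication` command from a replica, and it relies on the `role`, `master_link_status`, `master_link_down_since_seconds` and `master_last_io_seconds_ago` fields of that output. The function names and the `warn_secs`/`crit_secs` thresholds are hypothetical.

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_info(text):
    """Parse the text output of Redis `INFO replication` into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def check_replication(info, warn_secs=60, crit_secs=600):
    """Return (status, message) describing a replica's link to its master.

    warn_secs and crit_secs are hypothetical thresholds chosen for this sketch.
    """
    role = info.get("role")
    if role == "master":
        return OK, "OK: instance is a master"
    if role != "slave":
        return UNKNOWN, "UNKNOWN: could not determine role"

    # A replica whose link to the master is down is out of sync by definition.
    if info.get("master_link_status") != "up":
        down_for = info.get("master_link_down_since_seconds", "?")
        return CRITICAL, "CRITICAL: master link down for %s s" % down_for

    # The link is up: alert if nothing has been heard from the master recently.
    try:
        idle = int(info["master_last_io_seconds_ago"])
    except (KeyError, ValueError):
        return UNKNOWN, "UNKNOWN: master_last_io_seconds_ago missing or invalid"
    if idle >= crit_secs:
        return CRITICAL, "CRITICAL: no master I/O for %d s" % idle
    if idle >= warn_secs:
        return WARNING, "WARNING: no master I/O for %d s" % idle
    return OK, "OK: last master I/O %d s ago" % idle
```

A wrapper script would feed the output of `redis-cli INFO replication` to `parse_info`, print the message, and exit with the returned status code so that Icinga can act on it.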

Event Timeline

fgiunchedi raised the priority of this task from to Needs Triage.
fgiunchedi updated the task description.
fgiunchedi added a project: acl*sre-team.
fgiunchedi added a subscriber: fgiunchedi.
chasemp triaged this task as Medium priority. (Jun 6 2015, 4:26 AM)
chasemp added a subscriber: chasemp.
mark raised the priority of this task from Medium to High. (Jul 2 2015, 7:02 PM)
mark added a subscriber: mark.

Redis replication checks were added in https://gerrit.wikimedia.org/r/#/c/282383/ by @Joe. Are there any other Redis-related checks we should add? Otherwise this can be resolved.

fgiunchedi claimed this task.

Resolving; we can reopen if the current checks need improvement.