Alert on abnormal storage growth patterns
Open, Low, Public

Description

Recently, we encountered a bug that caused a title to be re-rendered on each NRPE health check. As the row associated with this title grew wider and wider, read latency increased, as did memory allocation for the affected queries, eventually culminating in Cassandra OOM exceptions. There have been similar bugs in the past as well. We should invest effort into proactively alerting on such changes to storage.

Metrics of interest:

  • Row size (tricky if we allow rows to grow unbounded; a static threshold is probably not sufficient)
  • Column count (same as above, a static threshold will probably not work)
  • Tombstones (can be grokked from logstash)
  • Others?
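
Since a static threshold is probably not sufficient for row size or column count, one option is to compare each new sample against its own recent history. The following is only a rough sketch of that idea, not an existing check: the function name, the `window`, `factor`, and `min_increase` parameters, and their defaults are all hypothetical, and it assumes we already collect a periodic per-table sample (for example, maximum row size in bytes).

```python
from statistics import median


def is_abnormal_growth(samples, window=24, factor=3.0, min_increase=1024):
    """Return True if the newest sample far exceeds its trailing baseline.

    samples: chronological list of one metric (e.g. max row size in bytes).
    window: number of earlier samples used as the baseline (hypothetical default).
    factor: multiple of the baseline that counts as abnormal (hypothetical default).
    min_increase: absolute increase below which we never alert, so that
        tiny rows doubling in size do not generate noise (hypothetical default).
    """
    if len(samples) < window + 1:
        return False  # not enough history to form a baseline

    # Baseline is the median of the `window` samples preceding the newest one.
    baseline = median(samples[-(window + 1):-1])
    latest = samples[-1]

    return (latest - baseline) >= min_increase and latest > factor * baseline
```

Because the baseline moves with the data, a row that legitimately grows slowly would not trip this check, while a sharp departure from its recent history would. Slow but steady runaway growth may need a longer window or a lower factor, and the right values would have to be tuned against real data.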

References:

Event Timeline

Eevans raised the priority of this task from to Medium.
Eevans updated the task description.
Eevans added a project: RESTBase.
Eevans subscribed.
Eevans renamed this task from alert on abnormal storage growth patterns to Alert on abnormal storage growth patterns. Apr 29 2016, 8:39 PM
Eevans added a project: Cassandra.
GWicke edited projects, added Services (later); removed Services.
GWicke moved this task from later to designing on the Services board.
GWicke edited projects, added Services (designing); removed Services (later).
Eevans lowered the priority of this task from Medium to Low. Sep 19 2023, 8:04 PM