
Configure a threshold for earlier notification of /srv/cassandra/instance-data
Open, High, Public

Description

The RESTBase cluster uses an array mounted as /srv/cassandra/instance-data to store hints, caches, and commitlogs. Under aberrant conditions these arrays can fill up and take down every instance configured on the affected host. The standard utilization threshold is 90%; we should consider a lower threshold to provide more advance notice of impending issues.
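The idea of an earlier-firing threshold can be sketched as a small Nagios/Icinga-style disk check. This is a minimal illustration under stated assumptions, not the check deployed by the puppet change: the function names and the 75%/85% warning/critical values are hypothetical, and only the path and the 90% standard come from this task. Utilization is computed as used / (used + available), which is how `df` reports "Use%".

```python
import shutil

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical default path and thresholds, set below the standard 90%.
DEFAULT_PATH = "/srv/cassandra/instance-data"
DEFAULT_WARN_PCT = 75.0
DEFAULT_CRIT_PCT = 85.0


def classify(used_pct, warn_pct, crit_pct):
    """Map a utilization percentage to a plugin exit code."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def check_partition(path, warn_pct=DEFAULT_WARN_PCT, crit_pct=DEFAULT_CRIT_PCT):
    """Return (exit_code, message) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN: cannot stat {path}: {exc}"
    # used + free excludes root-reserved blocks, matching df's Use%.
    capacity = usage.used + usage.free
    if capacity == 0:
        return UNKNOWN, f"UNKNOWN: {path} reports zero capacity"
    used_pct = 100.0 * usage.used / capacity
    code = classify(used_pct, warn_pct, crit_pct)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[code]
    return code, f"{label}: {path} is {used_pct:.1f}% full"


def main(argv):
    """Check the path given in argv (or the default) and return the exit code."""
    path = argv[0] if argv else DEFAULT_PATH
    code, message = check_partition(path)
    print(message)
    return code
```

Run as a standalone plugin, a script like this would end with `sys.exit(main(sys.argv[1:]))` so that the monitoring system sees the exit code. Keeping `classify` separate from the filesystem call makes the threshold logic easy to test without a real full disk.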


Event Timeline

Eevans created this task. Apr 6 2018, 7:36 PM
Restricted Application added a subscriber: Aklapper. Apr 6 2018, 7:36 PM
Eevans triaged this task as Low priority. Apr 6 2018, 7:36 PM
mobrovac raised the priority of this task from Low to High. Apr 6 2018, 8:56 PM
mobrovac added a subscriber: mobrovac.

Re-prioritising due to the recent occurrence of this.

Eevans moved this task from Backlog to In-Progress on the User-Eevans board. Jun 29 2018, 4:27 PM

Change 443114 had a related patch set uploaded (by Eevans; owner: Eevans):
[operations/puppet@production] restbase: cleanup remaining detritus from storage transition

https://gerrit.wikimedia.org/r/443114

Change 443137 had a related patch set uploaded (by Eevans; owner: Eevans):
[operations/puppet@production] WIP: restbase: use lower threshold when monitoring instance-data partition

https://gerrit.wikimedia.org/r/443137

Eevans moved this task from In-Progress to Blocked on the User-Eevans board. Jun 29 2018, 7:25 PM

Change 443114 merged by Filippo Giunchedi:
[operations/puppet@production] restbase: cleanup remaining detritus from storage transition

https://gerrit.wikimedia.org/r/443114

Does this still need to happen, or did the cleanup save us from further incidents?

How is this looking, anyone?