Cassandra has exited abruptly on restbase2004.codfw.wmnet due to OOM exceptions at least 6 times in the last 24 hours:
- 2016-03-17T10:52:00
- 2016-03-17T13:33:00
- 2016-03-17T15:27:00
- 2016-03-17T16:29:00
- 2016-03-17T17:35:00
- 2016-03-17T20:14:00
From a "One of These Things Is Not Like the Others" perspective, the thing which stands out is this exception:
I believe this is an indication of a corrupt SSTable.
Since this occurs during compaction of the affected table (which I haven't yet been able to identify), the failure is cyclical, which might explain why this node's utilization looks reasonable right before it enters a death spiral and exits with OOM. But this is just a working theory. The investigation continues...
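
As a rough check on the "cyclical" part of the theory, the gaps between the exits listed above can be computed directly. This is only a sketch: the helper name `gaps_in_minutes` is made up for illustration, and the timestamps are simply copied from the list in this task.

```python
from datetime import datetime

# OOM exit times copied from the list in this task (order does not matter).
OOM_EXITS = [
    "2016-03-17T16:29:00",
    "2016-03-17T10:52:00",
    "2016-03-17T17:35:00",
    "2016-03-17T15:27:00",
    "2016-03-17T13:33:00",
    "2016-03-17T20:14:00",
]


def gaps_in_minutes(timestamps):
    """Return the minutes between consecutive exits, in chronological order.

    Hypothetical helper, not part of any Cassandra tooling.
    """
    times = sorted(datetime.strptime(t, "%Y-%m-%dT%H:%M:%S") for t in timestamps)
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(gaps_in_minutes(OOM_EXITS))  # prints [161, 114, 62, 66, 159]
```

The gaps run from about one hour to a little under three, so the exits are recurring but not strictly periodic. Lining these times up against compaction activity in the Cassandra logs would be a more direct test of the theory.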