Occasional, recurring Cassandra OutOfMemory exceptions continue as a result of the issues discussed in {T144431}. With updates now happening in codfw, the OOMs have been isolated there, where their impact is not felt on client reads, but we should continue to document them. Rather than opening a new Phabricator issue each time, let's use this single issue to keep a running log of them.
== `OutOfMemory` exceptions ==
| Time | Instance | Heapdump | Comments |
|-------|-------|------|------|
| 2017-03-16T20:44:14 | restbase2001-c | ~~/srv/cassandra-c/java_pid6856.hprof~~ | Restarted by Puppet @ ~2017-03-16T21:08:14 |
| 2017-03-24T12:49:59 | restbase2001-a | ~~/srv/cassandra-a/java_pid3678.hprof~~ | Restarted by Puppet; could not recover: `org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Could not read commit log descriptor in file /srv/cassandra-a/commitlog/CommitLog-5-1489701224558.log` |
| 2017-03-24T12:50:33 | restbase2009-b | ~~/srv/cassandra-b/java_pid2467.hprof~~ | Restarted by Puppet |
| 2017-03-27T07:47:26 | restbase2012-b | ~~/srv/cassandra-b/java_pid33443.hprof~~ | Restarted by Puppet @ 2017-03-27T08:13:36 |
| 2017-03-30T15:47:15 | restbase2004-a | ~~/srv/cassandra-a/java_pid28335.hprof~~ | Manually restarted, back up @ ~2017-03-30T15:51:15 |
| 2017-03-30T15:33:45 | restbase2010-c | ~~/srv/cassandra-c/java_pid52532.hprof /srv/cassandra-c/java_pid67967.hprof /srv/cassandra-c/java_pid75083.hprof~~ | Manually restarted (3 times); back up @ ~2017-03-30T15:56:55 |
| 2017-04-01T01:41:35 | restbase2004-b | /srv/cassandra-b/java_pid814.hprof | Restarted @ ~2017-04-01T02:02:35 |
| 2017-04-02T01:42:25 | restbase2005-c | /srv/cassandra-c/java_pid10559.hprof | Restarted @ 2017-04-02T01:43:25 |
| 2017-04-02T03:28:25 | restbase2001-a | /srv/cassandra-a/java_pid5021.hprof /srv/cassandra-a/java_pid2347.hprof /srv/cassandra-a/java_pid26573.hprof /srv/cassandra-a/java_pid28144.hprof /srv/cassandra-a/java_pid17332.hprof | 5 events total; resolved @ ~2017-04-02T05:37:35 |
| 2017-04-02T03:38:25 | restbase2009-a | /srv/cassandra-a/java_pid24320.hprof /srv/cassandra-a/java_pid12720.hprof /srv/cassandra-a/java_pid6210.hprof /srv/cassandra-a/java_pid2131.hprof | 4 events total; resolved @ ~2017-04-02T05:34:25 |
| 2017-04-11T06:42:58 | restbase2004-a | /srv/cassandra-a/java_pid14987.hprof | Resolved @ ~2017-04-11T06:58:58 by @MoritzMuehlenhoff |