At ~22:30 UTC on 2017-02-16, two Cassandra instances (restbase2009-c and restbase2010-a) experienced OOM exceptions.
$ for i in 2009 2010; do echo "$i: "; ssh restbase$i.codfw.wmnet -- "sudo find /srv/cassandra-* -maxdepth 1 -name '*.hprof' -exec ls -lh {} \;"; done
2009:
-rw------- 1 cassandra cassandra 4.7G Feb 16 22:35 /srv/cassandra-c/java_pid7023.hprof
2010:
-rw------- 1 cassandra cassandra 5.0G Feb 16 22:35 /srv/cassandra-a/java_pid120525.hprof
NOTE: It's reasonable to assume that this is a continuation of T144431 (RESTBase k-r-v as Cassandra anti-pattern). This ticket was opened only to document the event; a detailed analysis is probably not worth the time.