Page MenuHomePhabricator

Analyze Cassandra heapdumps from 2017-04-19 outage of 2005-c and 2010-b
Closed, ResolvedPublic

Description

Two OOM exceptions occurred in codfw after the data-center switch over, that is, after change-propagation updates were moved to eqiad. While this may be related to known issues, it is unusual enough that it warrants a closer look.

See also:

Event Timeline

Eevans updated the task description. (Show Details)

Unfortunately, both of these heap dumps seem to have corrupted by Cassandra's OOM error handling; I don't think there is anything useful to be learned from these.