Icinga monitoring for elasticsearch doesn't notice OOM conditions (this is happening on cloud)
Closed, Resolved (Public)

Description

The monitoring for Elasticsearch doesn't seem to notice when an OOM has happened on a node. I have very seldom seen a node recover on its own from an OOM, so it seems like something worth alerting on.

Event Timeline

bd808 set the priority of this task to Needs Triage.
bd808 updated the task description.
bd808 added a project: Wikimedia-Logstash.
bd808 changed Security from none to None.
bd808 added subscribers: bd808, Gage, Manybubbles.

I think I remember there being a JVM command-line flag that lets you install an OOM handler. I have vague memories of using that to send alerts in a long-forgotten past as a Java Shop Administrator™.
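
The flag being recalled here is probably HotSpot's `-XX:OnOutOfMemoryError`, which runs a command the first time an `OutOfMemoryError` is thrown. A minimal sketch, assuming a HotSpot JVM; the alert script path and `app.jar` are hypothetical placeholders, not anything taken from this task:

```shell
# Hypothetical alert hook; HotSpot replaces %p with the JVM's process ID.
OOM_HOOK='/usr/local/bin/notify-oom.sh %p'

# Build the JVM option that runs the hook on the first OutOfMemoryError.
JAVA_OOM_OPTS="-XX:OnOutOfMemoryError=${OOM_HOOK}"

# Example launch (commented out; app.jar is a placeholder).
# Quoting keeps the hook command and its argument in a single JVM option.
# java "$JAVA_OOM_OPTS" -jar app.jar
```

The hook only reports the condition; the JVM keeps running in whatever state the error left it, so it would still need to be restarted separately.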

bd808 triaged this task as Medium priority. (Feb 2 2015, 5:07 PM)

Change 487787 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch: exit the JVM on OutOfMemoryError

https://gerrit.wikimedia.org/r/487787

Instead of monitoring for this specific error, let's just configure the JVM to exit on OutOfMemoryError so that the service gets restarted.
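
The patch title suggests a JVM exit-on-OOM option. A minimal sketch of that approach, assuming HotSpot's `-XX:+ExitOnOutOfMemoryError` (available from JDK 8u92) passed through Elasticsearch's `ES_JAVA_OPTS` environment variable. The actual contents of change 487787 are not reproduced here, and the restart itself would have to come from whatever supervises the process (for example a systemd `Restart=` policy), not from the JVM:

```shell
# Assumption: HotSpot flag that makes the JVM exit on the first OutOfMemoryError.
EXIT_ON_OOM='-XX:+ExitOnOutOfMemoryError'

# Append to any existing Elasticsearch JVM options without clobbering them.
ES_JAVA_OPTS="${ES_JAVA_OPTS:+$ES_JAVA_OPTS }${EXIT_ON_OOM}"
export ES_JAVA_OPTS
```

Exiting rather than alerting trades a possibly wedged node for a short, visible outage that ordinary process-level monitoring can already detect.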

Nuria renamed this task from "Icinga monitoring for elasticsearch doesn't notice OOM conditions" to "Icinga monitoring for elasticsearch doesn't notice OOM conditions (this is happening on cloud)". (Mar 5 2019, 6:28 PM)

Change 487787 merged by Gehel:
[operations/puppet@production] elasticsearch: exit the JVM on OutOfMemoryError

https://gerrit.wikimedia.org/r/487787

Merged; this will take effect with the next cluster restart.