
Hadoop name nodes repeatedly die with "Java heap space" errors
Closed, Resolved
Public

Description

On 2015-02-04, a Hadoop name node died twice with "Java heap space".
On 2015-02-07, both Hadoop name nodes died with "Java heap space" errors around the same time, so fail-over was not possible.
When restarted manually, the name nodes died again with "Java heap space" before they could finish starting.
Hence, HDFS was down and could not be brought up.

Manually bumping the heap size for the name nodes from the default of 1GB to 2GB worked and allowed the name nodes to restart.

Let's puppetize a 2GB heap for the name nodes.
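
For illustration only, here is a minimal Python sketch of the heap bump described above, expressed as an edit to a JVM options string. It is not taken from the actual patch: the helper names `max_heap_bytes` and `with_heap` and the sample options string are hypothetical, and the sketch assumes the heap limit is passed to the name node JVM via the standard `-Xmx` flag.

```python
import re

# Multipliers for the size suffixes accepted by the JVM's -Xmx flag.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def max_heap_bytes(jvm_opts):
    """Return the heap limit in bytes from the last -Xmx flag, or None if absent.

    Assumption: when -Xmx appears more than once, the last one is used.
    """
    matches = re.findall(r"(?:^|\s)-Xmx(\d+)([kKmMgG]?)(?=\s|$)", jvm_opts)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * _UNITS[unit.lower()]


def with_heap(jvm_opts, heap_mb):
    """Drop any existing -Xmx flags and append one for heap_mb megabytes."""
    kept = [opt for opt in jvm_opts.split() if not opt.startswith("-Xmx")]
    return " ".join(kept + ["-Xmx%dm" % heap_mb])


# Hypothetical starting options carrying the 1GB limit mentioned above.
opts = with_heap("-Xmx1g -Dhadoop.security.logger=INFO,RFAS", 2048)
print(opts)
print(max_heap_bytes(opts) == 2 * 1024 ** 3)
```

In a real deployment the resulting string would typically end up in the name node's environment configuration, which is what puppetizing the setting would manage. Where exactly that lives depends on the cdh module and is not shown in this task.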

Event Timeline

QChris claimed this task.
QChris raised the priority of this task from to Needs Triage.
QChris updated the task description.
QChris added a project: Analytics-Clusters.
QChris subscribed.
gerritbot subscribed.

Change 189143 had a related patch set uploaded (by QChris):
Force 2GB of heap for name nodes

https://gerrit.wikimedia.org/r/189143

Patch-For-Review

Change 189143 merged by Andrew Bogott:
Force 2GB of heap for name nodes

https://gerrit.wikimedia.org/r/189143

Change 189146 had a related patch set uploaded (by QChris):
Bump cdh module to increase heap on name nodes

https://gerrit.wikimedia.org/r/189146

Patch-For-Review

Change 189146 merged by Andrew Bogott:
Bump cdh module to increase heap on name nodes

https://gerrit.wikimedia.org/r/189146

HDFS is up and working again.
Thanks, Andrew Bogott!