Description

The HDFS Datanode and YARN NodeManager daemons running on each Hadoop worker constantly consume almost all of their allocated heap and frequently end up in old-generation garbage collection. The 2G Xmx/Xms limit is probably no longer adequate for the cluster's actual workload, so some tuning is needed.
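As a first check, old-generation pressure on the running daemons can be confirmed with standard JVM tooling. A minimal sketch, assuming shell access to a worker node; the PID placeholders and the 5-second sampling interval are illustrative:

  # Heap flags of the running daemons (expected to show -Xms2g -Xmx2g)
  ps -o pid,args -C java | grep -E 'DataNode|NodeManager'

  # Sample GC statistics every 5 seconds; a consistently high O (old gen
  # utilisation %) together with a growing FGC (full GC count) points to
  # an undersized heap
  jstat -gcutil <datanode_pid> 5000
  jstat -gcutil <nodemanager_pid> 5000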
Event Timeline
Change 386147 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] hadoop: raise Xmx/Xms settings for hadoop worker daemons on an1030
Change 386147 merged by Elukey:
[operations/puppet@production] hadoop: raise Xmx/Xms settings for hadoop worker daemons on an1030
Mentioned in SAL (#wikimedia-operations) [2017-10-25T13:30:00Z] <elukey> restart yarn nodemanager and hdfs datanode on analytics1030 to apply new JVM settings - T178876
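In practice, applying the new settings on a host amounts to restarting the two daemons, roughly as sketched below; the systemd unit names are assumptions based on the standard Hadoop packaging and may differ on the analytics hosts:

  sudo systemctl restart hadoop-yarn-nodemanager
  sudo systemctl restart hadoop-hdfs-datanode

  # Confirm the daemons came back with the new -Xms/-Xmx values
  ps -o args -C java | grep -E 'DataNode|NodeManager'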
Change 390237 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] hadoop: raise jvm heap sizes for HDFS datanode and Yarn daemons
This is going to be done, but it needs to wait for the next round of JVM daemon restarts.
Change 394256 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hadoop::worker: increase Java Xmx to 4G for Datanode/Nodemanager
Change 394256 merged by Elukey:
[operations/puppet@production] profile::hadoop::worker: increase Java Xmx to 4G for Datanode/Nodemanager
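For illustration, the rendered effect of the merged change on the daemons' JVM options is roughly the following in hadoop-env.sh / yarn-env.sh; the actual values are managed through the profile::hadoop::worker Puppet profile, and the stock Hadoop environment variables are used here only as a sketch:

  # hadoop-env.sh (HDFS Datanode)
  export HADOOP_DATANODE_OPTS="-Xms4g -Xmx4g ${HADOOP_DATANODE_OPTS}"

  # yarn-env.sh (YARN NodeManager)
  export YARN_NODEMANAGER_OPTS="-Xms4g -Xmx4g ${YARN_NODEMANAGER_OPTS}"

Keeping Xms equal to Xmx avoids runtime heap resizing and matches the existing 2G convention.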
Change 390237 abandoned by Elukey:
hadoop: raise jvm heap sizes for HDFS datanode and Yarn daemons