Investigate if Node Managers can be restarted without impacting running containers
Closed, Resolved · Public · 5 Story Points

Description

Starting point: https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html

Essentially we want to be able to restart the Node Manager daemons on the Hadoop worker nodes without impacting the running containers.
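As a rough illustration of what the linked NodeManager restart documentation asks for, the sketch below checks a yarn-site.xml document for the two settings that matter most: `yarn.nodemanager.recovery.enabled` set to true, and `yarn.nodemanager.address` bound to a fixed port rather than the ephemeral port 0. The helper names, the sample recovery directory and the port 8041 are assumptions made for the example; they are not taken from the patch referenced in this task.

```python
import xml.etree.ElementTree as ET

# Hypothetical yarn-site.xml fragment; the path and port are illustrative only.
SAMPLE_YARN_SITE = """
<configuration>
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/var/lib/hadoop-yarn/yarn-nm-recovery</value>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:8041</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return a dict of name -> value from a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text.strip())
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def recovery_problems(props):
    """List the reasons why a Node Manager restart might not preserve containers."""
    problems = []
    # Recovery is disabled unless explicitly switched on.
    if props.get("yarn.nodemanager.recovery.enabled", "false").lower() != "true":
        problems.append("yarn.nodemanager.recovery.enabled is not true")
    # With port 0 the Node Manager picks a new port on every start, so
    # clients cannot reconnect to it after a restart.
    port = props.get("yarn.nodemanager.address", "").rpartition(":")[2]
    if not port.isdigit() or int(port) == 0:
        problems.append("yarn.nodemanager.address does not use a fixed port")
    return problems


if __name__ == "__main__":
    for problem in recovery_problems(load_properties(SAMPLE_YARN_SITE)):
        print(problem)
```

A check like this could be run against a worker's rendered configuration before restarting its Node Manager; an empty list only means the two settings are present, not that recovery has been verified end to end.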

elukey created this task. Wed, Feb 1, 5:13 PM
Restricted Application added a subscriber: Aklapper. Wed, Feb 1, 5:13 PM
elukey triaged this task as "High" priority. Wed, Feb 1, 5:14 PM
elukey added a project: Analytics-Cluster.

Change 336203 had a related patch set uploaded (by Elukey):
Enable Yarn's Node Manager recovery to allow graceful restarts

https://gerrit.wikimedia.org/r/336203

Mentioned in SAL (#wikimedia-operations) [2017-02-06T13:30:05Z] <elukey> applied https://gerrit.wikimedia.org/r/#/c/336203/ manually to analytics1028 (hadoop worker node) as live test - T156932

Milimetric set the point value for this task to 5. Mon, Feb 6, 4:42 PM

Change 336203 merged by Elukey:
Enable Yarn's Node Manager recovery to allow graceful restarts

https://gerrit.wikimedia.org/r/336203

elukey moved this task from Backlog to In Progress on the User-Elukey board. Tue, Feb 7, 8:39 AM

Change 336380 had a related patch set uploaded (by Elukey):
Update the cdh's module sha

https://gerrit.wikimedia.org/r/336380

Change 336380 merged by Elukey:
Update the cdh's module sha

https://gerrit.wikimedia.org/r/336380

Mentioned in SAL (#wikimedia-operations) [2017-02-07T14:19:50Z] <elukey> restarting all the Yarn Node Managers on the Hadoop worker nodes to pick up the new config - T156932

elukey moved this task from In Code Review to Done on the Analytics-Kanban board. Tue, Feb 7, 2:34 PM
Nuria closed this task as "Resolved". Tue, Feb 14, 4:05 PM