Investigate if Node Managers can be restarted without impacting running containers
Closed, Resolved · Public · 5 Estimated Story Points

Description

Starting point: https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html

Essentially we want to be able to restart the Node Manager daemons on the Hadoop worker nodes without impacting the running containers.
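The Hadoop 2.6 NodeManager Restart page linked above lists the preconditions for this: enable `yarn.nodemanager.recovery.enabled`, give the NodeManager a local `yarn.nodemanager.recovery.dir` for its state, and bind `yarn.nodemanager.address` to a fixed port, because the default ephemeral port (0) makes the NodeManager come back on a different port after a restart. The sketch below checks those three settings and renders them as `yarn-site.xml` properties. It assumes the recovery directory path, and it uses port 45454 only because that is the example in the Hadoop docs. Neither value is taken from the Gerrit change for this task.

```python
import xml.etree.ElementTree as ET

# Illustrative values only: the recovery directory is a hypothetical path and
# 45454 is the example port from the Hadoop docs, not the values from the
# Gerrit change on this task.
NM_RECOVERY_PROPERTIES = {
    # Turn on NodeManager work-preserving restart.
    "yarn.nodemanager.recovery.enabled": "true",
    # Local directory where the NodeManager persists container state.
    "yarn.nodemanager.recovery.dir": "/var/lib/hadoop-yarn/nm-recovery",
    # Fixed RPC port: with an ephemeral port (0) a restarted NodeManager
    # would listen on a different port than before the restart.
    "yarn.nodemanager.address": "0.0.0.0:45454",
}


def check_recovery_config(props):
    """Return a list of settings that would undermine NodeManager recovery."""
    problems = []
    if props.get("yarn.nodemanager.recovery.enabled", "false").lower() != "true":
        problems.append("yarn.nodemanager.recovery.enabled is not true")
    # Hadoop has a default under hadoop.tmp.dir; this sketch insists on an
    # explicit directory so the state does not live in a temp location.
    if not props.get("yarn.nodemanager.recovery.dir"):
        problems.append("yarn.nodemanager.recovery.dir is not set explicitly")
    address = props.get("yarn.nodemanager.address", "0.0.0.0:0")
    port = address.rsplit(":", 1)[-1]
    if not port.isdigit() or int(port) == 0:
        problems.append("yarn.nodemanager.address uses an ephemeral port")
    return problems


def to_yarn_site_xml(props):
    """Render properties as a Hadoop-style <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    problems = check_recovery_config(NM_RECOVERY_PROPERTIES)
    print(problems or "NodeManager recovery settings look complete")
    print(to_yarn_site_xml(NM_RECOVERY_PROPERTIES))
```

Running the check before rolling out a config change is a cheap way to catch the ephemeral-port case, which is the one that would break clients talking to a NodeManager across a restart.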

Event Timeline

elukey triaged this task as High priority. Feb 1 2017, 5:14 PM
elukey added a project: Analytics-Clusters.

Change 336203 had a related patch set uploaded (by Elukey):
Enable Yarn's Node Manager recovery to allow graceful restarts

https://gerrit.wikimedia.org/r/336203

Mentioned in SAL (#wikimedia-operations) [2017-02-06T13:30:05Z] <elukey> applied https://gerrit.wikimedia.org/r/#/c/336203/ manually to analytics1028 (hadoop worker node) as live test - T156932

Milimetric set the point value for this task to 5. Feb 6 2017, 4:42 PM

Change 336203 merged by Elukey:
Enable Yarn's Node Manager recovery to allow graceful restarts

https://gerrit.wikimedia.org/r/336203

Change 336380 had a related patch set uploaded (by Elukey):
Update the cdh's module sha

https://gerrit.wikimedia.org/r/336380

Change 336380 merged by Elukey:
Update the cdh's module sha

https://gerrit.wikimedia.org/r/336380

Mentioned in SAL (#wikimedia-operations) [2017-02-07T14:19:50Z] <elukey> restarting all the Yarn Node Managers on the Hadoop worker nodes to pick up the new config - T156932