Configure Yarn to be able to locate nodes with a GPU
Closed, Resolved, Public

Description

In T255138 we added 6 nodes equipped with a GPU to the Hadoop cluster. As a follow-up, we should configure Yarn to be able to locate and use those nodes when needed.

Yarn node labels may be a solution, but this should be reviewed and tested first.

Event Timeline

I found https://www.ibm.com/support/pages/node/6260093 that says:

Recommended versions
The YARN node labels feature was introduced in Apache Hadoop 2.6, but it’s not mature in the first official release. The recommended versions are 2.8 and later, which include a lot of fixes and improvements. For IOP, the supported version begins with IOP 4.2.5, which is based on Apache Hadoop 2.7.3. It has all the important fixes and improvements for node labels and has been thoroughly tested by us.

razzi changed the task status from Open to Stalled. Oct 15 2020, 4:07 PM
razzi added a subscriber: razzi.

@elukey says this depends on BigTop, which determines our Hadoop version.

elukey changed the task status from Stalled to Open. Mar 4 2021, 9:43 AM

Aaand we finally have Hadoop 2.10.1, so labels are well supported. Let's try to deploy them :)

Sadly, it seems that due to https://issues.apache.org/jira/browse/YARN-6636 (and other related issues), the Fair scheduler, which we use, doesn't support or respect node labels. If we want GPU labels, for example, we'll have to consider moving to the Capacity scheduler.
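For reference, here is a rough sketch of the kind of settings node labels need under the Capacity scheduler. It is not taken from our puppet config: the queue path `root.gpu`, the label name `GPU`, the capacity value and the HDFS store path are all hypothetical, and the helper functions are only for illustration. The property names are the standard Hadoop ones.

```python
from xml.sax.saxutils import escape


def capacity_label_properties(queue_path, label, capacity_percent):
    """Return Capacity scheduler properties that let one queue use a node label."""
    prefix = f"yarn.scheduler.capacity.{queue_path}"
    return {
        # Labels that applications in this queue are allowed to request.
        f"{prefix}.accessible-node-labels": label,
        # Share of the labelled partition guaranteed to this queue.
        f"{prefix}.accessible-node-labels.{label}.capacity": str(capacity_percent),
    }


def to_hadoop_xml(properties):
    """Render a dict of properties in the Hadoop *-site.xml layout."""
    lines = ["<configuration>"]
    for name, value in sorted(properties.items()):
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    props = {
        # Switch the node labels feature on in the ResourceManager.
        "yarn.node-labels.enabled": "true",
        # Hypothetical location; any path writable by the ResourceManager works.
        "yarn.node-labels.fs-store.root-dir": "hdfs:///user/yarn/node-labels",
    }
    # Hypothetical queue and label names, chosen only for this example.
    props.update(capacity_label_properties("root.gpu", "GPU", 100))
    print(to_hadoop_xml(props))
```

In a real setup the `yarn.node-labels.*` properties live in yarn-site.xml and the `yarn.scheduler.capacity.*` ones in capacity-scheduler.xml; they are printed together here only to keep the sketch short. As far as I know, parent queues (including `root`) also need a capacity for the label, so this should be read as a starting point rather than a complete config.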

elukey changed the task status from Open to Stalled. Mar 10 2021, 4:28 PM

This is stalled until we get to the Capacity scheduler :)

Change 675088 had a related patch set uploaded (by Elukey; author: Elukey):
[operations/puppet@production] hadoop: introduce the GPU labels for the test cluster

https://gerrit.wikimedia.org/r/675088

Change 675088 merged by Elukey:
[operations/puppet@production] hadoop: introduce the GPU labels for the test cluster

https://gerrit.wikimedia.org/r/675088

Change 675111 had a related patch set uploaded (by Elukey; author: Elukey):
[operations/puppet@production] hadoop: refactor how Yarn node labels are set in hiera

https://gerrit.wikimedia.org/r/675111

Change 675111 merged by Elukey:
[operations/puppet@production] hadoop: refactor how Yarn node labels are set in hiera

https://gerrit.wikimedia.org/r/675111
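The patches above are not reproduced here, and the hiera layout they use is not shown in this task. As a rough illustration of what a host-to-label mapping has to turn into when labels are managed centrally, the sketch below builds the corresponding `yarn rmadmin` commands. The function name, the hostnames and the shape of the mapping are assumptions made for this example; only the two `rmadmin` options are standard Hadoop.

```python
import shlex


def rmadmin_commands(cluster_labels, node_labels):
    """Build yarn rmadmin invocations for a set of labels and a host mapping.

    cluster_labels: dict of label name -> bool (True if the partition is exclusive)
    node_labels:    dict of hostname -> label name (hypothetical layout)
    """
    # Labels must be registered on the cluster before they are assigned to nodes.
    label_spec = ",".join(
        f"{name}(exclusive={'true' if exclusive else 'false'})"
        for name, exclusive in sorted(cluster_labels.items())
    )
    # One space-separated "host=label" pair per node.
    node_spec = " ".join(
        f"{host}={label}" for host, label in sorted(node_labels.items())
    )
    return [
        ["yarn", "rmadmin", "-addToClusterNodeLabels", label_spec],
        ["yarn", "rmadmin", "-replaceLabelsOnNode", node_spec],
    ]


if __name__ == "__main__":
    # Hypothetical hostnames, not the real GPU workers.
    commands = rmadmin_commands(
        {"GPU": False},
        {"gpu-worker1.example.org": "GPU", "gpu-worker2.example.org": "GPU"},
    )
    for command in commands:
        # shlex.join quotes the arguments so the line can be pasted into a shell.
        print(shlex.join(command))
```

This only prints the commands; nothing is executed against a cluster. If labels are instead configured per NodeManager (the distributed configuration type), the mapping would end up in each node's yarn-site.xml and these commands would not be needed for the node assignment.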