Hi folks,
We currently have only two nodes with a GPU on Hadoop, but the corresponding Yarn label is still set on six nodes:
elukey@an-master1003:~$ for host in 1096 1097 1098 1099 1100 1101; do echo "an-worker${host}"; sudo -u yarn kerberos-run-command yarn yarn node -status an-worker$host.eqiad.wmnet:8041 2>&1| grep Labels; done
an-worker1096
Node-Labels : GPU
an-worker1097
Node-Labels : GPU
an-worker1098
Node-Labels : GPU
an-worker1099
Node-Labels : GPU
an-worker1100
Node-Labels : GPU
an-worker1101
Node-Labels : GPU
I would run the following to fix it:
sudo -u yarn kerberos-run-command yarn yarn rmadmin -replaceLabelsOnNode "an-worker1096.eqiad.wmnet="
sudo -u yarn kerberos-run-command yarn yarn rmadmin -replaceLabelsOnNode "an-worker1097.eqiad.wmnet="
sudo -u yarn kerberos-run-command yarn yarn rmadmin -replaceLabelsOnNode "an-worker1098.eqiad.wmnet="
sudo -u yarn kerberos-run-command yarn yarn rmadmin -replaceLabelsOnNode "an-worker1099.eqiad.wmnet="
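The same change could also be scripted as a loop. This is only a rough sketch: the function name clear_gpu_label and the DRY_RUN switch are made up for illustration and are not part of any existing tooling. With DRY_RUN left at its default of 1 the commands are only printed, so they can be reviewed before anything touches the ResourceManager.

```shell
# Sketch only: clear all Yarn labels from the given an-worker hosts.
# clear_gpu_label and DRY_RUN are hypothetical names, not existing tooling.
# DRY_RUN defaults to 1 (print the commands); set DRY_RUN=0 to execute them.
clear_gpu_label() {
    for host in "$@"; do
        node="an-worker${host}.eqiad.wmnet"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Print the exact command that would be run for this node.
            echo "sudo -u yarn kerberos-run-command yarn yarn rmadmin -replaceLabelsOnNode \"${node}=\""
        else
            # An empty value after "=" replaces the node's labels with none.
            sudo -u yarn kerberos-run-command yarn yarn rmadmin -replaceLabelsOnNode "${node}="
        fi
    done
}

# The four workers listed in the fix above.
clear_gpu_label 1096 1097 1098 1099
```

After running it with DRY_RUN=0, the status loop from earlier in this post can be re-run to check the result; presumably only an-worker1100 and an-worker1101 should still report the GPU label.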
The ML team is currently testing training models on Hadoop with GPUs :)