Update GPU labels in Hadoop's Yarn config
Closed, Resolved, Public

Assigned To

Authored By

	elukey
	Mar 28 2024, 1:40 PM

Description

Hi folks,

We currently have only two nodes with a GPU on Hadoop, but the corresponding Yarn label is still applied to multiple nodes:

elukey@an-master1003:~$ for host in 1096 1097 1098 1099 1100 1101; do echo "an-worker${host}"; sudo -u yarn kerberos-run-command yarn yarn node -status an-worker$host.eqiad.wmnet:8041 2>&1| grep Labels; done
an-worker1096
	Node-Labels : GPU
an-worker1097
	Node-Labels : GPU
an-worker1098
	Node-Labels : GPU
an-worker1099
	Node-Labels : GPU
an-worker1100
	Node-Labels : GPU
an-worker1101
	Node-Labels : GPU

I would run the following to clear the label from the four hosts that no longer have a GPU:

sudo -u yarn kerberos-run-command yarn yarn rmadmin -replaceLabelsOnNode "an-worker1096.eqiad.wmnet="
sudo -u yarn kerberos-run-command yarn yarn rmadmin -replaceLabelsOnNode "an-worker1097.eqiad.wmnet="
sudo -u yarn kerberos-run-command yarn yarn rmadmin -replaceLabelsOnNode "an-worker1098.eqiad.wmnet="
sudo -u yarn kerberos-run-command yarn yarn rmadmin -replaceLabelsOnNode "an-worker1099.eqiad.wmnet="
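
The four commands above differ only in the worker number, so they could also be generated from a range. This is a minimal sketch, not something that was run for this task: `clear_gpu_labels` and the `RUN` variable are hypothetical names, and by default the helper only prints the commands so they can be reviewed first.

```shell
# clear_gpu_labels FIRST LAST
# Hypothetical helper: for each worker ID in FIRST..LAST, print the rmadmin
# command that replaces the node's labels with an empty set. The command is
# only executed when RUN=1 is set (an assumption made for this sketch).
clear_gpu_labels() (
  for id in $(seq "$1" "$2"); do
    node="an-worker${id}.eqiad.wmnet"
    if [ "${RUN:-0}" = "1" ]; then
      sudo -u yarn kerberos-run-command yarn yarn rmadmin -replaceLabelsOnNode "${node}="
    else
      # Dry run: show what would be executed.
      echo "sudo -u yarn kerberos-run-command yarn yarn rmadmin -replaceLabelsOnNode \"${node}=\""
    fi
  done
)
```

Calling `clear_gpu_labels 1096 1099` prints the four commands listed above; `RUN=1 clear_gpu_labels 1096 1099` would execute them.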

The ML team is currently testing training models on Hadoop with GPUs :)

Event Timeline

elukey created this task. Mar 28 2024, 1:40 PM

Restricted Application added a subscriber: Aklapper. Mar 28 2024, 1:40 PM

That's excellent, please feel free to proceed @elukey. I had forgotten to remove them.

Commands executed, new status:

an-worker1096
	Node-Labels : 
an-worker1097
	Node-Labels : 
an-worker1098
	Node-Labels : 
an-worker1099
	Node-Labels : 
an-worker1100
	Node-Labels : GPU
an-worker1101
	Node-Labels : GPU
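
To confirm a status listing like the one above at a glance, the output can be filtered down to the hosts that still carry the GPU label. This is a sketch that assumes the alternating hostname / `Node-Labels` line layout shown in this task; `gpu_labeled_hosts` is a hypothetical helper name.

```shell
# gpu_labeled_hosts
# Hypothetical helper: reads alternating "an-workerNNNN" and
# "Node-Labels : ..." lines on stdin and prints only the hosts
# whose label is GPU.
gpu_labeled_hosts() {
  awk '
    /^an-worker/ { host = $1; next }             # remember the current host
    /Node-Labels/ && $NF == "GPU" { print host } # last field is the label
  '
}
```

Piping the status loop from the description into `gpu_labeled_hosts` should, for the listing above, leave only an-worker1100 and an-worker1101.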

Mentioned in SAL (#wikimedia-analytics) [2024-03-28T15:00:47Z] <elukey> remove GPU labels in Hadoop Yarn for an-worker[1096-1099] (the hosts don't have a GPU anymore) - T361225

BTullis awarded a token. Mar 28 2024, 3:02 PM
