In T260445 DC-ops racked the last 6 worker nodes that were pending, part of the original batch of 24 that we scheduled for the cluster expansion. We used 18 of them for the temporary backup cluster, and all those 18 are already in the HDFS Namenode config of the Analytics cluster (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/664302/3/hieradata/common.yaml).
We need to add the following nodes as well (from puppet's site.pp):
  #staged an-workers via T260445
  node /^an-worker11(29|33|34|39|40|41)\.eqiad\.wmnet$/ {
      role(insetup)
  }
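For convenience when looking up rack locations in Netbox, the node regex above can be expanded into the six full hostnames. This is a minimal sketch: the helper name `expand_workers` is made up for illustration, and the only inputs are the prefix, suffixes and domain that appear in the site.pp regex.

```python
import re

# Node regex copied from the site.pp stanza above.
NODE_REGEX = r"^an-worker11(29|33|34|39|40|41)\.eqiad\.wmnet$"


def expand_workers(suffixes, prefix="an-worker11", domain="eqiad.wmnet"):
    """Build fully qualified hostnames from the numeric suffixes in the regex.

    Hypothetical helper, not part of puppet or any existing tooling.
    """
    return [f"{prefix}{suffix}.{domain}" for suffix in suffixes]


HOSTS = expand_workers(["29", "33", "34", "39", "40", "41"])

# Sanity check: every generated hostname must match the site.pp regex.
_pattern = re.compile(NODE_REGEX)
assert all(_pattern.match(host) for host in HOSTS)

if __name__ == "__main__":
    for host in HOSTS:
        print(host)
```

The printed list can be used as a checklist while filling in the rack information for each host.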
Things to do:
- Come up with a change like https://gerrit.wikimedia.org/r/c/operations/puppet/+/664302/3/hieradata/common.yaml for the above nodes (netbox.wikimedia.org will be useful to check rack locations)
- Merge and run puppet on an-master100[1,2]
- Roll restart the Namenodes (usual failover, restart 1001, failback, restart 1002 procedure - or using the cookbook)
- Check if the above nodes need to get kerberos keytabs in puppet private
- Check https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration and see if documentation is missing/outdated
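For reference, the order of operations in the Namenode roll restart step above (failover, restart 1001, failback, restart 1002) can be written down as a small dry-run plan. This is only a sketch under assumptions: the HA service IDs (hostname with dots replaced by dashes) and the `hadoop-hdfs-namenode` systemd unit name are guesses that must be checked against the real cluster config, the commands may need to be wrapped in whatever Kerberos/sudo helper is in use, and the cookbook remains the preferred way to do this. The function `namenode_roll_restart_plan` is hypothetical and only builds strings, it does not execute anything.

```python
def namenode_roll_restart_plan(first="an-master1001",
                               second="an-master1002",
                               domain="eqiad.wmnet"):
    """Return the manual roll restart steps as (description, command) pairs.

    Nothing is executed here; the commands are meant to be reviewed by hand.
    """

    def service_id(host):
        # ASSUMPTION: HA service IDs are the FQDN with dots turned into dashes.
        return f"{host}.{domain}".replace(".", "-")

    a, b = service_id(first), service_id(second)
    # ASSUMPTION: the Namenode runs under this systemd unit name.
    restart = "sudo systemctl restart hadoop-hdfs-namenode"

    return [
        (f"fail over from {first} to {second}", f"hdfs haadmin -failover {a} {b}"),
        (f"restart the Namenode on {first}", restart),
        (f"fail back from {second} to {first}", f"hdfs haadmin -failover {b} {a}"),
        (f"restart the Namenode on {second}", restart),
    ]


if __name__ == "__main__":
    for number, (description, command) in enumerate(namenode_roll_restart_plan(), 1):
        print(f"{number}. {description}: {command}")
```

Between steps it is worth confirming which Namenode is active (for example with `hdfs haadmin -getServiceState <serviceId>`) before moving on.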
The end result is that we will be able to move the 6 new nodes to the Analytics Hadoop cluster at any time :)