Add an-worker11[42-48] to the Hadoop cluster
Closed, Resolved | Public | 3 Estimated Story Points

Description

These 7 servers are being added to the cluster as replacements for analytics10[58-69].

They have been racked and installed in task T293922.

Once these new servers have been installed, analytics10[58-69] should be decommissioned.
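
For reference, the bracketed host ranges used in this task can be expanded into individual host names as in the sketch below. The helper name `expand_hosts` is hypothetical and not part of any existing tooling; it assumes a single numeric `[start-end]` range per pattern.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand 'prefix[start-end]suffix' into individual host names.

    Hypothetical helper for illustration only; assumes at most one
    numeric range per pattern.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No range present: treat the pattern as a single host name.
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)  # preserve zero padding, if any
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


new_workers = expand_hosts("an-worker11[42-48]")
old_workers = expand_hosts("analytics10[58-69]")

print(len(new_workers), new_workers[0], new_workers[-1])
print(len(old_workers), old_workers[0], old_workers[-1])
```

This gives 7 new hosts (an-worker1142 to an-worker1148) and 12 hosts to decommission (analytics1058 to analytics1069).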

Event Timeline

I have run the sre.init-hadoop-workers cookbook on all nodes.
There was a slight issue with an-worker1146 because one of the RAID 0 volumes appeared as a foreign configuration, but I rectified this manually.

I also created the journalnode volume on all seven new nodes (even though it may not be used on them).

The location of the new hadoop nodes has been added here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/831532
This will need to be merged and applied to the namenodes with a restart.

Once that is done, I will be able to change the role of the new nodes to add them to the cluster.

I've also added the new keytabs for the nodes and added them to the private puppet repo.

Change 831841 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Put the new hadoop nodes into service

https://gerrit.wikimedia.org/r/831841

Change 831841 merged by Btullis:

[operations/puppet@production] Put the new hadoop nodes into service

https://gerrit.wikimedia.org/r/831841

These are all in service now and the automatic daily rebalance job is running.

We can now proceed to decommission analytics10[58-69].

BTullis claimed this task.