In T207192, an-worker1078->1095 have been configured as Hadoop worker nodes (datanode partitions configured). We need to remove analytics1028->1041 from service and add the newer ones.
Notes:
- Decomming a node is explained in https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration#Decommissioning
- Adding new workers should be a matter of updating the rack awareness config for HDFS and assigning them the right puppet role
- analytics1028 and analytics1035 are journal nodes, so the journal config needs to move to two other servers before decomming them completely.
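For the rack awareness part: Hadoop resolves racks by calling a topology script (configured via `net.topology.script.file.name`) with one or more hostnames or IPs as arguments, and expects one rack path per argument on stdout. A minimal sketch of such a script is below. The rack names and the host-to-rack mapping are hypothetical placeholders, not the real placement of the new workers; the real values would come from the puppet-managed rack awareness config.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop rack topology script (hypothetical mapping)."""
import ipaddress
import sys

# Hypothetical host -> rack mapping; the real rack layout is not stated here.
RACKS = {
    "an-worker1078": "/example/rack-a",
    "an-worker1079": "/example/rack-b",
}

# Hadoop's own fallback rack name for hosts without a mapping.
DEFAULT_RACK = "/default-rack"


def lookup_key(arg):
    """Return the key used for the lookup: IPs as-is, hostnames shortened."""
    try:
        ipaddress.ip_address(arg)
        return arg
    except ValueError:
        return arg.split(".")[0]


def resolve(arg):
    """Map one hostname or IP to a rack path, falling back to the default."""
    return RACKS.get(lookup_key(arg), DEFAULT_RACK)


def main(argv):
    # One rack path per argument, in the same order as the input.
    for arg in argv:
        print(resolve(arg))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Hadoop may pass IP addresses rather than hostnames, so a real mapping would likely need entries for both.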
A safe procedure to move journal nodes could be the following:
- Put HDFS into safe mode (no writes accepted)
- Stop/mask the two journal node daemons (on analytics1028 and analytics1035)
- Merge a puppet config change that moves the journal config to two other nodes, copy the journal node partition to them, and roll restart all the journal nodes
- Take HDFS out of safe mode
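The procedure above can be sketched as an ordered plan. The snippet only builds and prints the steps, it does not execute anything. `hdfs dfsadmin -safemode enter|leave` are the standard HDFS commands; the systemd unit name `hadoop-hdfs-journalnode` and the target hosts `new-journal-a`/`new-journal-b` are assumptions for illustration, since the replacement hosts are not decided here.

```python
def journal_move_plan(old_nodes, new_nodes, service="hadoop-hdfs-journalnode"):
    """Return the ordered steps as (where, what) pairs; nothing is executed.

    `service` is an assumed systemd unit name and may differ per install.
    """
    if len(old_nodes) != len(new_nodes):
        raise ValueError("need one new journal node per old journal node")

    # 1. Block writes so the journal edits stop changing.
    steps = [("namenode", "hdfs dfsadmin -safemode enter")]

    # 2. Stop and mask the daemons on the nodes being retired.
    for host in old_nodes:
        steps.append((host, f"systemctl stop {service}"))
        steps.append((host, f"systemctl mask {service}"))

    # 3. Move the config and the data (manual steps, described only).
    steps.append(("puppet", "merge the journal node config change"))
    for old, new in zip(old_nodes, new_nodes):
        steps.append((new, f"copy the journal node partition from {old}"))
    steps.append(("all journal nodes", f"roll restart {service}"))

    # 4. Accept writes again.
    steps.append(("namenode", "hdfs dfsadmin -safemode leave"))
    return steps


if __name__ == "__main__":
    plan = journal_move_plan(
        ["analytics1028", "analytics1035"],
        ["new-journal-a", "new-journal-b"],  # hypothetical targets
    )
    for where, what in plan:
        print(f"{where}: {what}")
```

`hdfs dfsadmin -safemode get` can be used before and after to confirm the current state.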
Hosts decommed from HDFS/Yarn:
- analytics1028
- analytics1029
- analytics1030
- analytics1031
- analytics1032
- analytics1033
- analytics1034
- analytics1035
- analytics1036
- analytics1037
- analytics1038
- analytics1039
- analytics1040
- analytics1041
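For reference, the host list above can be generated rather than typed by hand, for example when preparing an exclude file. In stock Hadoop, decommissioning is driven by exclude files (referenced by `dfs.hosts.exclude` for HDFS and `yarn.resourcemanager.nodes.exclude-path` for Yarn) followed by `hdfs dfsadmin -refreshNodes` and `yarn rmadmin -refreshNodes`. The sketch below only builds the file body; the helper names and the optional `domain` parameter are hypothetical, and the wikitech procedure linked in the notes remains the reference for how this cluster manages the files.

```python
def host_range(prefix, first, last, domain=""):
    """Build hostnames prefix<first>..prefix<last>, inclusive."""
    suffix = f".{domain}" if domain else ""
    return [f"{prefix}{n}{suffix}" for n in range(first, last + 1)]


def exclude_file_body(hosts):
    """One host per line, newline-terminated, as exclude files expect."""
    return "\n".join(hosts) + "\n"


hosts = host_range("analytics", 1028, 1041)

if __name__ == "__main__":
    print(exclude_file_body(hosts), end="")
```

Whether the exclude file needs short names or fully qualified names depends on how the nodes register with the NameNode, so the `domain` argument is left empty here.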