Since the Bigtop deployment was successful and several days have passed without major regressions, I think we can consider wiping the backup cluster and adding its nodes to the main one.
High level steps:
- Add the Puppet configuration for the new worker nodes to the main cluster's HDFS rack awareness config, and roll-restart the NameNodes (so we can add the new nodes without any risk of them landing in the default rack).
- Stop the backup cluster's daemons, remove all of its Puppet config, and set role(insetup) on all the new workers.
- Add a couple of nodes to the main cluster and verify that the Buster packages work as expected.
- Reimage all the workers with Buster and use the init worker cookbook to wipe them clean (we don't want any leftover data from the previous datanode directories).
- Add the rest of the nodes in a couple of big batches (10-11 nodes each). This should ease the work of the HDFS balancer when it spreads blocks over the new nodes.
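To illustrate the rack-awareness concern in the first step: Hadoop resolves a datanode's rack by invoking the script configured via `net.topology.script.file.name` with one or more host IPs/hostnames as arguments, expecting one rack path per line on stdout, and any host the script cannot map falls back to the default rack. A minimal sketch of such a script is below; the hostnames and rack names are hypothetical (in our setup the mapping would be generated by Puppet, not hard-coded):

```python
#!/usr/bin/env python3
# Hypothetical sketch of an HDFS topology script, i.e. the kind of script
# referenced by net.topology.script.file.name in hdfs-site.xml. Hadoop
# calls it with datanode IPs/hostnames as arguments and reads one rack
# path per line from stdout.
import sys

# Assumed host -> rack mapping; hostnames and racks are made up for
# illustration. In practice this would be rendered from the Puppet config.
RACKS = {
    "an-worker1001.example.net": "/eqiad/rack-A",
    "an-worker1002.example.net": "/eqiad/rack-B",
}

def rack_for(host: str) -> str:
    # Any host missing from the map lands in the default rack; this is
    # why the mapping must be in place *before* the new workers join.
    return RACKS.get(host, "/default-rack")

if __name__ == "__main__":
    for host in sys.argv[1:]:
        print(rack_for(host))
```

Rolling the NameNodes after extending this mapping means that, by the time the new workers register, they are already placed in their real racks instead of `/default-rack`, so block placement stays rack-aware from the first heartbeat.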