We have two types of hardware in the Elasticsearch cluster, but Elasticsearch treats them exactly the same. As a result, the older hardware is significantly more loaded than the newer hardware.
In the current configuration, the master node for the cluster is always on the older hardware, and I suspect this plays a part in our instability issues.
ES is currently configured with three master-capable nodes: elastic1001, elastic1008 and elastic1013. This is one server in rack A3 and two servers in rack C5. All of the new hardware is currently in row D. We think the best way forward is to swap two machines in row D with a machine in A5 and a machine in C5, so that each row has one master-capable server on new hardware.
Then we can reconfigure Elasticsearch so that those three new-hardware nodes, each in an independent rack, are master-capable.
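
As a rough sanity check for the plan above, the placement rule (three master-capable nodes, all on new hardware, no two in the same row) can be sketched in Python. The host names and inventory below are hypothetical placeholders, not the real cluster layout, and the `node.master` setting assumes an Elasticsearch version that still uses that per-node flag.

```python
from collections import Counter

# Hypothetical inventory after the swap: host -> (row, hardware generation).
# Host names and placements are placeholders, not the real cluster layout.
INVENTORY = {
    "new-host-a": ("A", "new"),
    "new-host-c": ("C", "new"),
    "new-host-d": ("D", "new"),
    "old-host-a": ("A", "old"),
    "old-host-c": ("C", "old"),
}


def check_master_plan(inventory, masters):
    """Return a list of problems with a proposed set of master-capable nodes."""
    problems = []
    if len(masters) != 3:
        problems.append(f"expected 3 master-capable nodes, got {len(masters)}")
    known = [host for host in masters if host in inventory]
    for host in masters:
        if host not in inventory:
            problems.append(f"{host} is not in the inventory")
        elif inventory[host][1] != "new":
            problems.append(f"{host} is not new hardware")
    # No row should hold more than one master-capable node.
    rows = Counter(inventory[host][0] for host in known)
    for row, count in sorted(rows.items()):
        if count > 1:
            problems.append(f"row {row} has {count} master-capable nodes")
    return problems


def master_settings(inventory, masters):
    """Map each host to the node.master value it should carry (assumed setting)."""
    return {host: {"node.master": host in masters} for host in inventory}


if __name__ == "__main__":
    plan = ["new-host-a", "new-host-c", "new-host-d"]
    print(check_master_plan(INVENTORY, plan))
    print(master_settings(INVENTORY, plan))
```

Running the check against the real inventory before changing any configuration would catch a plan that accidentally leaves two master-capable nodes in one row or keeps one on old hardware.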