Since the redundancy isn't making the cluster run any better, and it is (if anything) running worse at the moment than it was, can we please try removing two nodes from the cluster? tools-k8s-etcd-8 should be one of them, because it is still using the deprecated storage driver.
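A rough sketch of how the removal could be planned as a dry run, so nothing touches the cluster until the plan has been checked. Only tools-k8s-etcd-8 is a real name here; the other member names and all member IDs are made up for illustration, and real IDs would have to come from `etcdctl member list`. The function only builds `etcdctl member remove <ID>` command lines, it does not run them.

```python
# Dry-run planner: builds the etcdctl commands, executes nothing.
# ASSUMPTION: member names other than tools-k8s-etcd-8 and all IDs below
# are hypothetical placeholders, not values from the real cluster.

REQUIRED = "tools-k8s-etcd-8"  # must go: still on the deprecated storage driver
MIN_REMAINING = 3              # the size we expect to survive on


def plan_removals(members, to_remove):
    """Return one `etcdctl member remove <ID>` argv list per member.

    members:   dict mapping member name -> member ID (hex string)
    to_remove: list of member names to drop, in removal order
    """
    unknown = [name for name in to_remove if name not in members]
    if unknown:
        raise ValueError(f"not cluster members: {unknown}")
    if len(set(to_remove)) != len(to_remove):
        raise ValueError("duplicate member in removal list")
    if REQUIRED not in to_remove:
        raise ValueError(f"{REQUIRED} must be among the removed nodes")
    remaining = len(members) - len(to_remove)
    if remaining < MIN_REMAINING:
        raise ValueError(f"only {remaining} members would remain")
    return [["etcdctl", "member", "remove", members[name]] for name in to_remove]


if __name__ == "__main__":
    # Hypothetical five-member cluster.
    example = {
        "tools-k8s-etcd-4": "1111111111111111",
        "tools-k8s-etcd-5": "2222222222222222",
        "tools-k8s-etcd-6": "3333333333333333",
        "tools-k8s-etcd-7": "4444444444444444",
        "tools-k8s-etcd-8": "5555555555555555",
    }
    for argv in plan_removals(example, ["tools-k8s-etcd-8", "tools-k8s-etcd-7"]):
        print(" ".join(argv))
```

Members would be removed one at a time, checking cluster health in between, rather than both at once.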
Fewer nodes should speed up the cluster at the expense of redundancy, but we should survive on 3 nodes. Currently it rides at an iowait of between 7 and (no kidding) 43%. fsync latency is usually acceptable, but it occasionally spikes badly. I figure this is best done with a script to prevent errors and alerts. When this is done, I plan on revisiting the etcd tuning variables to see what else can be done.
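For reference, the redundancy being traded away can be worked out from the usual majority-quorum rule. This is a small sketch that assumes the cluster currently has five members (the three we keep plus the two to be removed); the function names are just for illustration.

```python
# Majority-quorum arithmetic for a Raft-style cluster such as etcd.
# ASSUMPTION: current size is 5 (3 survivors + 2 removed members).

def quorum(members):
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def fault_tolerance(members):
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for size in (5, 3):
        print(f"{size} members: quorum {quorum(size)}, "
              f"tolerates {fault_tolerance(size)} failure(s)")
```

So going from five members to three means tolerating one node failure instead of two, while each write needs acknowledgement from two members instead of three.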
If I can make the cluster stop sucking, maybe I'll set up backups for it to make up for the lost redundancy. I've been too afraid to until now, for fear of collapsing it.