Some worker nodes in the cluster are a bit overburdened at this point: https://grafana-labs.wikimedia.org/d/000000004/tools-activity?panelId=2&fullscreen&orgId=1
We should avoid running more than 25 pods on a node. A few more nodes at the larger planned size should be just the thing. They should go into a new server group with a soft anti-affinity policy, which all existing nodes can then slowly be replaced into.
Then drain the overburdened nodes to redistribute things.
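
To pick which nodes to drain first, something like the following could help. It is only a rough sketch: it assumes you already have one node name per running pod (for example from `kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}'`), and the function name, the `POD_LIMIT` constant, and the sample worker names are made up for illustration.

```python
from collections import Counter

# Threshold taken from the note above: avoid more than 25 pods on a node.
POD_LIMIT = 25


def overburdened_nodes(pod_nodes, limit=POD_LIMIT):
    """Return (node, pod_count) pairs for nodes above `limit`, busiest first.

    `pod_nodes` is an iterable with one node name per pod. Empty names
    (pods not yet scheduled) are ignored.
    """
    counts = Counter(node for node in pod_nodes if node)
    over = [(node, n) for node, n in counts.items() if n > limit]
    # Sort by descending pod count, then by name for a stable order.
    return sorted(over, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical sample data, not real cluster state.
    sample = ["worker-1"] * 31 + ["worker-2"] * 12 + ["worker-3"] * 26
    for node, n in overburdened_nodes(sample):
        print(f"{node}: {n} pods")
```

Nodes reported here would be the candidates for draining once the new, larger nodes are in place to take the load.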