For every 5% of external traffic we move, we've needed to bump mw-web by 12-13 replicas and mw-api-ext by 10 replicas.
That is 22-23 additional replicas for every 5% increase in traffic. Since each pod requires 5.6 CPUs, we need about 123-129 cores per traffic bump, or roughly 3 servers at 48 cores each.
The above calculation is per-datacenter, of course.
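The arithmetic above can be sketched as a small script, to make it easy to redo the estimate if the replica counts or pod size change. This is only a rough sketch: the function name is made up for illustration, it assumes all 48 cores of a server are available to pods (ignoring anything reserved for the system), and it rounds up to whole servers.

```python
import math

CPUS_PER_POD = 5.6      # CPUs required by each pod (from the numbers above)
CORES_PER_SERVER = 48   # cores per server (from the numbers above)


def capacity_per_bump(mw_web_replicas, mw_api_ext_replicas):
    """Return (replicas, cores, servers) needed for one 5% traffic bump.

    Hypothetical helper: assumes every core on a server is usable by pods
    and rounds the server count up to a whole machine.
    """
    replicas = mw_web_replicas + mw_api_ext_replicas
    cores = replicas * CPUS_PER_POD
    servers = math.ceil(cores / CORES_PER_SERVER)
    return replicas, cores, servers


# mw-web has needed 12-13 replicas per bump, mw-api-ext 10.
for web in (12, 13):
    replicas, cores, servers = capacity_per_bump(web, 10)
    print(f"{replicas} replicas -> {cores:.1f} cores -> {servers} servers per datacenter")
```

Both ends of the range land on 3 servers per datacenter, which matches the estimate above.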
My proposal is to start converting servers: first bring the appservers cluster down to the same size as the api one, then chip away 2 servers per api group from there on.
I suggest reaching parity first because we will chip into the api cluster first to move mobileapps over to k8s.
Current state of the clusters: https://docs.google.com/spreadsheets/d/1VqgWZxmP6LqUgFChIvV5BYvHqr1ZhUh17iXgJ26_1UM/edit#gid=1295795675
This script can be used to automate patch creation a bit: https://gitlab.wikimedia.org/repos/sre/serviceops-kitchensink/-/blob/main/add_k8s_node/add_k8s_node.py?ref_type=heads