In T294946, DCops racked and configured the ml-staging200[12] nodes. We should do the following:
- Reimage both nodes with Debian Bullseye (with overlay partitions, etc.)
- Create ml-serve-staging-etcd200[1-3] VMs and the related etcd cluster
- Create ml-serve-staging-ctrl200[1-2] VMs (control plane nodes)
- Allocate network resources.
- Bootstrap the ml-serve-staging k8s cluster
- Add the inference-staging.svc.codfw.wmnet endpoint (or a discovery one if that makes sense; it probably does, for consistency).
The above plan is very high level and will certainly require more work. More details are in https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters/New
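
For reference, the host patterns in the list above mix two bracket notations (`200[12]` and `200[1-3]`). Below is a minimal Python sketch that expands them into individual short hostnames, e.g. to build a checklist. The `expand` helper and the `PLAN` mapping are hypothetical names made up for illustration and are not part of any existing tooling; only the patterns themselves come from this task.

```python
import re


def expand(pattern: str) -> list[str]:
    """Expand one bracket group, e.g. 'foo200[12]' or 'foo200[1-3]'.

    Assumption: at most one bracket group per pattern, holding either
    a set of single digits ([12]) or a single numeric range ([1-3]).
    """
    match = re.fullmatch(r"(.*)\[([0-9-]+)\](.*)", pattern)
    if match is None:
        # No bracket group: the pattern is already a plain hostname.
        return [pattern]
    prefix, body, suffix = match.groups()
    if "-" in body:
        start, end = body.split("-")
        parts = [str(n) for n in range(int(start), int(end) + 1)]
    else:
        parts = list(body)
    return [f"{prefix}{part}{suffix}" for part in parts]


# Hypothetical grouping of the host patterns named in this task.
PLAN = {
    "bare metal nodes": "ml-staging200[12]",
    "etcd VMs": "ml-serve-staging-etcd200[1-3]",
    "control plane VMs": "ml-serve-staging-ctrl200[1-2]",
}

if __name__ == "__main__":
    for role, pattern in PLAN.items():
        print(f"{role}: {', '.join(expand(pattern))}")
```

The output is short hostnames only; no domain suffix is appended, since the task does not state one for these hosts.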