We will be upgrading the Kubernetes cluster in codfw to Kubernetes 1.16 and Calico 3.17, as we did for staging-eqiad in T276305.
This includes:
* Setting up new master VMs `kubemaster200[12].codfw.wmnet` (VMs created in T276204)
* Rebooting `kubetcd[2004-2006].codfw.wmnet` for T273278
* Reimaging worker nodes `kubernetes[2001-2017].codfw.wmnet`
** With kernel 4.19 (T262527), which also covers these nodes for T273279
The plan is roughly:
* Prepare all needed patches
** Enabling the kernel 4.19 profile for nodes
** Double check `deployment-charts/helmfile.d/admin_ng` has correct values populated and the cluster enabled
** Don't forget private puppet:
*** controllermanager_token
*** Are certs already done?
* Downtime all services in the cluster
* Cut traffic to all services in the cluster (`sre.discovery.service-route` cookbook?)
* Disable puppet on master and nodes
* Stop apiserver, controller manager, scheduler on master
* Empty etcd (`ETCDCTL_API=3 etcdctl --endpoints https://foobar.site.wmnet:2379 del "" --from-key=true`)
* Reboot etcd servers (checkmarks in T273278)
* Image the new master
* Start reimaging nodes (checkmarks in T273279)
* Start apiserver, controller manager, scheduler
* `helmfile sync` admin_ng
* Deploy all services
* Check all services (service-checker if possible)
* End downtime of services
* Decommission the old masters
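
Two steps in the plan above are easy to get wrong by hand: emptying etcd and confirming that the reimaged nodes came back on the new version. The following is a minimal sketch of helpers for both, not a tested runbook. The function names (`etcd`, `empty_etcd`, `bad_nodes`) and the `DRY_RUN` and `ETCD_ENDPOINT` variables are hypothetical and only for illustration; the endpoint default is the placeholder from the plan, not a real host. `DRY_RUN` defaults to `1`, so the etcd commands are only printed unless it is explicitly set to `0`.

```shell
# Sketch only: function and variable names are illustrative, not from the task.

# Run an etcdctl v3 command, or only print it when DRY_RUN=1 (the default).
etcd() {
    endpoint="${ETCD_ENDPOINT:-https://foobar.site.wmnet:2379}"  # placeholder host
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'ETCDCTL_API=3 etcdctl --endpoints %s' "$endpoint"
        for arg in "$@"; do
            # Show empty arguments as "" so the printed command stays readable.
            [ -n "$arg" ] || arg='""'
            printf ' %s' "$arg"
        done
        printf '\n'
    else
        ETCDCTL_API=3 etcdctl --endpoints "$endpoint" "$@"
    fi
}

# Delete every key, then list whatever is left (expected: no output).
empty_etcd() {
    etcd del "" --from-key=true &&
        etcd get "" --prefix --keys-only
}

# Read `kubectl get nodes --no-headers` output on stdin
# (columns: NAME STATUS ROLES AGE VERSION) and print the name of every node
# that is not exactly "Ready" or whose version does not start with the
# expected prefix (first argument, default "v1.16.").
# Cordoned nodes ("Ready,SchedulingDisabled") are reported as well.
bad_nodes() {
    awk -v want="${1:-v1.16.}" '$2 != "Ready" || index($5, want) != 1 { print $1 }'
}
```

With these defined, `empty_etcd` prints the two etcdctl commands for review, and `DRY_RUN=0 empty_etcd` runs them. After the reimages, `kubectl get nodes --no-headers | bad_nodes` should print nothing once every node is `Ready` on 1.16.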