Since we have instances randomly freezing, and it could happen to the Kubernetes master too, let's make sure it has an HA setup.
This ran into a bump - we have maintain-kubeusers, which populates the token auth config of all the masters. It should run in only one place, however, and push updates out to all the masters.
To do this, I am going to do the following:
- Move maintain-kubeusers to a centralized location (puppetmaster maybe?)
- Set up some way for it to push config to all the masters and restart them only when it's sure the config has propagated everywhere.
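The push step above could look roughly like the two-phase sketch below: copy the new token file to every master first, and only restart apiservers once every copy has succeeded. Hostnames and the token file path are placeholders, not the real Toolforge config.

```shell
#!/bin/bash
# Hypothetical two-phase propagation sketch. Phase 1 copies the new token
# file to every master (failing fast before any restart if a copy fails);
# phase 2 activates it and restarts the apiservers only afterwards.
set -euo pipefail

# RUN=echo gives a dry run that just prints the commands it would execute;
# set RUN="" to actually run them.
RUN=${RUN:-echo}

# Phase 1: propagate the new file everywhere before touching any service.
push_tokens() {
    local token_file=$1; shift
    for host in "$@"; do
        $RUN scp "$token_file" "root@${host}:${token_file}.new"
    done
}

# Phase 2: every copy succeeded, so swap the file in and restart apiservers.
activate_tokens() {
    local token_file=$1; shift
    for host in "$@"; do
        $RUN ssh "root@${host}" \
            "mv ${token_file}.new ${token_file} && systemctl restart kube-apiserver"
    done
}

# Dry-run demonstration with placeholder hostnames:
push_tokens /etc/kubernetes/tokens.csv master-01 master-02
activate_tokens /etc/kubernetes/tokens.csv master-01 master-02
```

Because restarts only happen after all copies succeed, a push failure leaves the masters running the old (consistent) token set rather than a mix of old and new.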
We know how to do this now.
In T215531: Deploy upgraded Kubernetes to toolsbeta we are developing a new k8s cluster which is deployed using kubeadm. This new mechanism takes care of building the multi-master setup for us.
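For reference, the kubeadm flow for an HA control plane looks roughly like the commands below. The endpoint, token, hash, and certificate key are placeholders; the real values are printed by `kubeadm init` at setup time.

```shell
# Sketch of kubeadm's multi-master flow (values are placeholders).
# Initialize the first control-plane node behind a shared endpoint:
kubeadm init --control-plane-endpoint "k8s.example.org:6443" --upload-certs

# Then join additional control-plane nodes using the values init printed:
kubeadm join k8s.example.org:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key>
```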
The next version of the toolforge k8s service should contain a fix for this.
Closing task now. Feel free to reopen if required.