Since we have instances randomly freezing, and it could happen to the Kubernetes master too, let's make sure it has an HA setup.
Need to follow http://kubernetes.io/docs/admin/high-availability/#replicated-api-servers
operations/puppet : production | tools: Allow multiple k8s master to access etcd |
operations/puppet : production | k8s: Make controller-manager & scheduler be HA |
Status | Assigned | Task
---|---|---
Restricted Task | |
Open | bd808 | T232536 Toolforge Kubernetes internal API down, causing `webservice` and other tooling to fail
Open | None | T236565 "tools" Cloud VPS project jessie deprecation
Open | None | T214513 Upgrade Toolforge Kubernetes
Resolved | aborrero | T142862 Setup Kubernetes Masters in a HA setup
Resolved | aborrero | T215663 Stand up upgraded Toolforge etcd clusters
Resolved | aborrero | T215530 Sort out the best method of spinning up multiple toolforge kubernetes masters
Resolved | aborrero | T215679 Sort out and test deploying the worker nodes in a sane fashion
Resolved | Bstorm | T215529 Puppetize/stand up a load balancer for K8s API servers
Resolved | aborrero | T215975 Package/copy kubeadm, kubelet, docker-ce and kubectl to Toolforge Aptly or Reprepro
Change 304503 had a related patch set uploaded (by Yuvipanda):
k8s: Make controller-manager & scheduler be HA
Change 304504 had a related patch set uploaded (by Yuvipanda):
tools: Allow multiple k8s master to access etcd
This ran into a bump: we have kube-maintainusers, which is used to populate token auth on all the masters. However, it should run in only one place and push updates to all the masters.
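For illustration only, here is a minimal sketch of the "generate once, push to every master" idea. It is not how kube-maintainusers works: the `ToolUser` type, the function names, and the `deliver` callback are all hypothetical. The only thing taken from Kubernetes itself is the static token file layout (`token,user,uid`, with an optional quoted group list).

```python
import csv
import io
from typing import Callable, Iterable, List, NamedTuple, Tuple


class ToolUser(NamedTuple):
    """Hypothetical record for one user that needs a token."""
    token: str
    name: str
    uid: str
    groups: Tuple[str, ...] = ()


def render_token_file(users: Iterable[ToolUser]) -> str:
    """Render users as a static token file: token,user,uid[,"g1,g2"]."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Sort by name so every run produces byte-identical output.
    for user in sorted(users, key=lambda u: u.name):
        row = [user.token, user.name, user.uid]
        if user.groups:
            # The csv module quotes this field because it contains commas.
            row.append(",".join(user.groups))
        writer.writerow(row)
    return buf.getvalue()


def push_to_masters(
    content: str,
    masters: Iterable[str],
    deliver: Callable[[str, str], None],
) -> List[str]:
    """Send the same content to every master; return the ones that failed.

    `deliver(master, content)` is an assumed transport hook (scp, an
    HTTP call, a shared volume, ...) supplied by the caller.
    """
    failed = []
    for master in masters:
        try:
            deliver(master, content)
        except OSError:
            failed.append(master)
    return failed
```

Because the file is rendered once and sent unchanged, every master ends up with identical token data, and the returned list of failures tells the single generator which masters to retry.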
To do this, I am going to do the following:
We know how to do this now.
In T215531: Deploy upgraded Kubernetes to toolsbeta, we are developing a new k8s cluster that is deployed using kubeadm. This new mechanism takes care of building the multi-master setup for us.
The next version of the Toolforge k8s service should contain a fix for this.
Closing task now. Feel free to reopen if required.