We are currently at 1.12.9, a release that no longer receives security support.
As of Kubernetes 1.19, bugfix support via patch releases for a Kubernetes minor release has increased from 9 months to 1 year.
But:
- We can't go to 1.19 because calico 3.16 (current version) only supports up to 1.18
- We can't go to > 1.16 because helm2 only supports up to 1.16
- We can't upgrade to helm3 because that requires k8s >= 1.13
- We can't stay on/upgrade to < 1.16 because calico needs at least 1.16
So the best bet is currently to update to k8s 1.16, which gives us ingress and CRD support. From there we need to migrate to helm3, and afterwards we can continue to k8s > 1.16 (whatever makes sense then).
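The version constraints above can be checked mechanically. The following is only a sketch: the bounds are copied from the bullet points in this note, and all names (`CANDIDATES`, `CONSTRAINTS`, `supported`, `feasible`) are made up for illustration and do not refer to any existing tooling.

```python
# Sketch only: bounds are taken from the constraints listed in this note,
# not from the upstream compatibility matrices.
CANDIDATES = [(1, minor) for minor in range(12, 20)]  # k8s 1.12 .. 1.19

# tool -> (lowest supported k8s minor, highest supported k8s minor)
# None means "no bound mentioned in this note".
CONSTRAINTS = {
    "calico 3.16": ((1, 16), (1, 18)),
    "helm2": (None, (1, 16)),
    "helm3": ((1, 13), None),
}


def supported(version, tool):
    """Return True if `tool` works with the given k8s (major, minor)."""
    low, high = CONSTRAINTS[tool]
    return (low is None or version >= low) and (high is None or version <= high)


def feasible(tools):
    """Return every candidate k8s version that all given tools support."""
    return [v for v in CANDIDATES if all(supported(v, t) for t in tools)]


def fmt(versions):
    """Render (major, minor) tuples as 'major.minor' strings."""
    return [f"{major}.{minor}" for major, minor in versions]


if __name__ == "__main__":
    print("with helm2:", fmt(feasible(["calico 3.16", "helm2"])))
    print("with helm3:", fmt(feasible(["calico 3.16", "helm3"])))
```

Under these assumptions the only version that works with helm2 is 1.16, and moving to helm3 opens up 1.16 to 1.18, which matches the order proposed above.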
For the actual upgrade of clusters we have:
K8s upgrades
This is the "reinitialize the k8s cluster" plan (i.e. we don't really upgrade in place).
- Add 1.19 to CI kubeyaml (T266032)
- Build k8s 1.16 (T266766)
- Read a lot of changelogs
- Set up the kubernetes codfw staging cluster with stretch (to at least keep the current docker version) + kernel 4.19 + k8s 1.16
- Verify that 1.16 works as expected in staging
- Do we test the /admin part of deployment-charts in CI? (we don't, see T266670)
- Watch out for changed things:
  - renamed metrics (probably)
  - Kubernetes daemons logging to logstash (logging has probably changed)
- Switch staging reference to point to codfw
- Reinitialize codfw with 1.16
- Reinitialize eqiad with 1.16
- Migrate to helm3 (T251305)
Calico upgrades T207804
- Prep work for moving the egress policy to charts has been done by the contractor
- Probably double check that all rules are set up in the charts
- Quite possibly go the full cluster reinit way and go straight to the latest version
- Decide whether we are going to stay with direct access to etcd (version 3?) or try to switch to the kubernetes APIs (T266895)
- Build the calico debs, the cni debs and the calico-node docker image
- Test in a staging cluster (probably during reinit as well?).