Our current cluster version is already two years old at the time of writing. We should schedule an upgrade to a more recent version at some point. Some guidelines follow.
We should probably create a test cluster somewhere to try out the upgrade, and write a runbook capturing the necessary steps for future upgrades. This process is a good candidate for automation in our tofu repository; even if full automation isn't feasible, we can at least store any tools or scripts there.
T408379 is a prerequisite for the version upgrade.
We're jumping 7 minor versions here (v1.28 to v1.35), so we will need to do the upgrade in stages. So far I've found only one pitfall that will very likely need manual intervention on our part:
- v1.32 started bundling Traefik v3: we'll probably need to migrate manually from Traefik v2 to v3 before we can upgrade K3s to v1.32
I also found a couple of other breaking changes, but fortunately they shouldn't affect us:
- Required manual etcd version upgrade for K3s v1.34: our cluster is not running in HA mode and uses SQLite instead of etcd
- Kubelet in K8s (and therefore K3s) v1.35 supports only cgroup v2 by default: I verified that all of our VMs use cgroup v2
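The two "shouldn't affect us" items could be re-checked right before each hop with a small pre-flight script. This is a minimal Python sketch, not an existing tool: the helper names are hypothetical, and it assumes the default K3s data directory (`/var/lib/rancher/k3s`), under which the SQLite datastore lives at `server/db/state.db` and embedded etcd at `server/db/etcd`. The cgroup check reads `/proc/mounts` and treats a host as cgroup v2 only if `cgroup2` is mounted directly at `/sys/fs/cgroup` (v1 and hybrid hosts mount a tmpfs there). The datastore check is only meaningful on the server node.

```python
from pathlib import Path
from typing import Optional

CGROUP_ROOT = "/sys/fs/cgroup"
# Assumption: K3s runs with its default --data-dir (/var/lib/rancher/k3s).
K3S_SQLITE_DB = Path("/var/lib/rancher/k3s/server/db/state.db")
K3S_ETCD_DIR = Path("/var/lib/rancher/k3s/server/db/etcd")


def fs_type_at(mounts_text: str, mount_point: str = CGROUP_ROOT) -> Optional[str]:
    """Return the filesystem type mounted at mount_point, given /proc/mounts text."""
    fs_type = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point:
            fs_type = fields[2]  # later mounts shadow earlier ones, so keep the last match
    return fs_type


def uses_cgroup_v2(mounts_text: str) -> bool:
    # A pure (unified) cgroup v2 host mounts "cgroup2" directly at /sys/fs/cgroup.
    return fs_type_at(mounts_text) == "cgroup2"


def uses_sqlite_datastore(db: Path = K3S_SQLITE_DB, etcd: Path = K3S_ETCD_DIR) -> bool:
    # SQLite state file present and no embedded etcd data directory.
    return db.exists() and not etcd.exists()


def main() -> None:
    mounts = Path("/proc/mounts").read_text()
    print(f"cgroup v2: {uses_cgroup_v2(mounts)}")
    print(f"SQLite datastore (server node only): {uses_sqlite_datastore()}")


if __name__ == "__main__":
    main()
```

Parsing `/proc/mounts` avoids shelling out to `stat -fc %T /sys/fs/cgroup`, and keeps the logic testable with canned input.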
With a bit of luck (and daring moral turpitude) we may be able to upgrade in just three hops: v1.28 -> v1.31, v1.31 -> v1.32, v1.32 -> v1.35.
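If this ends up in the runbook, the hop plan could be sanity-checked mechanically. The Python sketch below uses a hypothetical `validate_plan` helper (not an existing tool) that represents each hop as a pair of minor versions. It checks that the hops are contiguous and end at the target, flags hops that skip minor versions (the "moral turpitude" part), and reminds us to do the Traefik v2 -> v3 migration before any hop that lands on or crosses v1.32.

```python
from typing import List, Tuple

Hop = Tuple[int, int]  # (from_minor, to_minor), e.g. (28, 31) for v1.28 -> v1.31

TRAEFIK_V3_MINOR = 32  # first K3s minor bundling Traefik v3, per the notes above


def validate_plan(start: int, target: int, hops: List[Hop]) -> List[str]:
    """Validate a hop plan and return warnings and required pre-steps."""
    notes: List[str] = []
    current = start
    for src, dst in hops:
        if src != current:
            raise ValueError(f"hop starts at v1.{src}, but cluster is at v1.{current}")
        if dst <= src:
            raise ValueError(f"hop v1.{src} -> v1.{dst} does not move forward")
        if dst - src > 1:
            skipped = ", ".join(f"v1.{m}" for m in range(src + 1, dst))
            notes.append(f"v1.{src} -> v1.{dst} skips {skipped} (not a one-minor step)")
        if src < TRAEFIK_V3_MINOR <= dst:
            notes.append(f"migrate Traefik v2 -> v3 before v1.{src} -> v1.{dst}")
        current = dst
    if current != target:
        raise ValueError(f"plan ends at v1.{current}, expected v1.{target}")
    return notes


if __name__ == "__main__":
    three_hops = [(28, 31), (31, 32), (32, 35)]
    for note in validate_plan(28, 35, three_hops):
        print(note)
```

Running the same check on the conservative one-minor-at-a-time path (`[(m, m + 1) for m in range(28, 35)]`) leaves only the Traefik reminder, which makes the trade-off between the two plans explicit.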