To upgrade to Kubernetes 1.24 we need to migrate Toolforge workers to containerd. We need Debian 12 for a new enough containerd version.
toolsbeta
- control
- worker
- ingress
tools
- control
- worker
- ingress
For this, we currently rely on docker settings to manage log length in containerd, much like prod does. We will want to find an equivalent later, because some tools are otherwise very good at filling worker nodes (an old problem around here, see T148487). logrotate can handle it, but docker was quite good at it, with fewer failures while waiting for a logrotate run (yes, people regularly crashed k8s nodes between logrotate runs, typically using java).
Whatever you use to solve that problem (some containerd setting or podman thing?), just know that our users can certainly outsmart logrotate by mistake.
Looks like https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/#kubelet-config-k8s-io-v1beta1-KubeletConfiguration has options (not the snazziest when it comes to puppetizing, but you can).
Oh yeah, please don't remove docker-ce from the repos unless you account for the harbor use of it, also. It's running in docker compose and currently using our kubeadm components to do it.
I was planning on just using what Debian packages, https://packages.debian.org/bullseye/docker.io and https://apt-browser.toolforge.org/buster-wikimedia/thirdparty/kubeadm-k8s-1-20/ both seem recent enough. Good to know Harbor needs this though, thanks!
Based on https://github.com/containerd/containerd/issues/4830, https://github.com/kubernetes/enhancements/issues/2411, and CRIContainerLogRotation on https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/ I think Kubernetes will automatically deal with log rotation on Containerd as needed.
Beware: Kubernetes 1.24 requires containerd v1.6.4+ or v1.5.11+, while the Bullseye repositories have 1.4.13. Bookworm (bullseye + 1) will ship with 1.6; otherwise we would need to use third-party packages, which I'd really rather not do.
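For illustration, the version requirement above could be checked with a small Python sketch like the one below. The function names are hypothetical, and it assumes that "v1.5.11+" applies only within the 1.5 series while "v1.6.4+" covers 1.6 and anything newer; that reading is an interpretation, not something stated in this task.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from strings such as '1.6.20',
    '1.4.13~ds1-1' or 'containerd://1.6.20'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_k8s_1_24_requirement(version_text):
    """Return True if the containerd version satisfies 'v1.6.4+ or v1.5.11+'.

    Assumption: 1.5.11+ is only valid inside the 1.5 series; every other
    series must be at least 1.6.4.
    """
    version = parse_version(version_text)
    if version[:2] == (1, 5):
        return version >= (1, 5, 11)
    return version >= (1, 6, 4)


if __name__ == "__main__":
    # Versions mentioned in this task: Bullseye's package and the Bookworm nodes.
    for candidate in ("1.4.13", "containerd://1.6.20"):
        print(candidate, meets_k8s_1_24_requirement(candidate))
```

Tuple comparison keeps the check free of third-party version libraries, at the cost of ignoring pre-release suffixes.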
Change 967875 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] aptrepo: Import kubeadm 1.23 for bookworm
Change 968618 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] P:wmcs::kubeadm: rely on iptables-nft on bookworm
Change 967875 merged by Majavah:
[operations/puppet@production] aptrepo: Import kubeadm 1.23 for bookworm
Mentioned in SAL (#wikimedia-operations) [2023-10-25T10:02:03Z] <taavi> import kubernetes 1.23 packages for debian bookworm T284656
Change 968623 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] P:wmcs::kubeadm: install containerd on bookworm
Change 968618 merged by Majavah:
[operations/puppet@production] P:wmcs::kubeadm: rely on iptables-nft on bookworm
Change 968623 merged by Majavah:
[operations/puppet@production] P:wmcs::kubeadm: install containerd on bookworm
Change 968634 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] kubeadm: only install containerd.io with docker
Change 968635 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] kubeadm: containerd: install br_netfilter kmod
Change 968647 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] kubeadm: add required config for containerd
The above patches make it possible to provision a new host on bookworm. There are a couple of issues however:
taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/wmcs-k8s-metrics/-/merge_requests/4
chart: update cadvisor to 0.47.2
Change 968634 merged by Majavah:
[operations/puppet@production] kubeadm: only install containerd.io with docker
Change 968635 merged by Majavah:
[operations/puppet@production] kubeadm: containerd: add kernel modules and config
Change 968647 merged by Majavah:
[operations/puppet@production] kubeadm: add required config for containerd
cadvisor does not work
Fixed with the upgrade.
I haven't checked if the log file max size still works
According to the Kubernetes docs, "containerLogMaxSize is a quantity defining the maximum size of the container log file before it is rotated. For example: '5Mi' or '256Ki'. If DynamicKubeletConfig (deprecated; default off) is on, when dynamically updating this field, consider that it may trigger log rotation. Default: '10Mi'". So I think we're fine.
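As a rough sanity check of what that default means for disk usage, here is a minimal Python sketch that converts the quantity strings to bytes and estimates a per-container log budget. The helper names are hypothetical, only the binary suffixes Ki, Mi and Gi are handled, and the file count of 5 is the documented default of the kubelet's containerLogMaxFiles option, which is not mentioned in this task. The result is an approximation: a log can overshoot the limit between kubelet rotation checks.

```python
# Binary suffixes used in the quoted examples ("5Mi", "256Ki", "10Mi").
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_quantity(quantity):
    """Convert a quantity such as '10Mi' to a number of bytes.

    Simplification: only integer values with Ki/Mi/Gi suffixes (or a
    plain integer byte count) are supported.
    """
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def approx_log_budget(max_size="10Mi", max_files=5):
    """Approximate bytes of log data kept per container.

    max_size mirrors containerLogMaxSize; max_files mirrors
    containerLogMaxFiles (assumed default of 5).
    """
    return parse_quantity(max_size) * max_files


if __name__ == "__main__":
    print(approx_log_budget())  # per-container budget with the defaults
```

Multiplying this figure by the number of containers on a worker gives a rough idea of how much of the volume logs could occupy.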
So that leaves the extra volume. It seems like containerd spreads what Docker stores in /var/lib/docker across a few different places, so we need to work around that somehow.
Change 992633 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] P:wmcs::kubeadm: worker: support containerd separate volume
Change 992633 merged by Majavah:
[operations/puppet@production] P:wmcs::kubeadm: worker: support containerd separate volume
Change 992923 had a related patch set uploaded (by Majavah; author: Majavah):
[cloud/wmcs-cookbooks@main] toolforge: add_k8s_node: Add support for containerd
Change 992926 had a related patch set uploaded (by Majavah; author: Majavah):
[cloud/wmcs-cookbooks@main] toolforge: add_k8s_node: Allow passing --network
Change 992923 merged by jenkins-bot:
[cloud/wmcs-cookbooks@main] toolforge: add_k8s_node: Add support for containerd
Change 992926 merged by jenkins-bot:
[cloud/wmcs-cookbooks@main] toolforge: add_k8s_node: Allow passing --network
Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-22T09:29:38Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster (T284656)
Running cookbook:
aborrero@cloudcumin1001:~ $ sudo cookbook wmcs.toolforge.add_k8s_node --cluster-name tools --task-id T284656 --role control
If this were the first bookworm control node, we would add the argument --image debian-12.0-bookworm, but since it is not, the cookbook will use the last node image.
Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-22T11:23:02Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_node (T284656)
Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-22T11:23:50Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) (T284656)
Change 1005766 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[cloud/wmcs-cookbooks@main] inventory: refresh tools k8s control nodes
Change 1005766 merged by Arturo Borrero Gonzalez:
[cloud/wmcs-cookbooks@main] inventory: refresh tools k8s control nodes
Change 1005954 had a related patch set uploaded (by Majavah; author: Majavah):
[cloud/wmcs-cookbooks@main] toolforge: k8s: Support containerd as container runtime
Change 1005954 merged by jenkins-bot:
[cloud/wmcs-cookbooks@main] toolforge: k8s: Support containerd as container runtime
Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-26T09:26:11Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster (T284656)
Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-26T09:53:59Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.toolforge.remove_k8s_node (T284656)
Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-26T09:54:45Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) (T284656)
This is done:
aborrero@tools-sgebastion-11:~$ kubectl sudo get nodes -o wide | grep control
tools-k8s-control-7 Ready control-plane,master 5d1h v1.23.17 172.16.0.144 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-18-cloud-amd64 containerd://1.6.20
tools-k8s-control-8 Ready control-plane,master 4d1h v1.23.17 172.16.5.194 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-18-cloud-amd64 containerd://1.6.20
tools-k8s-control-9 Ready control-plane,master 77m v1.23.17 172.16.3.135 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-18-cloud-amd64 containerd://1.6.20
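For anyone wanting to repeat this check across a larger node list, a minimal Python sketch follows. It assumes the text comes from kubectl get nodes -o wide, where the node name is the first column and the container runtime is the last one; the function names are hypothetical and the sample is copied from the output above.

```python
SAMPLE = """\
tools-k8s-control-7 Ready control-plane,master 5d1h v1.23.17 172.16.0.144 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-18-cloud-amd64 containerd://1.6.20
tools-k8s-control-8 Ready control-plane,master 4d1h v1.23.17 172.16.5.194 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-18-cloud-amd64 containerd://1.6.20
tools-k8s-control-9 Ready control-plane,master 77m v1.23.17 172.16.3.135 <none> Debian GNU/Linux 12 (bookworm) 6.1.0-18-cloud-amd64 containerd://1.6.20
"""


def runtimes_by_node(kubectl_output):
    """Map node name to container runtime string.

    The OS image column contains spaces, so only the first and last
    whitespace-separated fields of each line are used.
    """
    result = {}
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, if present.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        result[fields[0]] = fields[-1]
    return result


def nodes_not_on_containerd(kubectl_output):
    """Return the sorted names of nodes whose runtime is not containerd."""
    return sorted(
        name
        for name, runtime in runtimes_by_node(kubectl_output).items()
        if not runtime.startswith("containerd://")
    )


if __name__ == "__main__":
    print(nodes_not_on_containerd(SAMPLE))
```

An empty result means every listed node reports a containerd runtime.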
Change 1011138 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] kubeadm: Drop buster support
Change 1011138 merged by Majavah:
[operations/puppet@production] kubeadm: Drop buster support