Per T269684 we need to move away from Docker. In February 2024, the serviceops team announced the results of its evaluation of candidate replacement engines. The results and criteria are documented in Kubernetes/CRE. The chosen container runtime engine is containerd. This task describes the plan for the migration and tracks the migration process itself.
Plan
containerd upgrade
- We'll probably need a new profile, profile::containerd or similar.
- Create proper cgroups config for containerd (https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd)
- Handle pulling of restricted images with containerd (provide authentication credentials etc)
- Test integration with dragonfly/dfget
For the actual upgrade
- Run some workers (4 in codfw as a start) with bookworm, to surface potential OS related issues
- Create puppetization for the configuration required by kubernetes
- Reimage some nodes with bookworm + containerd (>=1.6)
- Upgrade all clusters to the newer containerd, rolling-reimage of nodes
nerdctl
Docker has a relatively user-friendly CLI; containerd doesn't. The ctr tool it ships with is a lower-level, albeit useful, tool. nerdctl is a CLI released by the containerd project that is compatible with the Docker CLI.
- Package nerdctl, probably utilizing our Upstream binaries policy to avoid the onus of having to build every single dependency
- Use puppet to install the package and populate a nerdctl configuration file /etc/nerdctl/nerdctl.toml to default to namespace k8s.io
- Test and approve.
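For reference, the nerdctl configuration file only needs a single key to default to the k8s.io namespace. This is a rough sketch of the content puppet would write; the helper name is hypothetical, and the real file would come from a puppet template rather than Python.

```python
def render_nerdctl_toml(namespace: str = "k8s.io") -> str:
    """Render the content of /etc/nerdctl/nerdctl.toml with a default namespace."""
    # "namespace" is the nerdctl.toml key that replaces passing --namespace
    # on every invocation.
    return f'namespace = "{namespace}"\n'
```

With this file in place, a plain nerdctl ps should list the containers started by the kubelet without needing an explicit namespace argument.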
crictl
Kubernetes builds crictl (cri-tools, https://github.com/kubernetes-sigs/cri-tools/tree/master) to interact with a CRI the way kubelet would. In my initial tests, nerdctl did not completely honor all containerd configuration (like the registry mirrors and authentication we require for dragonfly), so I decided to also package crictl and have it installed on all nodes.
Kubelet (the above are a prereq)
- Amend puppet to put the following 2 kubelet parameters behind a feature flag:
--container-runtime-endpoint=unix:///run/containerd/containerd.sock --container-runtime=remote
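The feature-flag logic amounts to adding the two parameters only on nodes that have been switched over. A minimal sketch of that decision follows; the function and flag names are hypothetical, and the real implementation would live in the puppet kubelet profile.

```python
CONTAINERD_SOCKET = "unix:///run/containerd/containerd.sock"


def kubelet_runtime_args(use_containerd: bool) -> list[str]:
    """Return the extra kubelet arguments for the selected container runtime."""
    if not use_containerd:
        # Docker nodes keep the kubelet defaults and get no extra arguments.
        return []
    return [
        f"--container-runtime-endpoint={CONTAINERD_SOCKET}",
        "--container-runtime=remote",
    ]
```

Keeping the default at "no extra arguments" means that nodes which have not been reimaged are unaffected by the change.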
Metrics
- Replace kubelet_docker_operations_* with kubelet_runtime_operations_*
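For dashboards and alerts, the rename is a prefix substitution in the query expressions. A small sketch, assuming the queries are available as plain strings; the function name is hypothetical, and label names or values that differ between the two metric families are not handled and need manual review.

```python
import re

# Matches any metric in the kubelet_docker_operations_* family.
_DOCKER_METRIC = re.compile(r"\bkubelet_docker_operations_(\w+)")


def migrate_metric_names(expr: str) -> str:
    """Rewrite kubelet_docker_operations_* to kubelet_runtime_operations_*."""
    return _DOCKER_METRIC.sub(r"kubelet_runtime_operations_\1", expr)
```

For example, sum(rate(kubelet_docker_operations_errors_total[5m])) becomes sum(rate(kubelet_runtime_operations_errors_total[5m])).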
Log processing
Parsing of logs does not work properly with containerd nodes. Logs that usually have the k8s_docker_log_field_parsed tag don't have it anymore:
T377132: containerd logs are not properly parsed during ingestion to logstash
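One difference that may be relevant here (an assumption, since the root cause is tracked in T377132 and not stated in this task): Docker's json-file driver writes one JSON object per line, while containerd writes the CRI log format, which is "timestamp stream tag message". The sketch below parses both shapes to illustrate the difference; the type and function names are hypothetical and unrelated to the actual ingestion pipeline.

```python
import json
from typing import NamedTuple


class CriLogLine(NamedTuple):
    timestamp: str  # RFC 3339 timestamp with nanoseconds
    stream: str     # "stdout" or "stderr"
    tag: str        # "F" for a full line, "P" for a partial line
    message: str    # the application's own output, possibly JSON


def parse_cri_line(line: str) -> CriLogLine:
    """Parse one line in the CRI log format written by containerd."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) == 3:
        # An empty message may leave no fourth field.
        parts.append("")
    timestamp, stream, tag, message = parts
    return CriLogLine(timestamp, stream, tag, message)


def parse_docker_json_line(line: str) -> dict:
    """Parse one line written by Docker's json-file log driver."""
    # Shape: {"log": "...", "stream": "stdout", "time": "..."}
    return json.loads(line)
```

A parser that expects the Docker shape would have to strip the three-field prefix first before it can decode a JSON payload emitted by the application.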
Things to do after all k8s nodes have been migrated off of docker
- Remove puppet classes no longer in use (if there are any)
- Ensure all profile::docker::engine related hiera keys are gone (as well as profile::kubernetes::node::docker_kubernetes_user_password)
How to migrate to containerd
https://wikitech.wikimedia.org/wiki/Kubernetes/Administration/containerd_migration