
Migration to containerd and away from docker
Open, High, Public

Description

Per T269684, we need to move away from docker. In February 2024, the serviceops team announced the results of its evaluation of the candidate replacement engines. The results and criteria are documented in Kubernetes/CRE. The chosen container runtime engine is containerd. This task describes the plan for the migration and tracks the migration process itself.

Plan

containerd upgrade

  1. Package and build containerd from bookworm for bullseye. The reason is that the upstream kubernetes docs reference various configuration directives that exist in the bookworm version. See https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd. It also makes the migration to kubernetes 1.25 (T341984) a tad easier.
  2. Upload package to apt.wikimedia.org, upgrade staging nodes
  3. Upgrade all clusters to the newer containerd.
  4. Create puppetization for the configuration required by kubernetes

We'll probably need a new profile, profile::containerd or similar.

Note: The upgrade process above doesn't require any kind of feature gating or flagging in puppet but rather just a deb-deploy manifest.

nerdctl

Docker has a relatively user-friendly CLI; containerd doesn't. The ctr tool it ships with is a lower-level, albeit useful, tool. nerdctl is a CLI released by the containerd project that is compatible with the docker CLI.

  1. Package nerdctl, probably utilizing our Upstream binaries policy to avoid the onus of having to build every single dependency.
  2. Use puppet to install the package and populate a nerdctl configuration file, /etc/nerdctl/nerdctl.toml, so that it defaults to the k8s.io namespace.
  3. Test and approve.
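
For illustration, the configuration file from step 2 only needs a single key to make k8s.io the default namespace. The sketch below writes it from shell purely to show the expected content; in the plan above the file is managed by puppet, and the function name write_nerdctl_config is hypothetical.

```shell
# Sketch only: write a minimal nerdctl.toml that defaults to the k8s.io
# namespace. The target directory is a parameter so the function can be
# tried against a scratch directory instead of /etc/nerdctl.
write_nerdctl_config() {
    conf_dir="${1:-/etc/nerdctl}"
    mkdir -p "$conf_dir"
    cat > "$conf_dir/nerdctl.toml" <<'EOF'
namespace = "k8s.io"
EOF
}
```

With this default in place, a plain "nerdctl ps" should list the containers created by the kubelet without passing --namespace k8s.io on every call.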

Kubelet (the above are prerequisites)

  1. Amend puppet to put the following 2 kubelet parameters behind a feature flag:
--container-runtime-endpoint=unix:///run/containerd/containerd.sock
--container-runtime=remote
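
Once a node's kubelet runs with these parameters, the runtime it reports to the API server should start with containerd:// rather than docker://. A minimal sketch for checking that, assuming kubectl access to the cluster; the function names node_runtime and is_containerd are hypothetical and not part of the plan.

```shell
# Print the container runtime a node reports, e.g. "containerd://<version>".
# Requires kubectl access to the cluster.
node_runtime() {
    kubectl get node "$1" -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}'
}

# Return 0 if the given runtime string reports containerd, 1 otherwise.
is_containerd() {
    case "$1" in
        containerd://*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use after puppet has run on a node (the node name is a placeholder): is_containerd "$(node_runtime NODE_NAME)" && echo migrated.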

Perform the migration in lockstep

Roughly, for every batch of nodes:

  1. Drain the nodes using kubectl drain --ignore-daemonsets=true --delete-emptydir-data=true
  2. Flip the feature flag in puppet for this batch
  3. Run puppet
  4. Rinse, repeat
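
The drain step for one batch could be sketched as below. This is an assumption-laden illustration, not the tooling that will actually be used: the function name drain_batch and the DRY_RUN variable are hypothetical, and the puppet steps (2 and 3) are outside the sketch.

```shell
# Sketch only: drain every node in a batch with the flags from step 1.
# DRY_RUN defaults to 1, so the commands are printed instead of executed.
drain_batch() {
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "kubectl drain $node --ignore-daemonsets=true --delete-emptydir-data=true"
        else
            # Stop at the first failure so a broken batch is not drained further.
            kubectl drain "$node" --ignore-daemonsets=true --delete-emptydir-data=true || return 1
        fi
    done
}
```

Calling drain_batch with a list of node names prints one kubectl drain command per node; setting DRY_RUN=0 runs them for real and stops on the first error.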

Event Timeline

@akosiaris could you please double-check in your test environment that containerd will still enforce the default apparmor profile, like docker currently does? (See "Remove apparmor.security.beta.kubernetes.io/defaultProfileName" in T273507: PodSecurityPolicies will be deprecated with Kubernetes 1.21.)

Done in T273507#9739926

akosiaris updated the task description.