
Draft a plan for upgrading kubernetes machines to buster
Open, Low, Public

Description

Intro

Per https://wikitech.wikimedia.org/wiki/Operating_system_upgrade_policy, by mid 2020 we will begin the deprecation phase of Stretch, with a deadline for removal in early to mid 2021. This task is about identifying our various blockers and drafting a plan for migrating our kubernetes infrastructure to buster.

Components

The major components are listed below. They are grouped rather coarsely, as there is little benefit in listing them one by one (e.g. kube-scheduler, kube-controller-manager, etc.).

Calico/CNI

We still haven't upgraded to newer calico versions. This is an unknown: we need to investigate and test more before we have a verdict on versions for this component.

Kubernetes

This is the component expected to cause the least friction. It's golang, statically built, and easy to share between our wikimedia repos.

Docker

Buster comes with docker 18.09.1+dfsg1-7.1+deb10u1. We probably want to run extensive tests before using it widely. We've been holding off on upgrading from our current docker version, as it has caused no issues up to now.

Kernel

Buster comes with a newer kernel (4.19) that includes the patches listed at https://bugzilla.kernel.org/show_bug.cgi?id=198197, so that's great.

iptables

iptables in buster is 1.8.2, but we want to target at least 1.8.3, which is in buster-backports. The rationale for that decision is based on https://github.com/kubernetes/kubernetes/issues/71305. However, https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/#ensure-iptables-tooling-does-not-use-the-nftables-backend says much more clearly to switch to iptables-legacy. Both options should be evaluated.
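To help evaluate both options on a given host, here is a minimal sketch that classifies the output of `iptables --version`. It assumes the 1.8.x output shape, e.g. `iptables v1.8.2 (nf_tables)` or `iptables v1.8.3 (legacy)`. The names `parse_iptables_version`, `check_host` and `MIN_VERSION` are hypothetical helpers for illustration, not part of any existing tooling.

```python
import re

# Assumed output shape of `iptables --version` on iptables 1.8.x,
# e.g. "iptables v1.8.2 (nf_tables)" or "iptables v1.8.3 (legacy)".
_VERSION_RE = re.compile(r"iptables v(\d+)\.(\d+)\.(\d+)(?:\s+\((\w+)\))?")

# Minimum version this task wants to target (available in buster-backports).
MIN_VERSION = (1, 8, 3)


def parse_iptables_version(output):
    """Return ((major, minor, patch), backend) parsed from `iptables --version`."""
    match = _VERSION_RE.search(output)
    if match is None:
        raise ValueError(f"unrecognised iptables version string: {output!r}")
    version = tuple(int(part) for part in match.group(1, 2, 3))
    # Assumption: when no backend suffix is printed, treat it as legacy.
    backend = match.group(4) or "legacy"
    return version, backend


def check_host(output):
    """Summarise whether a host meets the version target and which backend it uses."""
    version, backend = parse_iptables_version(output)
    return {
        "version": version,
        "backend": backend,
        "meets_min_version": version >= MIN_VERSION,
        "uses_nftables": backend == "nf_tables",
    }
```

Feeding it the stdout of `iptables --version` from a stock buster host would be expected to report that the host is below the 1.8.3 target and on the nftables backend, which are the two conditions discussed above.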

Event Timeline

Restricted Application added a subscriber: Aklapper. Fri, Feb 14, 2:23 PM
akosiaris triaged this task as Low priority. Fri, Feb 14, 2:23 PM

Buster comes with docker 18.09.1+dfsg1-7.1+deb10u1. We probably want to run extensive tests before widely using it. We 've been holding off from upgrading from our current docker version as it has caused no issues up to now.

Note also that we ran into significant performance issues with the new Docker for CI jobs (as part of the Jessie->Stretch migration), and so downgraded it: T236675: Investigate Docker slowness between 18.06.2 and 18.09.7