k8s: kubelet.conf embeds cert data in k8s < 1.17
Closed, Resolved, Public

Description

There is an issue with kubeadm-based Kubernetes clusters bootstrapped with kubeadm versions prior to 1.17. The generated /etc/kubernetes/kubelet.conf file embeds the kubelet's TLS client cert/key inline. That embedded cert won't be renewed by any means, so once it expires the kubelet won't start.
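To find affected nodes, it should be enough to look for inline client credentials in kubelet.conf. The following is a minimal sketch, assuming the kubeadm-generated layout with one key per line; the function name `embeds_client_cert` is hypothetical. The `certificate-authority-data` key in the cluster section is expected to stay inline and is deliberately not matched.

```python
import re

# Matches an inline client cert or key, e.g. "    client-certificate-data: LS0t..."
# (assumes one key per line, as in kubeadm-generated kubeconfig files).
EMBEDDED_KEY = re.compile(r"^\s*(client-certificate-data|client-key-data):\s*\S")


def embeds_client_cert(kubeconfig_text: str) -> bool:
    """Return True if the kubeconfig inlines the client cert or key."""
    return any(EMBEDDED_KEY.match(line) for line in kubeconfig_text.splitlines())
```

Usage would be something like `embeds_client_cert(open("/etc/kubernetes/kubelet.conf").read())` on each node; a `True` result means the node still relies on the non-rotating embedded cert.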

From https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#check-certificate-expiration

Warning:

   On nodes created with kubeadm init, prior to kubeadm version 1.17, there is a bug where you manually have to modify the contents of kubelet.conf. After kubeadm init finishes, you should update kubelet.conf to point to the rotated kubelet client certificates, by replacing client-certificate-data and client-key-data with:

   client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
   client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
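The manual edit described in the warning can be sketched as a small text transformation. This is a line-based sketch rather than a full YAML round-trip, assuming the kubeadm-generated layout (one key per line, no multi-line values) and assuming /var/lib/kubelet/pki/kubelet-client-current.pem exists and holds a valid rotated cert; the name `point_to_rotated_certs` is hypothetical.

```python
import re

# Path from the upstream kubeadm docs; this single PEM file holds both
# the rotated client certificate and its private key.
ROTATED_PEM = "/var/lib/kubelet/pki/kubelet-client-current.pem"

# Inline keys to replace, mapped to their file-based equivalents.
REPLACEMENTS = {
    "client-certificate-data": "client-certificate",
    "client-key-data": "client-key",
}

LINE = re.compile(r"^(?P<indent>\s*)(?P<key>client-certificate-data|client-key-data):.*$")


def point_to_rotated_certs(kubeconfig_text: str) -> str:
    """Replace inline client cert/key data with references to ROTATED_PEM.

    Indentation is preserved; all other lines (including
    certificate-authority-data) are left untouched.
    """
    out = []
    for line in kubeconfig_text.splitlines(keepends=True):
        m = LINE.match(line)
        if m:
            newline = "\n" if line.endswith("\n") else ""
            line = f"{m['indent']}{REPLACEMENTS[m['key']]}: {ROTATED_PEM}{newline}"
        out.append(line)
    return "".join(out)
```

The transformation is idempotent, so running it twice is harmless. After writing the result back (keeping a backup of the original), the kubelet still needs a restart, e.g. `systemctl restart kubelet`.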

Symptoms of this:

aborrero@tools-k8s-control-1:~$ sudo -i kubectl describe node tools-k8s-control-1
Name:               tools-k8s-control-1
[..]
Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------    -----------------                 ------------------                ------              -------
  NetworkUnavailable   False     Thu, 10 Dec 2020 15:39:49 +0000   Thu, 10 Dec 2020 15:39:49 +0000   CalicoIsUp          Calico is running on this node
  MemoryPressure       Unknown   Thu, 10 Dec 2020 16:00:15 +0000   Thu, 10 Dec 2020 16:01:03 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown   Thu, 10 Dec 2020 16:00:15 +0000   Thu, 10 Dec 2020 16:01:03 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown   Thu, 10 Dec 2020 16:00:15 +0000   Thu, 10 Dec 2020 16:01:03 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown   Thu, 10 Dec 2020 16:00:15 +0000   Thu, 10 Dec 2020 16:01:03 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
[..]
aborrero@tools-k8s-control-1:~$ sudo tail -f /var/log/syslog
Dec 10 16:17:06 tools-k8s-control-1 kubelet[12499]: I1210 16:17:06.273861   12499 server.go:410] Version: v1.16.10
Dec 10 16:17:06 tools-k8s-control-1 kubelet[12499]: I1210 16:17:06.274447   12499 plugins.go:100] No cloud provider specified.
Dec 10 16:17:06 tools-k8s-control-1 kubelet[12499]: I1210 16:17:06.274504   12499 server.go:773] Client rotation is on, will bootstrap in background
Dec 10 16:17:06 tools-k8s-control-1 kubelet[12499]: E1210 16:17:06.279031   12499 bootstrap.go:265] part of the existing bootstrap client certificate is expired: 2020-11-05 14:13:51 +0000 UTC
Dec 10 16:17:06 tools-k8s-control-1 kubelet[12499]: F1210 16:17:06.279126   12499 server.go:271] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
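To confirm the cause from the logs, one could pull the expiry date out of the `bootstrap.go` error line shown above. A minimal sketch, assuming the message text stays exactly as in this kubelet v1.16.10 output; `expired_cert_dates` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Matches the kubelet error above, e.g.
# "... part of the existing bootstrap client certificate is expired: 2020-11-05 14:13:51 +0000 UTC"
EXPIRED = re.compile(
    r"bootstrap client certificate is expired: "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) UTC"
)


def expired_cert_dates(log_lines):
    """Yield the expiry timestamps reported by the kubelet as timezone-aware datetimes."""
    for line in log_lines:
        m = EXPIRED.search(line)
        if m:
            yield datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S %z")
```

Feeding it the lines of /var/log/syslog should yield one timestamp per failed kubelet start.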

Related upstream bug: https://github.com/kubernetes/kubeadm/issues/2054

Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2020-12-10T17:00:51Z] <arturo> fixing /etc/kubernetes/kublet.conf and restarting kubelet in paws-k8s-control-1 (T269865)

toolsbeta is fixed already (all nodes were rebuilt not long ago)

aborrero claimed this task.
aborrero triaged this task as High priority.
aborrero moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.