
Update Kubernetes clusters to v1.23
Open, High, Public

Description

We need/want to upgrade our Kubernetes clusters to Kubernetes v1.23

Kubernetes v1.23 was selected as a target:

  • because it is the last version supporting dockershim (and we don't want to move away from dockershim as part of this update)
  • v1.24 was only released on 2022-05-03, and we usually don't want to upgrade to a freshly released major/minor version

Together with the Kubernetes update, we need to update the following other components (a subtask for each of them might be a good idea once more details are available):

Preparation for the Kubernetes update:

  • Double-check that our Docker API version is supported with Kubernetes v1.23 (the minimum version is still 1.26.0)
  • Check if migration from command line flags to config files is required for some Kubernetes components: T300499
    • It looks like this is not strictly needed yet, though it might be for some specific flags. We will probably find out during the upgrade tests.
  • Ensure all our charts are compatible with Kubernetes v1.23
    • Our current validation with kubeyaml does not support this, and getting 1.23 support into kubeyaml would take time that is better invested in replacing kubeyaml with kubeconform in the deployment-charts CI: T306165
  • Read the Kubernetes changelogs for the relevant versions at https://relnotes.k8s.io (yellow/red flags are listed below each version; tick the box once all action-required items have been addressed)
  • v1.18
    • Action Required
    • Note
      • The following features are unconditionally enabled and the corresponding --feature-gates flags have been removed: PodPriority, TaintNodesByCondition, ResourceQuotaScopeSelectors and ScheduleDaemonSetPods (#86210, @draveness)
      • Kube-proxy: Added dual-stack IPv4/IPv6 support to the iptables proxier. (#82462, @vllry)
      • Support server-side dry-run in kubectl with --dry-run=server for commands including apply, patch, create, run, annotate, label, set, autoscale, drain, rollout undo, and expose. (#87714, @julianvmodesto)
  • v1.19
    • Action Required
      • seccomp graduates to GA, check if we need to migrate PSPs off the annotations https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#seccomp-graduates-to-general-availability (this only affects pod spec not PSPs, nothing to do here)
      • Kube-apiserver: the componentstatus API is deprecated. This API provided status of etcd, kube-scheduler, and kube-controller-manager components, but only worked when those components were local to the API server, and when kube-scheduler and kube-controller-manager exposed unsecured health endpoints. Instead of this API, etcd health is included in the kube-apiserver health check and kube-scheduler/kube-controller-manager health checks can be made directly against those components' health endpoints. (#93570, @liggitt)
      • ✅ Kubeadm now includes CoreDNS version v1.7.0.
      • ✅ Kube-apiserver: The NodeRestriction admission plugin now restricts Node labels kubelets are permitted to set when creating a new Node to the --node-labels parameters accepted by kubelets in 1.16+. (#90307, @liggitt)
    • Note
      • EndpointSlices: keep them in mind when debugging https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/
      • Fix bug in reflector that couldn't recover from "Too large resource version" errors (#92537, @wojtek-t) [SIG API Machinery]
      • Kubelet: add '--logging-format' flag to support structured logging (#91532, @afrouzMashaykhi)
      • Add --logging-format flag for component-base. Defaults to "text" using unchanged klog. (#89683, @yuzhiquan)
      • Kube-controller-manager: add '--logging-format' flag to support structured logging (#91521, @SataQiu)
      • Kube-scheduler: add '--logging-format' flag to support structured logging (#91522, @SataQiu)
      • The DefaultIngressClass feature is now GA. The --feature-gate parameter will be removed in 1.20. (#91957, @cmluciano)
      • The kube-controller-manager managed signers can now have distinct signing certificates and keys. See the help about --cluster-signing-[signer-name]-{cert,key}-file. --cluster-signing-{cert,key}-file is still the default. (#90822, @deads2k)
      • Kube-apiserver, kube-scheduler and kube-controller-manager now use the SO_REUSEPORT socket option when listening on the address defined by the --bind-address and --secure-port flags on Unix systems (Windows is NOT supported). This allows running multiple instances of these processes on a single host with the same configuration, so they can be updated/restarted gracefully without causing downtime. (#88893, @invidian)
  • v1.20
    • Action Required
      • ✅ TokenRequest and TokenRequestProjection are now GA features. The following flags are required by the API server:
        • ✅ --service-account-issuer: should be set to a URL identifying the API server that will be stable over the cluster lifetime.
        • ✅ --service-account-key-file: set to one or more files containing one or more public keys used to verify tokens.
        • ✅ --service-account-signing-key-file: set to a file containing a private key to use to sign service account tokens. Can be the same file given to kube-controller-manager with --service-account-private-key-file. (#95896, @zshihang)
      • ✅ Resolves non-deterministic behavior of the garbage collection controller when ownerReferences with incorrect data are encountered. Events with a reason of OwnerRefInvalidNamespace are recorded when namespace mismatches between child and owner objects are detected. The kubectl-check-ownerreferences tool can be run prior to upgrading to locate existing objects with invalid ownerReferences: https://github.com/kubernetes-sigs/kubectl-check-ownerreferences
      • ✅ In dual-stack bare-metal clusters, you can now pass dual-stack IPs to kubelet --node-ip, e.g.: kubelet --node-ip 10.1.0.5,fd01::0005
        • ✅ In dual-stack clusters where nodes have dual-stack addresses, hostNetwork pods will now get dual-stack PodIPs.
    • Note
      • A bug was fixed in kubelet where exec probe timeouts were not respected. This may result in unexpected behavior since the default timeout (if not specified) is 1s which may be too small for some exec probes. Ensure that pods relying on this behavior are updated to correctly handle probe timeouts.
      • Kubernetes 1.20 now enables API Priority and Fairness (APF) by default.
      • IPv4/IPv6 dual-stack has been reimplemented for 1.20 to support dual-stack Services: https://docs.k8s.io/concepts/services-networking/dual-stack/
      • On-demand metrics calculation is now available through /metrics/resources
      • kubectl alpha debug graduates from alpha to beta in 1.20, becoming kubectl debug
      • Support the node label node.kubernetes.io/exclude-from-external-load-balancers (could be used to exclude Ganeti VM nodes from LVS?)
  • v1.21
    • Action Required
      • ✅ Kubeadm now includes CoreDNS v1.8.0
      • ✅ New admission controller DenyServiceExternalIPs is available. Clusters which do not need the Service externalIPs feature should enable this controller to be more secure. (#97395, @thockin)
      • ✅ The pause image was upgraded to v3.4.1 in kubelet and kubeadm for both Linux and Windows. (#98205, @pacoxu) T322920
      • ✅ The latest validated version of Docker was updated to 20.10
      • ✅ Upgrades IPv6DualStack to Beta and turns it on by default. New and existing clusters are not affected until an actor starts adding secondary Pod and Service CIDR CLI flags as described here: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/563-dual-stack (#98969, @khenidak)
    • Note
      • Pod with multiple containers can use kubectl.kubernetes.io/default-container annotation to have a container preselected for kubectl commands.
      • Immutable Secrets and ConfigMaps graduate to GA. This feature allows users to specify that the contents of a particular Secret or ConfigMap are immutable for the object's lifetime. For such objects, the kubelet does not watch/poll for changes, thereby reducing apiserver load. (Probably applicable to almost all of our ConfigMap/Secret objects, as we roll-restart deployments on ConfigMap changes anyway.)
      • ServiceNodeExclusion, NodeDisruptionExclusion and LegacyNodeRoleBehavior features have been promoted to GA. ServiceNodeExclusion and NodeDisruptionExclusion are now unconditionally enabled, while LegacyNodeRoleBehavior is unconditionally disabled.
      • ✅ TokenRequest and TokenRequestProjection feature gates have been removed and are unconditionally enabled
      • Kubelet Graceful Node Shutdown feature graduates to Beta and is enabled by default.
      • Namespace API objects now have a kubernetes.io/metadata.name label matching their metadata.name field to allow selecting any namespace by its name using a label selector.
      • Kubectl: kubectl get will omit managed fields by default now. Users could set --show-managed-fields to true to show managedFields when the output format is either json or yaml. (#96878, @knight42)
  • v1.22
    • Action Required
      • ✅ Various beta API removals. We're not affected, as kubeconform would have flagged them
      • ✅ controller-manager changes:
        • ✅ controller-manager MUST start with --authorization-kubeconfig and --authentication-kubeconfig correctly set to get authentication/authorization working
        • ✅ Applications that fetch metrics from controller-manager should use a dedicated service account which is allowed to access nonResourceURLs /metrics. (#96216, @knight42)
        • ✅ (we don't think we use that) liveness/readiness probes to controller-manager MUST use HTTPS now, and the default port has been changed to 10257
      • ✅ Updated pause image to version 3.5, which now runs by default as pseudo user and group 65535:65535. This does not have any effect on remote container runtimes like CRI-O and containerd, which set up the pod sandbox user and group on their own. (#100292, @saschagrunert) T322920
    • Note
      • As of now both system-node-critical and system-cluster-critical pods have -997 OOM score, making them one of the last processes to be OOMKilled. If the user wants the pod to be OOMKilled last and the pod has the system-cluster-critical priority class, it has to be changed to the system-node-critical priority class to preserve the existing behavior (#99729, @ravisantoshgudimetla)
      • Server-side Apply is GA https://kubernetes.io/docs/reference/using-api/server-side-apply/
      • Default/kubeadm etcd moves to version 3.5.0
        • Data corruption issues with etcd 3.5.[0-2]; use >= 3.5.3
      • Introducing memory quality of service support with cgroups v2 (Alpha). The MemoryQoS feature is now in Alpha. This allows kubelet running with cgroups v2 to set memory QoS at container, pod and QoS level to protect and guarantee better memory quality. This feature can be enabled through the MemoryQoS feature gate. (#102970, @borgerli)
      • Kube-apiserver: the alpha PodSecurity feature can be enabled by passing --feature-gates=PodSecurity=true, and enables controlling allowed pods using namespace labels. See https://git.k8s.io/enhancements/keps/sig-auth/2579-psp-replacement for more details. (#103099, @liggitt)
      • The EmptyDir memory-backed volumes are sized as the minimum of pod allocatable memory on a host and an optional explicit user-provided value. (#101048, @dims)
      • The NamespaceDefaultLabelName is promoted to GA in this release. All Namespace API objects have a kubernetes.io/metadata.name label matching their metadata.name field to allow selecting any namespace by its name using a label selector. (#101342, @rosenhouse)
  • v1.23
    • Action Required
    • Note
      • IPv4/IPv6 Dual-stack Networking graduates to GA https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/563-dual-stack
      • PodSecurity graduates to Beta (In 1.23, the PodSecurity feature gate is enabled by default.) https://kubernetes.io/docs/concepts/security/pod-security-admission/
      • Structured logging graduates to Beta
      • Log messages in JSON format are now written to stderr by default (same as text format) instead of stdout. Users who expected JSON output on stdout must now capture stderr instead of or in addition to stdout. (#106146, @pohly) (we log to stderr anyway, so probably no change here)
      • Support for the seccomp annotations seccomp.security.alpha.kubernetes.io/pod and container.seccomp.security.alpha.kubernetes.io/[name] has been deprecated since 1.19, will be dropped in 1.25. Transition to using the seccompProfile API field. (#104389, @saschagrunert)
      • Ephemeral containers graduated to beta and are now available by default. (#105405, @verb)
      • The TTLAfterFinished feature gate is now GA and enabled by default. (#105219, @sahilvv)
      • Introduce a feature gate DisableKubeletCloudCredentialProviders which allows disabling the in-tree kubelet credential providers. (#102507, @ostrain)
  • Read Calico changelogs
    • v3.17 > 3.17.0
      • All components that use Typha now use the same logic to discover Typha's address. They look up the endpoints of the service directly and connect to one at random. This avoids a dependency on kube-proxy. typha #466 (@fasaxc)
      • kube-controllers runs as non-root by default kube-controllers #566 (@caseydavenport)
    • v3.18
    • v3.19
      • Update iptables version to 1.8.4-15
      • By default, limit each node to 20 IP address blocks. This value can be overridden through IPAM configuration.
    • v3.20
      • Service-based egress rules; Calico NetworkPolicy and GlobalNetworkPolicy now support egress rules which match on Kubernetes service names. Service matches in egress rules can be used to allow or deny access to in-cluster services, as well as services typically not backed by pods (for example, the Kubernetes API). Address and port information is learned from the individual endpoints within the service.
      • Configurable BGP graceful restart timer; See the maxRestartTime configuration option in the BGPPeer API.
      • calico/node marks nodes with NetworkUnavailable=true on shutdown node #993 (@song-jiang)
      • Add IP address garbage collection to kube-controllers kube-controllers #744 (@caseydavenport)
      • Calico will now release empty IPAM blocks from nodes that no longer need them so they can be used elsewhere. kube-controllers #799 (@caseydavenport)
    • v3.21
      • For users of BGP you can now view the status of your BGP routers, including session status, RIB / FIB contents, and agent health via the new CalicoNodeStatus API: https://docs.projectcalico.org/archive/v3.21/reference/resources/caliconodestatus
      • Service-based ingress rules; In v3.20, we introduced egress policy rules that can match on Kubernetes services. In v3.21, we improved upon that in two ways. First, you can now use service matches in Calico NetworkPolicy and GlobalNetworkPolicy ingress rules. Second, you can now use service-based network policy rules on Windows nodes.
      • Option to run Calico as non-privileged and non-root; https://docs.projectcalico.org/archive/v3.21/security/non-privileged
      • ACTION REQUIRED: calico/node logs write to /var/log/calico within the container by default, in addition to stdout node #1133 (@song-jiang)
    • v3.22
      • None
    • v3.23
      • Update to CNI plugins v1.1.1 calico #5944 (@caseydavenport)
      • New per-pool IPAM metrics added calico #5706 (@pasanw)
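
Two of the preparation items above, the Docker API minimum and the feature gates that were removed once their features went GA, can be checked with a small script. This is a minimal Python sketch, not part of our actual tooling: it assumes the Docker API version string is obtained separately (e.g. via `docker version --format '{{.Server.APIVersion}}'`) and that the current `--feature-gates` value is read from the component's command line or config. The gate list only covers the removals quoted in the changelog notes above (v1.18 and v1.21); upstream components reject feature gates they no longer recognize.

```python
"""Illustrative pre-upgrade checks (a sketch, not our real tooling)."""
from __future__ import annotations

# Minimum Docker Engine API version still required by dockershim in v1.23.
MIN_DOCKER_API_VERSION = (1, 26)

# Feature gates removed upstream once their feature went GA, as quoted in the
# v1.18 and v1.21 changelog notes. They should be dropped from --feature-gates.
REMOVED_FEATURE_GATES = {
    "PodPriority",
    "TaintNodesByCondition",
    "ResourceQuotaScopeSelectors",
    "ScheduleDaemonSetPods",
    "TokenRequest",
    "TokenRequestProjection",
}


def parse_api_version(version: str) -> tuple[int, int]:
    """Turn '1.41' or '1.26.0' into a comparable (major, minor) tuple."""
    major, minor = version.strip().split(".")[:2]
    return int(major), int(minor)


def docker_api_supported(version: str) -> bool:
    """True if the Docker API version meets the dockershim minimum."""
    return parse_api_version(version) >= MIN_DOCKER_API_VERSION


def parse_feature_gates(flag_value: str) -> dict[str, bool]:
    """Parse a --feature-gates value such as 'A=true,B=false'."""
    gates: dict[str, bool] = {}
    for item in flag_value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        gates[name.strip()] = value.strip().lower() == "true"
    return gates


def stale_feature_gates(flag_value: str) -> list[str]:
    """Return, sorted, the gates in flag_value that no longer exist."""
    return sorted(set(parse_feature_gates(flag_value)) & REMOVED_FEATURE_GATES)


if __name__ == "__main__":
    print(docker_api_supported("1.41"))
    print(stale_feature_gates("PodPriority=true,EphemeralContainers=true"))
```

For example, `stale_feature_gates("PodPriority=true,EphemeralContainers=true")` returns `['PodPriority']`, so that gate would have to be removed before upgrading.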

Additional things to do together with the re-init of clusters:

  • Move to bigger Pod and Service IP range (T326617)
  • Delete Calico's etcd v2 datastore: $ etcdctl -C https://$(hostname -f):2379 rm -r /calico
  • Reimage the etcd clusters to Bullseye (which probably makes the above unnecessary)
  • Enable the DenyServiceExternalIPs admission plugin
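
For the bigger Pod IP range, the following minimal Python sketch shows how an IPv4 pool splits into Calico IPAM blocks. It assumes Calico's default IPv4 block size of /26 and the default limit of 20 blocks per node from the Calico v3.19 notes; the CIDRs used are hypothetical and are not the ranges chosen in T326617.

```python
from __future__ import annotations

import ipaddress

# Assumptions: Calico's default IPv4 IPPool blockSize (/26) and the default
# per-node limit of 20 blocks quoted in the Calico v3.19 notes.
DEFAULT_BLOCK_PREFIX = 26
MAX_BLOCKS_PER_NODE = 20


def pool_capacity(cidr: str, block_prefix: int = DEFAULT_BLOCK_PREFIX) -> dict[str, int]:
    """Summarize how an IPv4 pool splits into Calico IPAM blocks."""
    net = ipaddress.ip_network(cidr)
    if net.version != 4:
        raise ValueError("this sketch only handles IPv4 pools")
    if block_prefix < net.prefixlen:
        raise ValueError("block size is larger than the pool")
    per_block = 2 ** (32 - block_prefix)
    blocks = 2 ** (block_prefix - net.prefixlen)
    return {
        "addresses": net.num_addresses,
        "blocks": blocks,
        "addresses_per_block": per_block,
        # Upper bound on block addresses one node can hold under the limit.
        "max_addresses_per_node": min(blocks, MAX_BLOCKS_PER_NODE) * per_block,
    }


def is_growth_of(old_cidr: str, new_cidr: str) -> bool:
    """True if new_cidr strictly contains old_cidr (same IP family)."""
    old = ipaddress.ip_network(old_cidr)
    new = ipaddress.ip_network(new_cidr)
    return old.version == new.version and old != new and old.subnet_of(new)


if __name__ == "__main__":
    # Hypothetical ranges, not the ones picked for our clusters.
    print(pool_capacity("10.200.0.0/21"))
    print(pool_capacity("10.200.0.0/18"))
    print(is_growth_of("10.200.0.0/21", "10.200.0.0/18"))
```

Under these assumptions a /21 pool has 32 blocks and a /18 pool 256, while a single node is capped at 20 × 64 = 1280 block addresses either way, so growing the pool mostly raises the number of nodes that can own blocks rather than the per-node ceiling.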

Details

Project                       Branch      Lines +/-
operations/puppet             production  +4 -5
operations/cookbooks          master      +166 -0
operations/puppet             production  +41 -49
operations/puppet             production  +3 -1
operations/alerts             master      +15 -2
operations/puppet             production  +10 -0
labs/private                  master      +4 -0
operations/deployment-charts  master      +15 -0
operations/deployment-charts  master      +11 -1
operations/puppet             production  +47 -2
operations/puppet             production  +0 -6
operations/puppet             production  +4 -0
operations/deployment-charts  master      +20 -4
operations/puppet             production  +4 -22
operations/puppet             production  +429 -130
operations/puppet             production  +89 -0
operations/puppet             production  +1 -2
operations/puppet             production  +40 -23
operations/puppet             production  +48 -19
operations/puppet             production  +44 -4
operations/puppet             production  +8 -0
operations/debs/kubernetes    v1.23       +6 -0
operations/puppet             production  +26 -45
operations/deployment-charts  master      +3 -2
operations/deployment-charts  master      +29 -84
operations/puppet             production  +12 -14
operations/deployment-charts  master      +72 -11
operations/deployment-charts  master      +30 -3
operations/puppet             production  +29 -13
operations/puppet             production  +83 -41
operations/puppet             production  +42 -16
operations/puppet             production  +66 -91
operations/puppet             production  +3 -4
operations/puppet             production  +10 -2
operations/puppet             production  +0 -12
operations/debs/kubernetes    v1.23       +19 -18
operations/puppet             production  +19 -94
operations/puppet             production  +2 -0
operations/puppet             production  +65 -75
operations/puppet             production  +2 -0
operations/debs/calico        v3.23       +25 -6
operations/deployment-charts  master      +921 -141
operations/deployment-charts  master      +15 -1
operations/debs/calico        v3.20       +8 -2

Related Objects

Status    Assigned
Open      JMeybohm
Resolved  JMeybohm
Resolved  JMeybohm
Resolved  JMeybohm
Resolved  JMeybohm
Open      ayounsi
Resolved  elukey
Resolved  JMeybohm
Resolved  JMeybohm
Resolved  Clement_Goubert
Resolved  JMeybohm
Resolved  JMeybohm
Resolved  JMeybohm
Resolved  JMeybohm
Resolved  JMeybohm
Resolved  JMeybohm
Resolved  Clement_Goubert
Resolved  elukey
Resolved  elukey
Open      None
Resolved  elukey
Open      elukey
Resolved  JMeybohm
Open      None
Resolved  elukey
Resolved  JMeybohm
Open      jijiki
Resolved  JMeybohm
Resolved  JMeybohm
Resolved  JMeybohm
Resolved  BTullis
Resolved  JMeybohm
Resolved  Marostegui
Resolved  JMeybohm
Resolved  JMeybohm
Open      JMeybohm
Resolved  JMeybohm
Open      None
Open      JMeybohm
Resolved  JMeybohm
Open      None
Open      None
Resolved  elukey
Resolved  elukey
Declined  None
Resolved  None
Resolved  elukey
Resolved  elukey
Resolved  JMeybohm
Resolved  JMeybohm
Open      JMeybohm
Resolved  CDanis
Open      None
Resolved  akosiaris

Event Timeline


Change 856589 merged by JMeybohm:

[operations/puppet@production] k8s: Add a central ipv6dualstack flag to enable dual stack

https://gerrit.wikimedia.org/r/856589
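
With dual-stack enabled, flags such as `--service-cluster-ip-range` and `--cluster-cidr` (and the kubelet's `--node-ip`, see the v1.20 notes above) take a comma-separated pair with one entry per IP family. The change above only adds a central flag, and how it maps onto the component flags isn't shown here. As an illustration, this minimal Python sketch validates such a CIDR value against the upstream rule (at most two CIDRs, of different families); `parse_dual_stack_cidrs` is a hypothetical helper, not something from our puppet code.

```python
from __future__ import annotations

import ipaddress


def parse_dual_stack_cidrs(value: str) -> list:
    """Validate a comma-separated CIDR list for a dual-stack flag.

    Accepts a single CIDR, or exactly one IPv4 and one IPv6 CIDR in either
    order (the first one determines the primary IP family). Raises ValueError
    otherwise, including for CIDRs with host bits set.
    """
    cidrs = [ipaddress.ip_network(part.strip()) for part in value.split(",")]
    if len(cidrs) > 2:
        raise ValueError("at most two CIDRs (one per IP family) are allowed")
    if len(cidrs) == 2 and cidrs[0].version == cidrs[1].version:
        raise ValueError("dual-stack CIDRs must belong to different IP families")
    return cidrs


if __name__ == "__main__":
    # Hypothetical example ranges.
    print(parse_dual_stack_cidrs("10.96.0.0/16,fd00:10:96::/112"))
```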

Change 857004 merged by JMeybohm:

[operations/puppet@production] k8s: Fix duplicate definition of --service-account-key-file

https://gerrit.wikimedia.org/r/857004

JMeybohm closed subtask Restricted Task as Resolved. Nov 18 2022, 9:22 AM
JMeybohm closed subtask Restricted Task as Resolved.

Change 865592 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s: Add support for PKI with k8s >= 1.23

https://gerrit.wikimedia.org/r/865592

Change 866444 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s: Remove authz_mode hiera key

https://gerrit.wikimedia.org/r/866444

Change 867182 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] pki: Add intermediate certifikates for wikikube and wikikube_staging

https://gerrit.wikimedia.org/r/867182

Change 867182 merged by JMeybohm:

[operations/puppet@production] pki: Add intermediate certifikates for wikikube and wikikube_staging

https://gerrit.wikimedia.org/r/867182

Change 865592 merged by JMeybohm:

[operations/puppet@production] k8s: Add support for PKI with k8s >= 1.23

https://gerrit.wikimedia.org/r/865592

Change 866444 merged by JMeybohm:

[operations/puppet@production] k8s: Remove authz_mode hiera key

https://gerrit.wikimedia.org/r/866444

Change 868232 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] If-guard admin_ng objects no longer relevant for k8s 1.23

https://gerrit.wikimedia.org/r/868232

Change 868389 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] WIP: Update staging-codfw to k8s 1.23

https://gerrit.wikimedia.org/r/868389

Change 868232 merged by jenkins-bot:

[operations/deployment-charts@master] If-guard admin_ng objects no longer relevant for k8s 1.23

https://gerrit.wikimedia.org/r/868232

Change 870820 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s: Add the ClusterIP of kubernetes.default.cluster.local to cert

https://gerrit.wikimedia.org/r/870820

Change 870820 merged by JMeybohm:

[operations/puppet@production] k8s: Add the ClusterIP of kubernetes.default.cluster.local to cert

https://gerrit.wikimedia.org/r/870820
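
In-cluster clients reach the API server through the ClusterIP of the `kubernetes` Service in the `default` namespace, so that address has to be a SAN of the API server certificate, which is what the change above adds. Kubernetes allocates it as the first address after the network address of the primary `--service-cluster-ip-range`. A minimal Python sketch for deriving it, together with the in-cluster DNS names kubeadm conventionally puts into that certificate; the CIDRs are hypothetical examples and the `cluster.local` default for `cluster_domain` is an assumption.

```python
from __future__ import annotations

import ipaddress


def kubernetes_service_ip(service_cluster_ip_range: str) -> str:
    """ClusterIP of default/kubernetes: first address after the network
    address of the primary (first) CIDR in --service-cluster-ip-range."""
    primary = service_cluster_ip_range.split(",")[0].strip()
    network = ipaddress.ip_network(primary)
    return str(network.network_address + 1)


def apiserver_sans(service_cluster_ip_range: str, cluster_domain: str = "cluster.local") -> list[str]:
    """In-cluster names plus the Service ClusterIP for the apiserver cert."""
    names = [
        "kubernetes",
        "kubernetes.default",
        "kubernetes.default.svc",
        f"kubernetes.default.svc.{cluster_domain}",
    ]
    return names + [kubernetes_service_ip(service_cluster_ip_range)]


if __name__ == "__main__":
    print(apiserver_sans("10.96.0.0/16,fd00:10:96::/112"))
```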

Change 877963 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s: Remove default kubelet_cluster_domain definitions

https://gerrit.wikimedia.org/r/877963

Change 877990 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s: Update staging-codfw to kubernetes 1.23

https://gerrit.wikimedia.org/r/877990

Change 877963 merged by JMeybohm:

[operations/puppet@production] k8s: Remove default kubelet_cluster_domain definitions

https://gerrit.wikimedia.org/r/877963

Change 877990 merged by JMeybohm:

[operations/puppet@production] k8s: Update staging-codfw to kubernetes 1.23

https://gerrit.wikimedia.org/r/877990

Change 868389 merged by jenkins-bot:

[operations/deployment-charts@master] Update staging-codfw to k8s 1.23

https://gerrit.wikimedia.org/r/868389

Change 878940 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Pin coredns, eventrouter and helm-state-metrics for k8s 1.16

https://gerrit.wikimedia.org/r/878940

Change 878940 merged by jenkins-bot:

[operations/deployment-charts@master] Pin coredns, eventrouter and helm-state-metrics for k8s 1.16

https://gerrit.wikimedia.org/r/878940

Change 883130 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[labs/private@master] Add prometheus user to the system:monitoring group

https://gerrit.wikimedia.org/r/883130

Change 883130 merged by JMeybohm:

[labs/private@master] Add prometheus user to the system:monitoring group

https://gerrit.wikimedia.org/r/883130
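
Per the v1.22 notes, anything scraping controller-manager metrics needs to be authorized for the `/metrics` non-resource URL. Putting the Prometheus user into the `system:monitoring` group only helps if that group is bound to a role granting this. The following minimal Python sketch builds such a ClusterRole and group ClusterRoleBinding as plain dicts and prints them as a JSON `List` that `kubectl apply -f` could consume; the role name `metrics-reader` is hypothetical and this is not how our RBAC objects are actually managed.

```python
from __future__ import annotations

import json

RBAC_API = "rbac.authorization.k8s.io"


def metrics_reader_role(name: str) -> dict:
    """ClusterRole allowing GET on the /metrics non-resource URL."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [{"nonResourceURLs": ["/metrics"], "verbs": ["get"]}],
    }


def bind_role_to_group(role: str, group: str) -> dict:
    """ClusterRoleBinding granting `role` to every member of `group`."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": role},
        "roleRef": {"apiGroup": RBAC_API, "kind": "ClusterRole", "name": role},
        "subjects": [{"apiGroup": RBAC_API, "kind": "Group", "name": group}],
    }


if __name__ == "__main__":
    items = [
        metrics_reader_role("metrics-reader"),  # hypothetical role name
        bind_role_to_group("metrics-reader", "system:monitoring"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```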

Change 883539 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/alerts@master] KubernetesAPIErrorRate: make alert v1.23 compatible

https://gerrit.wikimedia.org/r/883539

Change 884305 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubernetes: Incease inotify limits

https://gerrit.wikimedia.org/r/884305

Change 884305 merged by JMeybohm:

[operations/puppet@production] kubernetes: Increase inotify limits

https://gerrit.wikimedia.org/r/884305
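
The change above raises the inotify limits; the actual values aren't listed in this task, so the targets in this minimal Python sketch are hypothetical placeholders. It reads `fs.inotify.max_user_instances` and `fs.inotify.max_user_watches` from `/proc/sys/fs/inotify` and reports values below target; the base directory is a parameter so the check can also be pointed at a test directory.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical target values for illustration only; they are NOT the values
# set by the puppet change above.
TARGETS = {"max_user_instances": 512, "max_user_watches": 524288}


def read_inotify_limits(base: str | Path = "/proc/sys/fs/inotify") -> dict[str, int]:
    """Read the current fs.inotify.* values for every key in TARGETS."""
    return {key: int((Path(base) / key).read_text().strip()) for key in TARGETS}


def limits_below_target(
    current: dict[str, int], targets: dict[str, int] = TARGETS
) -> dict[str, tuple[int, int]]:
    """Return {name: (current, target)} for each limit below its target."""
    return {
        name: (current.get(name, 0), want)
        for name, want in targets.items()
        if current.get(name, 0) < want
    }
```

On a Linux host, `limits_below_target(read_inotify_limits())` returns an empty dict once both limits are at or above the (hypothetical) targets.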

Change 883539 merged by jenkins-bot:

[operations/alerts@master] KubernetesAPIErrorRate: make alert v1.23 compatible

https://gerrit.wikimedia.org/r/883539

Change 887981 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s::package: Ensure the apt component is registered first

https://gerrit.wikimedia.org/r/887981

Change 887981 merged by JMeybohm:

[operations/puppet@production] k8s::package: Ensure the apt component is registered first

https://gerrit.wikimedia.org/r/887981

Change 889486 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Kubernets masters: include profile::kubernetes::client

https://gerrit.wikimedia.org/r/889486

Change 889486 merged by JMeybohm:

[operations/puppet@production] Kubernetes masters: include profile::kubernetes::client

https://gerrit.wikimedia.org/r/889486

Change 889956 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/cookbooks@master] Add sre.k8s.wipe-cluster.py

https://gerrit.wikimedia.org/r/889956

Change 889956 merged by jenkins-bot:

[operations/cookbooks@master] Add sre.k8s.wipe-cluster.py

https://gerrit.wikimedia.org/r/889956

Change 895138 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubernetes__deployment_server: Switch to k8s 1.23

https://gerrit.wikimedia.org/r/895138

Mentioned in SAL (#wikimedia-operations) [2023-03-07T10:51:07Z] <akosiaris> manually label kubemaster1001, kubemaster1002 giving them role master T307943

Change 895138 merged by Alexandros Kosiaris:

[operations/puppet@production] profile::kubernetes::client: Switch to k8s 1.23

https://gerrit.wikimedia.org/r/895138

Mentioned in SAL (#wikimedia-operations) [2023-03-08T11:26:08Z] <akosiaris> T307943 upgrade kubernetes-client on deploy1002 deploy2002
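
Upgrading `kubernetes-client` on the deployment servers runs into the upstream version skew policy, under which kubectl is supported within one minor version (older or newer) of kube-apiserver. A minimal Python sketch of that check, assuming version strings such as `v1.23.6` as reported by `kubectl version`; which clusters a deployment server talks to is outside its scope.

```python
from __future__ import annotations

import re


def minor_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like 'v1.23.6' or '1.16'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def kubectl_skew_supported(client: str, server: str) -> bool:
    """kubectl is supported within one minor version of kube-apiserver."""
    (c_major, c_minor), (s_major, s_minor) = minor_version(client), minor_version(server)
    return c_major == s_major and abs(c_minor - s_minor) <= 1
```

For example, a 1.23 client against a 1.22 API server is within the policy, while against a 1.16 API server it is not.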