Umbrella task to track the work required towards upgrading our Kubernetes clusters to Kubernetes 1.31.
We're currently running 1.23, which went EOL on 2023-02-08, and there are some bigger requirements to be dealt with before moving to a newer version:
- Migrate away from docker as container runtime: T269684
- We need to migrate away from PodSecurityPolicies: T273507
Together with the Kubernetes update, we need to update the following other components:
Core components (more or less on all clusters):
- ✅ calico (currently 3.23.3)
- https://docs.tigera.io/calico/latest/getting-started/kubernetes/requirements
- 3.29 supports k8s: 1.29 -> 1.31
- 3.23.x
- nothing relevant
- 3.24.x
- Calico no longer installs pod security policies (deprecated in Kubernetes 1.21) and now deploys pod security standards.
- Tolerate node-role.kubernetes.io/control-plane taints #6370
- calicoctl ipam check/release now look for and clean up unused IPAM handles. #6155
- 3.25.x
- Typha now supports graceful shutdown (terminationGracePeriodSeconds)
- Typha now supports compression on its protocol; this gives a 5:1 reduction in bandwidth use
- Typha now shares computed (and compressed) snapshots between clients that connect at approximately the same time. This significantly reduces CPU usage and the time to service all clients when many clients connect at once.
- Fix that calicoctl ipam release could only release IPAM handles when running in etcd mode. #6650
- Many of Typha's Prometheus metrics are now split by syncer (client) type, represented by a label "syncer" on the metrics. This prevents cross-talk where the syncers would all share the same metrics and the last writer to the metric would "win". #6675
- 3.26.x
- Separate calico-node and calico-cni-plugin service accounts #7106
- 3.27.x
- Update Typha Deployment tolerations to helm charts so that it can be scheduled on any node. #7979
- 3.28.x
- The Calico CNI plugin can be configured to prevent new pods from starting their containers until the pod's policy has finished being programmed
- IPv6 support for eBPF mode is generally available
- We made it easier to migrate your cluster from iptables to eBPF data plane modes. Previously, having a mix of nodes in iptables and eBPF modes could cause a breakdown in cluster networking. Now, the operator makes the transition with minimal disruption.
- Update the Grafana dashboard for Typha. Tested with Grafana v10.4.0. #8613
- Improve IPAM block garbage collection behavior for IP pools with small blocks. #8454
- Helm chart now supports specifying priorityClassName in values.yaml #8427
- Ability to set FelixConfiguration via helm chart #8559
- The calico-node, calico-kube-controllers and calico-typha pods now run with securityContext.seccompProfile.type=RuntimeDefault. #6524
- 3.29.x
- Tiered network policy and support for the AdminNetworkPolicy resource
- This release includes support for the nftables dataplane.
- In manifest installs, in order to prevent default IP-pools creation, CALICO_IPV4POOL_CIDR=none and CALICO_IPV6POOL_CIDR=none environment variable special values are now supported. #8156
- The calico-kube-controllers container now runs with securityContext.runAsNonRoot=true #36499
- Felix's route resync logic has been optimised; it now uses 50% less CPU time and 80% less memory. #9139
- Expose the Go runtime's "GOMAXPROCS" setting via felix configuration. This may be useful for tuning Felix to take account of CPU limits. #8945
- Felix now sets the Go runtime's GC threshold to 40% (instead of the more aggressive 20% used previously). This trades slight extra RAM usage for significantly lower GC CPU usage. The setting is now exposed in the FelixConfiguration as goGCThreshold, along with goMemoryLimitMB. To get the old behaviour, set goGCThreshold to 20. If memory usage is not a concern, the value can be set even higher to reduce CPU usage. #8904
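The goGCThreshold/goMemoryLimitMB note above would translate to roughly this FelixConfiguration fragment (a sketch; field names are taken from the 3.29 release note and not verified against the actual CRD):

```yaml
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  # Restore the pre-3.29 GC behaviour; leave at the new default (40)
  # or raise it to trade RAM for lower GC CPU usage.
  goGCThreshold: 20
```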
- Istio (currently 1.15.7)
- https://istio.io/latest/docs/releases/supported-releases/#support-status-of-istio-releases
- 1.24 supports k8s: 1.28 -> 1.31
- cert-manager (currently 1.10.1)
- https://cert-manager.io/docs/releases/#currently-supported-releases
- 1.16 supports k8s: 1.25 -> 1.31
- coredns (currently 1.8.7)
- https://github.com/coredns/deployment/blob/master/kubernetes/CoreDNS-k8s_version.md
- 1.11.3 supports k8s: 1.31 (actually 1.11.3 is installed by kubeadm 1.31)
- helm (currently 3.11.3)
- https://helm.sh/docs/topics/version_skew/#supported-version-skew
- 3.16.x supports k8s: 1.28 -> 1.31
- ❗We probably want to run two different helm versions: we need to continue using 3.11 for deployments to k8s 1.23 clusters and helm 3.16 for k8s 1.31.
- 3.12
- Action required: none
- Note:
- When charts are pushed to OCI registries, annotations are attached using standard names that other tools can introspect (e.g., version).
- --set-literal command-line flag to set a specific string with no escaping.
- --cascade flag to specify the deletion policy on uninstall.
- 3.13
- Action required: none
- Note:
- The --dry-run flag now has multiple options, enabling Helm to connect to a Kubernetes instance. The default behavior, when --dry-run is used, is unchanged.
- Values handling had numerous issues fixed and now consistently follows this priority:
- User-specified values (e.g., CLI).
- Parent chart values.
- Imported values.
- Subchart values.
- Additionally, null can now consistently be used to remove values. Note: there is a regression in 3.13.0 that's fixed in 3.13.1.
- Helm now adds the OCI creation annotation.
- New helm get metadata command.
- Added labels support for install and upgrade commands.
- 3.14
- Action required: none
- Note:
- New helm search flag: --fail-on-no-result.
- Allows a nested tpl invocation access to defines.
- Added qps/HELM_QPS parameter for Kubernetes rate limiting.
- Added --kube-version to the lint command.
- 3.15
- Action required: none
- Note:
- Opt-in to hiding secrets when running dry-run for install and upgrade.
- 3.16
- Action required: none
- Note:
- Added sha512sum template function.
- Added --skip-schema-validation flag to helm install, upgrade, and lint.
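One way to handle the two-helm-versions requirement above: keep both binaries installed side by side under versioned names and pick one per target cluster's Kubernetes version. A minimal sketch (the paths and naming are assumptions, not our actual packaging):

```shell
# Hypothetical dispatcher: choose a helm binary based on the target
# cluster's Kubernetes version. Paths are placeholders.
helm_for() {
  case "$1" in
    1.23*) echo "/usr/bin/helm-3.11" ;;  # legacy k8s 1.23 clusters
    *)     echo "/usr/bin/helm-3.16" ;;  # upgraded k8s 1.31 clusters
  esac
}

helm_for 1.23.17   # -> /usr/bin/helm-3.11
helm_for 1.31.2    # -> /usr/bin/helm-3.16
```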
- kube-state-metrics (currently 2.10.0)
- https://github.com/kubernetes/kube-state-metrics?tab=readme-ov-file#compatibility-matrix
- v2.14.0 supports k8s: v1.31
- 2.11.0
- action required: none
- note: This release builds with Golang v1.21.8.
- 2.12.0
- action required: none
- note: This release builds with k8s.io/client-go: v0.29.3.
- 2.13.0
- action required: none
- note:
- This release builds with Golang v1.22.5.
- This release builds with k8s.io/client-go: v0.30.3.
- This release adds read and write timeouts for requests. The defaults might have an impact on scrapes that take a long time.
- 2.14.0
- action required:
- check if we are using kube_endpoint_address_not_ready and kube_endpoint_address_available metrics and replace them
- note:
- This release builds with Golang v1.23.3
- This release builds with k8s.io/client-go: v0.31.2
- This release removes kube_endpoint_address_not_ready and kube_endpoint_address_available which have been deprecated in 2022. Please use kube_endpoint_address as a replacement.
Operators/Addons (only on specific clusters):
- kserve (currently 0.11.2)
- https://kserve.github.io/website/0.13/admin/serverless/serverless/
- ❌ 0.13 recommends k8s 1.29, istio 1.21, knative 1.13
- knative-serving (currently 1.7.2)
- https://knative.dev/docs/install/yaml-install/serving/install-serving-with-yaml/
- 1.16 reads "Kubernetes v1.28 or newer"
- apache flink-kubernetes-operator (currently 1.4)
- Unable to find any k8s version requirements/recommendations
- spark-operator (currently 1.3.8-3.3.2-2)
- https://github.com/kubeflow/spark-operator?tab=readme-ov-file#version-matrix not sure if this is the correct one?
- v2.0.x supports k8s: 1.16+
- v1beta2-1.6.x-3.5.0 supports k8s: 1.16+
- v1beta2-1.5.x-3.5.0 supports k8s: 1.16+
- v1beta2-1.4.x-3.5.0 supports k8s: 1.16+
- v1beta2-1.3.x-3.1.1 supports k8s: 1.16+
- pg-operator (currently 1.24.1)
- https://cloudnative-pg.io/documentation/current/supported_releases/#support-status-of-cloudnativepg-releases
- 1.24.x supports k8s: 1.28, 1.29, 1.30, 1.31
- 1.23.x supports k8s: 1.27, 1.28, 1.29
- ceph-operator (currently v3.7.2)
- https://github.com/ceph/ceph-csi?tab=readme-ov-file#known-to-work-co-platforms
- v3.12.0 supports k8s: 1.29, v1.30, v1.31
- v3.11.0 supports k8s: 1.26, v1.27, v1.28, v1.29
Preparation for the Kubernetes update
- Ensure all our charts are compatible with the new Kubernetes version (currently validating against 1.27) T379919
- Read Kubernetes changelogs (yellow/red flags are linked below each version; tick the box once all action-required items have been addressed, use ✅ for single items)
- The official https://relnotes.k8s.io is broken, so I built a custom version at https://people.wikimedia.org/~jayme/k8s-relnotes/ (k8s 1.23.14 - 1.31.2)
- v1.23.15-1.23.17
- Action Required
- Note
- kube-apiserver defaults the GOGC setting to 63, to approximate go1.17 garbage collection memory performance in heavily loaded API servers
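If we ever need to pin GOGC explicitly on our apiservers (also relevant to the go1.18 memory note under v1.24), a systemd drop-in is one option. A sketch; the unit name and path are assumptions about our setup:

```ini
# /etc/systemd/system/kube-apiserver.service.d/gogc.conf (path is an assumption)
[Service]
Environment=GOGC=63
```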
- v1.24
- Action Required
- ✅ Docker runtime support using dockershim in the kubelet is now completely removed
- Artifacts are now signed and can be verified in our package build process: https://kubernetes.io/docs/tasks/administer-cluster/verify-signed-artifacts/
- The LegacyServiceAccountTokenNoAutoGeneration feature gate is beta and enabled by default. Secret API objects containing service account tokens are no longer auto-generated for every ServiceAccount.
- IIRC we're using projected tokens everywhere already and service account secrets are not used/mounted. But we need to double check. If a secret is used/mounted, the deployment needs to take care of creating it (https://kubernetes.io/docs/concepts/configuration/secret/#serviceaccount-token-secrets)
- Remove any use of --experimental-check-node-capabilities-before-mount from your kubelet scripts or manifests.
- ✅ The --pod-infra-container-image kubelet flag is deprecated and will be removed in future releases.
- We can safely remove this for clusters on containerd
- Renamed metric evictions_number to evictions_total and marked it as stable. The original evictions_number metric name is marked as "Deprecated" and has been removed in kubernetes 1.23. (#106366)
- ✅ Kubelet: the following dockershim related flags are also removed along with dockershim --experimental-dockershim-root-directory, --docker-endpoint, --image-pull-progress-deadline, --network-plugin, --cni-conf-dir, --cni-bin-dir, --cni-cache-dir, --network-plugin-mtu. (#106907)
- ✅ Kubernetes 1.24 bumped the version of golang it is compiled with to go1.18, which introduced significant changes to its garbage collection algorithm. As a result, we observed an increase in memory usage for kube-apiserver in larger and heavily loaded clusters of up to ~25% (with the benefit of API call latencies dropping by up to 10x on 99th percentiles). If the memory increase is not acceptable, you can mitigate by setting the GOGC env variable (for our tests, GOGC=63 brings memory usage back to the original value)
- Replace the url label of rest_client_request_duration_seconds and rest_client_rate_limiter_duration_seconds metrics with a host label to prevent cardinality explosions and keep only the useful information. This is a breaking change required for security reasons. (#106539)
- Note
- New beta APIs will not be enabled in clusters by default. Existing beta APIs and new versions of existing beta APIs, will continue to be enabled by default.
- Probes (liveness, readiness, startup) now support gRPC: https://github.com/kubernetes/enhancements/issues/2727
- The calculations for Pod topology spread skew now exclude nodes that don't match the node affinity/selector. This may lead to unschedulable pods if you previously had pods matching the spreading selector on those excluded nodes (not matching the node affinity/selector), especially when the topologyKey is not node-level. Revisit the node affinity and/or pod selector in the topology spread constraints to avoid this scenario.
- Listed in notes only as I don't think we're affected and if we are we'll find out in staging already
- Deprecated Service.Spec.LoadBalancerIP.
- The ServerSideFieldValidation feature has graduated to beta and is now enabled by default. Kubectl 1.24 and newer will use server-side validation instead of client-side validation when writing to API servers with the feature enabled.
- Add the metric container_oom_events_total to kubelet's cAdvisor metric endpoint. (#108004)
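Regarding the LegacyServiceAccountTokenNoAutoGeneration item above: if a deployment does still mount a static token secret, the deployment itself would need to create one, roughly like this (name and service account are placeholders):

```yaml
# Sketch of a manually created service account token secret,
# per https://kubernetes.io/docs/concepts/configuration/secret/#serviceaccount-token-secrets
apiVersion: v1
kind: Secret
metadata:
  name: example-sa-token        # placeholder
  annotations:
    kubernetes.io/service-account.name: example-sa   # placeholder
type: kubernetes.io/service-account-token
```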
- v1.25
- Action Required
- ✅ PodSecurityPolicy is Removed, Pod Security Admission graduates to Stable
- ✅ Promoted SeccompDefault to Beta
- ✅ Promoted Local Ephemeral Storage Capacity Isolation to Stable
- ✅ Deprecated APIs: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-25
- CronJob: batch/v1beta1 -> batch/v1
- EndpointSlice: discovery.k8s.io/v1beta1 -> discovery.k8s.io/v1
- Event: events.k8s.io/v1beta1 -> events.k8s.io/v1
- HorizontalPodAutoscaler: autoscaling/v2beta1 -> autoscaling/v2
- PodDisruptionBudget: policy/v1beta1 -> policy/v1
- RuntimeClass: node.k8s.io/v1beta1 -> node.k8s.io/v1
- Metrics changes:
- Renamed apiserver_watch_cache_watch_cache_initializations_total to apiserver_watch_cache_initializations_total (#109579)
- priority_level_seat_count_samples is replaced with priority_level_seat_utilization, which samples every nanosecond rather than every millisecond; the old metric conveyed utilization despite its name.
- priority_level_seat_count_watermarks is removed.
- priority_level_request_count_samples is replaced with priority_level_request_utilization, which samples every nanosecond rather than every millisecond; the old metric conveyed utilization despite its name.
- priority_level_request_count_watermarks is removed.
- read_vs_write_request_count_samples is replaced with read_vs_write_current_requests, which samples every nanosecond rather than every second; the new metric, like the old one, measures utilization when the max-in-flight filter is used and number of requests when the API Priority and Fairness filter is used.
- read_vs_write_request_count_watermarks is removed
- apiserver_dropped_requests is dropped from this release since apiserver_request_total can now be used to track dropped requests. etcd_object_counts is also removed in favor of apiserver_storage_objects. apiserver_registered_watchers is also removed in favor of apiserver_longrunning_requests
- apiserver_longrunning_gauge was removed from the codebase. Please use apiserver_longrunning_requests instead.
- Note
- Ephemeral Containers Graduate to Stable
- Support for cgroups v2 Graduates to Stable
- v1.26
- Action Required
- ✅ Deprecated APIs: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-26
- FlowSchema/PriorityLevelConfiguration: flowcontrol.apiserver.k8s.io/v1beta1 -> flowcontrol.apiserver.k8s.io/v1beta2
- HorizontalPodAutoscaler: autoscaling/v2beta2 -> autoscaling/v2
- Metrics changes:
- cronjob_job_creation_skew_duration_seconds -> job_creation_skew_duration_seconds
- job_sync_total -> job_syncs_total
- job_finished_total -> jobs_finished_total
- kubelet_kubelet_credential_provider_plugin_duration -> kubelet_credential_provider_plugin_duration
- kubelet_kubelet_credential_provider_plugin_errors -> kubelet_credential_provider_plugin_errors
- etcd_db_total_size_in_bytes -> apiserver_storage_db_total_size_in_bytes
- KubeSchedulerConfiguration v1beta3 is deprecated in v1.26 and is removed in v1.29. Please migrate KubeSchedulerConfiguration to v1.
- ✅ Deprecated APIs: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-26
- Note
- A new pod_status_sync_duration_seconds histogram is reported at alpha metrics stability that estimates how long the Kubelet takes to write a pod status change once it is detected. (#107896)
- Kube-apiserver: gzip compression switched from level 4 to level 1 to improve large list call latencies in exchange for higher network bandwidth usage (10-50% higher). This increases the headroom before very large unpaged list calls exceed request timeout limits. (#112299)
- Deprecated the following kubectl run flags, which are ignored if set: --cascade, --filename, --force, --grace-period, --kustomize, --recursive, --timeout, --wait
- v1.27
- Action Required
- ✅ Use containerRuntimeEndpoint KubeletConfiguration instead of --container-runtime-endpoint cli argument
- ✅ Kubelet: remove deprecated flag --container-runtime (#114017)
- Support for the alpha seccomp annotations seccomp.security.alpha.kubernetes.io/pod and container.seccomp.security.alpha.kubernetes.io were deprecated since v1.19, now have been completely removed. The seccomp fields are no longer auto-populated when pods with seccomp annotations are created. Pods should use the corresponding pod or container securityContext.seccompProfile field instead. (#114947)
- This is probably more of a note as I think we've migrated everything as part of T273507 - but better check
- ✅ Added a new ClusterIP allocator. The new allocator removes previous Service CIDR block size limitations for IPv4, and limits IPv6 size to a /64 (#115075)
- ✅ Graduated seccomp profile defaulting to GA.
- Set the seccompDefault kubelet configuration field to true to make pods on that node default to using the RuntimeDefault seccomp profile. https://k8s.io/docs/tutorials/security/seccomp
- ✅ kubelet: migrated --container-runtime-endpoint and --image-service-endpoint to kubelet config (#112136)
- Metrics changes:
- kube_apiserver_pod_logs_pods_logs_backend_tls_failure_total -> kube_apiserver_pod_logs_backend_tls_failure_total
- kube_apiserver_pod_logs_pods_logs_insecure_backend_total -> kube_apiserver_pod_logs_insecure_backend_total
- node_collector_evictions_number -> node_collector_evictions_total
- scheduler_e2e_scheduling_duration_seconds -> scheduler_scheduling_attempt_duration_seconds
- Note
- PodSpec.Container.Resources became mutable for CPU and memory resource types.
- A new feature was enabled to improve the performance of the iptables mode of kube-proxy in large clusters. (#115138) Problems with this might be detected by seeing the value of kube-proxy's sync_proxy_rules_iptables_partial_restore_failures_total metric rising.
- Kubelet allows pods to use the net.ipv4.ip_local_reserved_ports sysctl by default and the minimal kernel version is 3.16; Pod Security admission allows this sysctl in v1.27+ versions of the baseline and restricted policies. (#115374)
- Kubelet no longer creates certain legacy iptables rules by default. It is possible that this will cause problems with some third-party components that improperly depended on those rules. If this affects you, you can run kubelet with --feature-gates=IPTablesOwnershipCleanup=false, but a bug should also be filed against the third-party component. (#114472)
- A new metric kubelet_known_pods has been added at ALPHA stability to report the number of pods a Kubelet is tracking in a number of internal states. Operators may use the metrics to track an excess of pods in the orphaned state that may not be completing. (#113145)
- The feature gates CSIInlineVolume, CSIMigration, DaemonSetUpdateSurge, EphemeralContainers, IdentifyPodOS, LocalStorageCapacityIsolation, NetworkPolicyEndPort and StatefulSetMinReadySeconds that graduated to GA in v1.25 and were unconditionally enabled have been removed in v1.27 (#114410)
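For the seccomp items above, the two relevant knobs look roughly like this (a sketch: the per-node default, and the per-workload securityContext field that replaces the removed alpha annotations):

```yaml
# Per-node: default all pods on the node to RuntimeDefault.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
seccompDefault: true
---
# Per-workload: explicit seccompProfile instead of the removed annotations.
apiVersion: v1
kind: Pod
metadata:
  name: example   # placeholder
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: example:latest   # placeholder
```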
- v1.28
- Action Required
- Nothing affecting wikikube, ask DSE about the Ceph stuff https://people.wikimedia.org/~jayme/k8s-relnotes/?kinds=deprecation&releaseVersions=1.28.0
- ✅ Deprecated APIs:
- kubescheduler.config.k8s.io/v1beta2 -> kubescheduler.config.k8s.io/v1
- Promoted API groups ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding to v1beta1
- Metrics changes:
- Apiserver adds two new metrics etcd_requests_total and etcd_request_errors_total that allow users to monitor requests to etcd storage, split by operation and resource type
- scheduler_scheduler_goroutines -> scheduler_goroutines
- apiserver_storage_db_total_size_in_bytes -> apiserver_storage_size_bytes
- apiserver_flowcontrol_request_concurrency_limit -> apiserver_flowcontrol_nominal_limit_seats
- Note
- The new feature gate "SidecarContainers" is now available. This feature introduces sidecar containers, a new type of init container that starts before other containers but remains running for the full duration of the pod's lifecycle and will not block pod termination.
- Pods which set hostNetwork: true and declare ports, get the hostPort field set automatically. Previously this would happen in the PodTemplate of a Deployment, DaemonSet or other workload API. Now hostPort will only be set when an actual Pod is being created. If this presents a problem, setting the feature gate "DefaultHostNetworkHostPortsInPodTemplates" to true will revert this behavior
- Support for proxying a request to a peer kube-apiserver if the local apiserver is not able to serve it due to version skew or in the case the requested api is disabled on the local apiserver
- Added new annotation batch.kubernetes.io/cronjob-scheduled-timestamp to Job objects scheduled from CronJobs.
- StatefulSet pods now have the pod index set as a pod label statefulset.kubernetes.io/pod-index
- The IPTablesOwnershipCleanup feature (KEP-3178) is now GA; kubelet no longer creates the KUBE-MARK-DROP chain (which has been unused for several releases) or the KUBE-MARK-MASQ chain (which is now only created by kube-proxy)
- Added a new command line argument --interactive to kubectl. The new command line argument lets a user confirm deletion requests per resource interactively
- Added DisruptionTarget condition to the pod preempted by kubelet to make room for a critical pod.
- Added podReplacementPolicy and terminating field to job api.
- kube-apiserver will now always remove its endpoint from Kubernetes service during graceful shutdown
- v1.29
- Action Required
- ✅ kube-scheduler component config (KubeSchedulerConfiguration) kubescheduler.config.k8s.io/v1beta3 is removed in v1.29. Migrate kube-scheduler configuration files to kubescheduler.config.k8s.io/v1. (#119994)
- Metrics changes:
- pod_scheduling_duration_seconds -> pod_scheduling_sli_duration_seconds
- apiserver_request_body_sizes -> apiserver_request_body_size_bytes
- Note
- Added support for split image filesystem in kubelet. (#120616)
- Graduated the following kubelet resource metrics to general availability:
- container_cpu_usage_seconds_total
- container_memory_working_set_bytes
- container_start_time_seconds
- node_cpu_usage_seconds_total
- node_memory_working_set_bytes
- pod_cpu_usage_seconds_total
- pod_memory_working_set_bytes
- resource_scrape_error
- Can we stop collecting those in kube-state-metrics now?
- The SidecarContainers feature has graduated to beta and is enabled by default. (#121579)
- Sidecar termination is now serialized and each sidecar container will receive a SIGTERM after all main containers and later starting sidecar containers have terminated. (#120620)
- kube-controller-manager: The LegacyServiceAccountTokenCleanUp feature gate is now beta and enabled by default. When enabled, legacy auto-generated service account token secrets are auto-labeled with a kubernetes.io/legacy-token-invalid-since label if the credentials have not been used in the time specified by --legacy-service-account-token-clean-up-period (defaulting to one year), and are referenced from the .secrets list of a ServiceAccount object, and are not referenced from pods. This label causes the authentication layer to reject use of the credentials. After being labeled as invalid, if the time specified by --legacy-service-account-token-clean-up-period (defaulting to one year) passes without the credential being used, the secret is automatically deleted. Secrets labeled as invalid which have not been auto-deleted yet can be re-activated by removing the kubernetes.io/legacy-token-invalid-since label. (#120682)
- v1.30
- Action Required
- ✅ AppArmor profiles can now be configured through fields on the PodSecurityContext and container SecurityContext. The beta AppArmor annotations are deprecated, and AppArmor status is no longer included in the node ready condition. (#123435)
- ✅ Changed --nodeport-addresses behavior to default to "primary node IP(s) only" rather than "all node IPs". (#122724)
- Note
- ValidatingAdmissionPolicy was promoted to GA and will be enabled by default. (#123405)
- A new kubelet metric image_pull_duration_seconds was added. The metric tracks the duration (in seconds) it takes for an image to be pulled, including the time spent in the waiting queue of image puller. The metric is broken down by bucketed image size. (#121719)
- v1.31
- Action Required
- ✅ Added support to the kube-proxy nodePortAddresses / --nodeport-addresses option to accept the value "primary", meaning to only listen for NodePort connections on the node's primary IPv4 and/or IPv6 address (according to the Node object). This is strongly recommended, if you were not previously using --nodeport-addresses, to avoid surprising behavior. (This behavior is enabled by default with the nftables backend; you would need to explicitly request --nodeport-addresses 0.0.0.0/0,::/0 there to get the traditional "listen on all interfaces" behavior.) (#123105)
- ✅ Graduated Kubernetes' support for AppArmor to GA. You now cannot disable the AppArmor feature gate. (#125257)
- Note
- Introduced a new boolean kubelet flag --fail-cgroupv1. (#126031)
- Added a warning log, an event for cgroup v1 usage and a metric for cgroup version. (#125328)
- Promoted CRI communication of the cgroup driver mechanism to beta. The KubeletCgroupDriverFromCRI feature gate is now in beta and enabled by default. This allows the kubelet to query the container runtime using CRI to determine the mechanism for cgroup management. If the container runtime doesn't support this, the kubelet falls back to using the configuration file (you can also use the deprecated --cgroup-driver command line argument). (#125828)
- Seems not supported by containerd right now (>=2.0 only): https://github.com/containerd/containerd/blob/main/docs/cri/config.md#cgroup-driver
- Kube-proxy's nftables mode (--proxy-mode=nftables) is now beta and available by default. (#124383)
- Removed the ability to run kubectl exec [POD] [COMMAND] without a -- separator. The -- separator has been recommended since the Kubernetes v1.18 release, which also deprecated the legacy way of invoking kubectl exec.
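For the v1.31 nodePortAddresses item, the config-file equivalent of the flag presumably looks like this (a sketch; whether "primary" goes in the list form is an assumption based on the release note, not verified against the v1.31 schema):

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
nodePortAddresses: ["primary"]   # only listen on the node's primary IP(s)
```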
Upgrade process
- Package Kubernetes
- Package Calico / update helm chart
- Re-initialize wikikube-staging-codfw
- Re-initialize wikikube-staging-eqiad
- Update grafana dashboards and alerts (to find dashboards using a specific metric, see https://wikitech.wikimedia.org/wiki/Grafana#Search/audit_metrics_usage_across_dashboards)
- Re-initialize wikikube-codfw
- Re-initialize wikikube-eqiad
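For the dashboard/alert updates, a local grep over exported dashboard JSON is a quick way to find uses of a renamed metric. This sketch fabricates a sample file to show the pattern; the actual export directory is an assumption:

```shell
# Hypothetical local audit: find dashboard JSON files referencing a renamed metric.
dir=$(mktemp -d)   # stand-in for wherever dashboard exports live
printf '%s\n' '{"expr":"rate(evictions_number[5m])"}' > "$dir/kubelet.json"
printf '%s\n' '{"expr":"up"}' > "$dir/other.json"

# -r: recurse, -l: list matching files only
grep -rl 'evictions_number' "$dir"   # -> prints only kubelet.json
```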