Update Kubernetes clusters to 1.31
Open, In Progress, High, Public

Description

Umbrella task to track the work required towards upgrading our Kubernetes clusters to Kubernetes 1.31.

We're currently running 1.23, which went EOL on 2023-02-08, and there are some bigger requirements to be dealt with before moving to a newer version:

  • Migrate away from docker as container runtime: T269684
  • We need to migrate away from PodSecurityPolicies: T273507

Together with the Kubernetes update, we need to update the following other components:
Core components (more or less on all clusters):

  • calico (currently 3.23.3)
    • https://docs.tigera.io/calico/latest/getting-started/kubernetes/requirements
    • 3.29 supports k8s: 1.29 -> 1.31
    • 3.23.x
      • nothing relevant
    • 3.24.x
      • Calico no longer installs pod security policies (deprecated in Kubernetes 1.21) and now deploys pod security standards.
      • Tolerate node-role.kubernetes.io/control-plane taints #6370
      • calicoctl ipam check/release now look for and clean up unused IPAM handles. #6155
    • 3.25.x
      • Typha now supports graceful shutdown (terminationGracePeriodSeconds)
      • Typha now supports compression on its protocol; this gives a 5:1 reduction in bandwidth use
      • Typha now shares computed (and compressed) snapshots between clients that connect at approximately the same time. This significantly reduces CPU usage and the time to service all clients when many clients connect at once.
      • Fix that calicoctl ipam release could only release IPAM handles when running in etcd mode. #6650
      • Many of Typha's Prometheus metrics are now split by syncer (client) type, represented by a label "syncer" on the metrics. This prevents cross-talk where the syncers would all share the same metrics and the last writer to the metric would "win". #6675
    • 3.26.x
      • Separate calico-node and calico-cni-plugin service accounts #7106
    • 3.27.x
      • Update Typha Deployment tolerations to helm charts so that it can be scheduled on any node. #7979
    • 3.28.x
      • The Calico CNI plugin can be configured to prevent new pods from starting their containers until the pod's policy has finished being programmed
      • IPv6 support for eBPF mode is generally available
      • We made it easier to migrate your cluster from iptables to eBPF data plane modes. Previously, having a mix of nodes in iptables and eBPF modes could cause a breakdown in cluster networking. Now, the operator makes the transition with minimal disruption.
      • Update the Grafana dashboard for Typha. Tested with Grafana v10.4.0. #8613
      • Improve IPAM block garbage collection behavior for IP pools with small blocks. #8454
      • Helm chart now supports specifying priorityClassName in values.yaml #8427
      • Ability to set FelixConfiguration via helm chart #8559
      • The calico-node, calico-kube-controllers and calico-typha pods now run with securityContext.seccompProfile.type=RuntimeDefault. #6524
    • 3.29.x
      • Tiered network policy and support for the AdminNetworkPolicy resource
      • This release includes support for the nftables dataplane.
      • In manifest installs, in order to prevent default IP-pools creation, CALICO_IPV4POOL_CIDR=none and CALICO_IPV6POOL_CIDR=none environment variable special values are now supported. #8156
      • The calico-kube-controllers container now runs with securityContext.runAsNonRoot=true # 36499
      • Felix's route resync logic has been optimised; it now uses 50% less CPU time and 80% less memory. #9139
      • Expose the Go runtime's "GOMAXPROCS" setting via felix configuration. This may be useful for tuning Felix to take account of CPU limits. #8945
      • Felix now sets the Go runtime's GC threshold to 40% (instead of the more aggressive 20% used previously). This trades slight extra RAM usage for significantly lower GC CPU usage. The setting is now exposed in the FelixConfiguration as goGCThreshold, along with goMemoryLimitMB. To get the old behaviour, set goGCThreshold to 20. If memory usage is not a concern, the value can be set even higher to reduce CPU usage. #8904
  • Istio (currently 1.15.7)
  • cert-manager (currently 1.10.1)
  • coredns (currently 1.8.7)
  • helm (currently 3.11.3)
    • https://helm.sh/docs/topics/version_skew/#supported-version-skew
    • 3.16.x supports k8s: 1.28 -> 1.31
    • ❗We probably want to run two different helm versions. We need to continue using 3.11 for deployments to k8s 1.23 clusters and helm 3.16 for k8s 1.31
    • 3.12
      • Action required: none
      • Note:
        • When charts are pushed to OCI registries, annotations are attached using standard names that other tools can introspect (e.g., version).
        • --set-literal command-line flag to set a specific string with no escaping.
        • --cascade flag to specify the deletion policy on uninstall.
    • 3.13
      • Action required: none
      • Note:
        • The --dry-run flag now has multiple options, enabling Helm to connect to a Kubernetes instance. The default behavior, when --dry-run is used, is unchanged.
        • Values handling had numerous issues fixed and now consistently follows this priority:
          1. User-specified values (e.g., CLI).
          2. Parent chart values.
          3. Imported values.
          4. Subchart values.
        • Additionally, null can now consistently be used to remove values. Note: there is a regression in 3.13.0 that's fixed in 3.13.1.
        • Helm now adds the OCI creation annotation.
        • New helm get metadata command.
        • Added labels support for install and upgrade commands.
    • 3.14
      • Action required: none
      • Note:
        • New helm search flag: --fail-on-no-result.
        • Allows a nested tpl invocation access to defines.
        • Added qps/HELM_QPS parameter for Kubernetes rate limiting.
        • Added --kube-version to the lint command.
    • 3.15 AppArmor profiles can now be configured through fields on the PodSecurityContext and container SecurityContext. The beta AppArmor annotations are deprecated, and AppArmor status is no longer included in the node ready condition. (#123435)
      • Action required: none
      • Note:
        • Opt-in to hiding secrets when running dry-run for install and upgrade.
    • 3.16
      • Action required: none
      • Note:
        • Added sha512sum template function.
        • Added --skip-schema-validation flag to helm install, upgrade, and lint.
  • kube-state-metrics (currently 2.10.0)
    • https://github.com/kubernetes/kube-state-metrics?tab=readme-ov-file#compatibility-matrix
    • v2.14.0 supports k8s: v1.31
    • 2.11.0
      • action required: none
      • note: This release builds with Golang v1.21.8.
    • 2.12.0
      • action required: none
      • note: This release builds with k8s.io/client-go: v0.29.3.
    • 2.13.0
      • action required: none
      • note:
        • This release builds with Golang v1.22.5.
        • This release builds with k8s.io/client-go: v0.30.3.
        • This release adds read and write timeouts for requests. The defaults might have an impact on scrapes that take a long time.
    • 2.14.0
      • action required:
        • check if we are using the kube_endpoint_address_not_ready and kube_endpoint_address_available metrics and replace them
      • note:
        • This release builds with Golang v1.23.3
        • This release builds with k8s.io/client-go: v0.31.2
        • This release removes kube_endpoint_address_not_ready and kube_endpoint_address_available, which have been deprecated since 2022. Please use kube_endpoint_address as a replacement.
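
The version-skew notes in the list above (calico 3.29 for k8s 1.29 to 1.31, helm 3.16 for k8s 1.28 to 1.31, kube-state-metrics v2.14.0 for k8s v1.31, and helm 3.11 kept for the 1.23 clusters) can be sketched as a small lookup. This is only an illustration under those assumptions, not the actual deployment tooling: the table layout and the names `SUPPORTED`, `parse_minor`, `supports` and `pick_helm` are hypothetical, and the ranges are just the ones quoted in this task.

```python
# Supported Kubernetes ranges as quoted in this task (hypothetical table layout).
# Each value is an inclusive (lowest, highest) pair of (major, minor) tuples.
SUPPORTED = {
    ("calico", "3.29"): ((1, 29), (1, 31)),
    ("helm", "3.16"): ((1, 28), (1, 31)),
    ("kube-state-metrics", "2.14.0"): ((1, 31), (1, 31)),
}


def parse_minor(version: str) -> tuple:
    """Turn '1.31', 'v1.31' or '1.31.4' into (1, 31)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def supports(component: str, release: str, k8s_version: str) -> bool:
    """True if the quoted range for this component release covers the cluster."""
    low, high = SUPPORTED[(component, release)]
    return low <= parse_minor(k8s_version) <= high


def pick_helm(k8s_version: str) -> str:
    """Assumption from this task: helm 3.11 stays for 1.23 clusters,
    helm 3.16 is used wherever its quoted range applies."""
    if parse_minor(k8s_version) == (1, 23):
        return "3.11"
    if supports("helm", "3.16", k8s_version):
        return "3.16"
    raise ValueError(f"no helm version recorded for Kubernetes {k8s_version}")
```

A wrapper script could call `pick_helm` with the target cluster's version to choose between the two installed helm binaries while both 1.23 and 1.31 clusters exist.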

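The values priority described in the helm 3.13 notes above (user-specified values over parent chart values over imported values over subchart values, with null removing a value) can be illustrated with a small merge function. This is a rough sketch of the ordering only and not Helm's implementation; `merge_values` and the sample values are hypothetical.

```python
def merge_values(*layers: dict) -> dict:
    """Merge value layers passed from lowest to highest priority."""
    result: dict = {}
    for layer in layers:
        _merge_into(result, layer)
    return result


def _merge_into(target: dict, source: dict) -> None:
    for key, value in source.items():
        if value is None:
            # A null in a higher-priority layer removes the key.
            target.pop(key, None)
        elif isinstance(value, dict):
            child = target.get(key)
            if not isinstance(child, dict):
                child = target[key] = {}
            _merge_into(child, value)
        else:
            target[key] = value


# Lowest priority first: subchart, imported, parent chart, user-specified.
subchart = {"image": {"tag": "1.0", "pullPolicy": "Always"}, "replicas": 1}
imported = {}
parent = {"image": {"tag": "2.0"}}
user = {"image": {"pullPolicy": None}, "replicas": 3}
merged = merge_values(subchart, imported, parent, user)
```

Here the parent chart overrides the subchart's image tag, and the user-specified null drops `pullPolicy` entirely.
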
Operators/Addons (only on specific clusters):

Preparation for the Kubernetes update

  • Ensure all our charts are compatible with the new Kubernetes version (currently validating against 1.27) T379919
  • Read Kubernetes changelogs (yellow/red flags just linked below each version. Tick the box if all action required items have been addressed, use ✅ for single items)
  • v1.23.15-1.23.17
  • Action Required
  • Note
    • kube-apiserver defaults the GOGC setting to 63, to approximate go1.17 garbage collection memory performance in heavily loaded API servers
  • v1.24
  • Action Required
    • ✅ Docker runtime support using dockershim in the kubelet is now completely removed
    • Artifacts are now signed and can be verified in our package build process: https://kubernetes.io/docs/tasks/administer-cluster/verify-signed-artifacts/
    • The LegacyServiceAccountTokenNoAutoGeneration feature gate is beta, and enabled by default. Secret API objects containing service account tokens are no longer auto-generated for every ServiceAccount.
    • Remove any use of --experimental-check-node-capabilities-before-mount from your kubelet scripts or manifests.
    • ✅ The --pod-infra-container-image kubelet flag is deprecated and will be removed in future releases.
      • We can safely remove this for clusters on containerd
    • Renamed metrics evictions_number to evictions_total and mark it as stable. The original evictions_number metrics name is marked as "Deprecated" and has been removed in kubernetes 1.23. (#106366)
    • ✅ Kubelet: the following dockershim related flags are also removed along with dockershim --experimental-dockershim-root-directory, --docker-endpoint, --image-pull-progress-deadline, --network-plugin, --cni-conf-dir, --cni-bin-dir, --cni-cache-dir, --network-plugin-mtu. (#106907)
    • ✅ Kubernetes 1.24 bumped version of golang it is compiled with to go1.18, which introduced significant changes to its garbage collection algorithm. As a result, we observed an increase in memory usage for kube-apiserver in larger and heavily loaded clusters up to ~25% (with the benefit of API call latencies drop by up to 10x on 99th percentiles). If the memory increase is not acceptable for you, you can mitigate by setting GOGC env variable (for our tests using GOGC=63 brings memory usage back to original value)
    • Replace the url label of rest_client_request_duration_seconds and rest_client_rate_limiter_duration_seconds metrics with a host label to prevent cardinality explosions and keep only the useful information. This is a breaking change required for security reasons. (#106539)
  • Note
    • New beta APIs will not be enabled in clusters by default. Existing beta APIs and new versions of existing beta APIs, will continue to be enabled by default.
    • Probes (liveness, readiness, startup) now support gRPC: https://github.com/kubernetes/enhancements/issues/2727
    • The calculations for Pod topology spread skew now exclude nodes that don't match the node affinity/selector. This may lead to unschedulable pods if you previously had pods matching the spreading selector on those excluded nodes (not matching the node affinity/selector), especially when the topologyKey is not node-level. Revisit the node affinity and/or pod selector in the topology spread constraints to avoid this scenario.
      • Listed in notes only as I don't think we're affected and if we are we'll find out in staging already
    • Deprecated Service.Spec.LoadBalancerIP.
    • The ServerSideFieldValidation feature has graduated to beta and is now enabled by default. Kubectl 1.24 and newer will use server-side validation instead of client-side validation when writing to API servers with the feature enabled.
    • Add the metric container_oom_events_total to kubelet's cAdvisor metric endpoint. (#108004)
  • v1.25
  • Action Required
    • ✅ PodSecurityPolicy is Removed, Pod Security Admission graduates to Stable
    • ✅ Promoted SeccompDefault to Beta
    • ✅ Promoted Local Ephemeral Storage Capacity Isolation to Stable
    • ✅ Deprecated APIs: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-25
      • CronJob: batch/v1beta1 -> batch/v1
      • EndpointSlice: discovery.k8s.io/v1beta1 -> discovery.k8s.io/v1
      • Event: events.k8s.io/v1beta1 -> events.k8s.io/v1
      • HorizontalPodAutoscaler: autoscaling/v2beta1 -> autoscaling/v2
      • PodDisruptionBudget: policy/v1beta1 -> policy/v1
      • RuntimeClass: node.k8s.io/v1beta1 -> node.k8s.io/v1
    • Metrics changes:
      • Renamed apiserver_watch_cache_watch_cache_initializations_total to apiserver_watch_cache_initializations_total (#109579)
      • priority_level_seat_count_samples is replaced with priority_level_seat_utilization, which samples every nanosecond rather than every millisecond; the old metric conveyed utilization despite its name.
      • priority_level_seat_count_watermarks is removed.
      • priority_level_request_count_samples is replaced with priority_level_request_utilization, which samples every nanosecond rather than every millisecond; the old metric conveyed utilization despite its name.
      • priority_level_request_count_watermarks is removed.
      • read_vs_write_request_count_samples is replaced with read_vs_write_current_requests, which samples every nanosecond rather than every second; the new metric, like the old one, measures utilization when the max-in-flight filter is used and number of requests when the API Priority and Fairness filter is used.
      • read_vs_write_request_count_watermarks is removed
      • apiserver_dropped_requests is dropped from this release since apiserver_request_total can now be used to track dropped requests. etcd_object_counts is also removed in favor of apiserver_storage_objects. apiserver_registered_watchers is also removed in favor of apiserver_longrunning_requests
      • apiserver_longrunning_gauge was removed from the codebase. Please use apiserver_longrunning_requests instead.
  • Note
    • Ephemeral Containers Graduate to Stable
    • Support for cgroups v2 Graduates to Stable
  • v1.26
  • Action Required
    • ✅ Deprecated APIs: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-26
      • FlowSchema/PriorityLevelConfiguration: flowcontrol.apiserver.k8s.io/v1beta1 -> flowcontrol.apiserver.k8s.io/v1beta2
      • HorizontalPodAutoscaler: autoscaling/v2beta2 -> autoscaling/v2
    • Metrics changes:
      • cronjob_job_creation_skew_duration_seconds -> job_creation_skew_duration_seconds
      • job_sync_total -> job_syncs_total
      • job_finished_total -> jobs_finished_total
      • kubelet_kubelet_credential_provider_plugin_duration -> kubelet_credential_provider_plugin_duration
      • kubelet_kubelet_credential_provider_plugin_errors -> kubelet_credential_provider_plugin_errors
      • etcd_db_total_size_in_bytes -> apiserver_storage_db_total_size_in_bytes
    • KubeSchedulerConfiguration v1beta3 is deprecated in v1.26 and is removed in v1.29. Please migrate KubeSchedulerConfiguration to v1.
  • Note
    • A new pod_status_sync_duration_seconds histogram is reported at alpha metrics stability that estimates how long the Kubelet takes to write a pod status change once it is detected. (#107896)
    • Kube-apiserver: gzip compression switched from level 4 to level 1 to improve large list call latencies in exchange for higher network bandwidth usage (10-50% higher). This increases the headroom before very large unpaged list calls exceed request timeout limits. (#112299)
    • Deprecated the following kubectl run flags, which are ignored if set: --cascade, --filename, --force, --grace-period, --kustomize, --recursive, --timeout, --wait
  • v1.27
  • Action Required
    • ✅ Use containerRuntimeEndpoint KubeletConfiguration instead of --container-runtime-endpoint cli argument
    • ✅ Kubelet: remove deprecated flag --container-runtime (#114017)
    • Support for the alpha seccomp annotations seccomp.security.alpha.kubernetes.io/pod and container.seccomp.security.alpha.kubernetes.io were deprecated since v1.19, now have been completely removed. The seccomp fields are no longer auto-populated when pods with seccomp annotations are created. Pods should use the corresponding pod or container securityContext.seccompProfile field instead. (#114947)
      • This is probably more of a note as I think we've migrated everything as part of T273507 - but better check
    • ✅ Added a new ClusterIP allocator. The new allocator removes previous Service CIDR block size limitations for IPv4, and limits IPv6 size to a /64 (#115075)
    • ✅ Graduated seccomp profile defaulting to GA.
    • ✅ kubelet: migrated --container-runtime-endpoint and --image-service-endpoint to kubelet config (#112136)
    • Metrics changes:
      • kube_apiserver_pod_logs_pods_logs_backend_tls_failure_total -> kube_apiserver_pod_logs_backend_tls_failure_total
      • kube_apiserver_pod_logs_pods_logs_insecure_backend_total -> kube_apiserver_pod_logs_insecure_backend_total
      • node_collector_evictions_number -> node_collector_evictions_total
      • scheduler_e2e_scheduling_duration_seconds -> scheduler_scheduling_attempt_duration_seconds
  • Note
    • PodSpec.Container.Resources became mutable for CPU and memory resource types.
    • A new feature was enabled to improve the performance of the iptables mode of kube-proxy in large clusters. (#115138) Problems with this might be detected by seeing the value of kube-proxy's sync_proxy_rules_iptables_partial_restore_failures_total metric rising.
    • Kubelet allows pods to use the net.ipv4.ip_local_reserved_ports sysctl by default and the minimal kernel version is 3.16; Pod Security admission allows this sysctl in v1.27+ versions of the baseline and restricted policies. (#115374)
    • Kubelet no longer creates certain legacy iptables rules by default. It is possible that this will cause problems with some third-party components that improperly depended on those rules. If this affects you, you can run kubelet with --feature-gates=IPTablesOwnershipCleanup=false, but a bug should also be filed against the third-party component. (#114472)
    • A new metric kubelet_known_pods has been added at ALPHA stability to report the number of pods a Kubelet is tracking in a number of internal states. Operators may use the metrics to track an excess of pods in the orphaned state that may not be completing. (#113145)
    • The feature gates CSIInlineVolume, CSIMigration, DaemonSetUpdateSurge, EphemeralContainers, IdentifyPodOS, LocalStorageCapacityIsolation, NetworkPolicyEndPort and StatefulSetMinReadySeconds that graduated to GA in v1.25 and were unconditionally enabled have been removed in v1.27 (#114410)
  • v1.28
  • Action Required
    • Nothing affecting wikikube, ask DSE about Ceph stuff: https://people.wikimedia.org/~jayme/k8s-relnotes/?kinds=deprecation&releaseVersions=1.28.0
    • ✅ Deprecated APIs:
      • kubescheduler.config.k8s.io/v1beta2 -> kubescheduler.config.k8s.io/v1
      • Promoted API groups ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding to v1beta1
    • Metrics changes:
      • Apiserver adds two new metrics etcd_requests_total and etcd_request_errors_total that allow users to monitor requests to etcd storage, split by operation and resource type
      • scheduler_scheduler_goroutines -> scheduler_goroutines
      • apiserver_storage_db_total_size_in_bytes -> apiserver_storage_size_bytes
      • apiserver_flowcontrol_request_concurrency_limit -> apiserver_flowcontrol_nominal_limit_seats
  • Note
    • The new feature gate "SidecarContainers" is now available. This feature introduces sidecar containers, a new type of init container that starts before other containers but remains running for the full duration of the pod's lifecycle and will not block pod termination.
    • Pods which set hostNetwork: true and declare ports, get the hostPort field set automatically. Previously this would happen in the PodTemplate of a Deployment, DaemonSet or other workload API. Now hostPort will only be set when an actual Pod is being created. If this presents a problem, setting the feature gate "DefaultHostNetworkHostPortsInPodTemplates" to true will revert this behavior
    • Support for proxying a request to a peer kube-apiserver if the local apiserver is not able to serve it due to version skew or in the case the requested api is disabled on the local apiserver
    • Added new annotation batch.kubernetes.io/cronjob-scheduled-timestamp to Job objects scheduled from CronJobs.
    • StatefulSet pods now have the pod index set as a pod label statefulset.kubernetes.io/pod-index
    • The IPTablesOwnershipCleanup feature (KEP-3178) is now GA; kubelet no longer creates the KUBE-MARK-DROP chain (which has been unused for several releases) or the KUBE-MARK-MASQ chain (which is now only created by kube-proxy)
    • Added a new command line argument --interactive to kubectl. The new command line argument lets a user confirm deletion requests per resource interactively
    • Added DisruptionTarget condition to the pod preempted by kubelet to make room for a critical pod.
    • Added podReplacementPolicy and terminating field to job api.
    • kube-apiserver will now always remove its endpoint from Kubernetes service during graceful shutdown
  • v1.29
  • Action Required
    • ✅ 'kube-scheduler component config (KubeSchedulerConfiguration) kubescheduler.config.k8s.io/v1beta3 is removed in v1.29. Migrated kube-scheduler configuration files to kubescheduler.config.k8s.io/v1.' (#119994)
    • Metrics changes:
      • pod_scheduling_duration_seconds -> pod_scheduling_sli_duration_seconds
      • apiserver_request_body_sizes -> apiserver_request_body_size_bytes
  • Note
    • Added support for split image filesystem in kubelet. (#120616)
    • Graduated the following kubelet resource metrics to general availability:
      • container_cpu_usage_seconds_total
      • container_memory_working_set_bytes
      • container_start_time_seconds
      • node_cpu_usage_seconds_total
      • node_memory_working_set_bytes
      • pod_cpu_usage_seconds_total
      • pod_memory_working_set_bytes
      • resource_scrape_error
      • Can we stop collecting those in kube-state-metrics now?
    • The SidecarContainers feature has graduated to beta and is enabled by default. (#121579)
    • Sidecar termination is now serialized and each sidecar container will receive a SIGTERM after all main containers and later starting sidecar containers have terminated. (#120620)
    • kube-controller-manager: The LegacyServiceAccountTokenCleanUp feature gate is now beta and enabled by default. When enabled, legacy auto-generated service account token secrets are auto-labeled with a kubernetes.io/legacy-token-invalid-since label if the credentials have not been used in the time specified by --legacy-service-account-token-clean-up-period (defaulting to one year), and are referenced from the .secrets list of a ServiceAccount object, and are not referenced from pods. This label causes the authentication layer to reject use of the credentials. After being labeled as invalid, if the time specified by --legacy-service-account-token-clean-up-period (defaulting to one year) passes without the credential being used, the secret is automatically deleted. Secrets labeled as invalid which have not been auto-deleted yet can be re-activated by removing the kubernetes.io/legacy-token-invalid-since label. (#120682)
  • v1.30
  • Action Required
  • Note
    • ValidatingAdmissionPolicy was promoted to GA and will be enabled by default. (#123405)
    • A new kubelet metric image_pull_duration_seconds was added. The metric tracks the duration (in seconds) it takes for an image to be pulled, including the time spent in the waiting queue of image puller. The metric is broken down by bucketed image size. (#121719)
  • v1.31
  • Action Required
    • ✅ Added support to the kube-proxy nodePortAddresses / --nodeport-addresses option to accept the value "primary", meaning to only listen for NodePort connections on the node's primary IPv4 and/or IPv6 address (according to the Node object). This is strongly recommended, if you were not previously using --nodeport-addresses, to avoid surprising behavior. (This behavior is enabled by default with the nftables backend; you would need to explicitly request --nodeport-addresses 0.0.0.0/0,::/0 there to get the traditional "listen on all interfaces" behavior.) (#123105)
    • ✅ Graduated Kubernetes' support for AppArmor to GA. You now cannot disable the AppArmor feature gate. (#125257)
  • Note
    • Introduced a new boolean kubelet flag --fail-cgroupv1. (#126031)
    • Added a warning log, an event for cgroup v1 usage and a metric for cgroup version. (#125328)
    • Promoted CRI communication of the cgroup driver mechanism to beta. The KubeletCgroupDriverFromCRI feature gate is now in beta and enabled by default. This allows the kubelet to query the container runtime using CRI to determine the mechanism for cgroup management. If the container runtime doesn't support this, the kubelet falls back to using the configuration file (you can also use the deprecated --cgroup-driver command line argument). (#125828)
    • Kube-proxy's nftables mode (--proxy-mode=nftables) is now beta and available by default. (#124383)
    • Removed the ability to run kubectl exec [POD] [COMMAND] without a -- separator. The -- separator has been recommended since the Kubernetes v1.18 release, which also deprecated the legacy way of invoking kubectl exec.
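
The metric renames collected under the individual versions above can be checked mechanically against dashboards and alert rules. The following is a minimal sketch, assuming the expressions are available as plain PromQL strings. `RENAMED_METRICS`, `latest_name` and `find_stale_metrics` are hypothetical names, and the table holds only renames listed in this task, written exactly as they appear in the changelog entries.

```python
import re

# Old metric name -> new name, as listed in the changelog notes of this task.
RENAMED_METRICS = {
    "apiserver_watch_cache_watch_cache_initializations_total": "apiserver_watch_cache_initializations_total",
    "etcd_object_counts": "apiserver_storage_objects",
    "apiserver_registered_watchers": "apiserver_longrunning_requests",
    "apiserver_longrunning_gauge": "apiserver_longrunning_requests",
    "cronjob_job_creation_skew_duration_seconds": "job_creation_skew_duration_seconds",
    "job_sync_total": "job_syncs_total",
    "job_finished_total": "jobs_finished_total",
    "etcd_db_total_size_in_bytes": "apiserver_storage_db_total_size_in_bytes",
    "node_collector_evictions_number": "node_collector_evictions_total",
    "scheduler_scheduler_goroutines": "scheduler_goroutines",
    "apiserver_storage_db_total_size_in_bytes": "apiserver_storage_size_bytes",
    "apiserver_flowcontrol_request_concurrency_limit": "apiserver_flowcontrol_nominal_limit_seats",
    "apiserver_request_body_sizes": "apiserver_request_body_size_bytes",
}

# Characters that may appear inside a Prometheus metric name.
NAME_CHARS = "A-Za-z0-9_:"


def latest_name(name: str) -> str:
    """Follow rename chains, e.g. a 1.26 rename that was renamed again in 1.28."""
    while name in RENAMED_METRICS:
        name = RENAMED_METRICS[name]
    return name


def find_stale_metrics(expr: str) -> dict:
    """Return {old_name: current_name} for every listed metric used in expr."""
    hits = {}
    for old in RENAMED_METRICS:
        # Match the whole name only, optionally with a histogram suffix.
        pattern = (
            rf"(?<![{NAME_CHARS}]){re.escape(old)}"
            rf"(?:_bucket|_sum|_count)?(?![{NAME_CHARS}])"
        )
        if re.search(pattern, expr):
            hits[old] = latest_name(old)
    return hits
```

Running `find_stale_metrics` over every rule expression before the upgrade gives a list of queries to touch. Names that carry an additional subsystem prefix in practice would need their own table entries.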

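The API removals listed under v1.25 and v1.26 above lend themselves to a simple chart check. This is a hedged sketch, assuming rendered manifests have already been parsed into dictionaries (for example by a YAML loader, which is not shown). `REPLACEMENTS` and `report` are hypothetical names, and the table covers only the kind/apiVersion pairs quoted in this task whose replacement is a stable version.

```python
# (kind, removed apiVersion) -> replacement apiVersion, from the v1.25 and v1.26 lists.
REPLACEMENTS = {
    ("CronJob", "batch/v1beta1"): "batch/v1",
    ("EndpointSlice", "discovery.k8s.io/v1beta1"): "discovery.k8s.io/v1",
    ("Event", "events.k8s.io/v1beta1"): "events.k8s.io/v1",
    ("HorizontalPodAutoscaler", "autoscaling/v2beta1"): "autoscaling/v2",
    ("HorizontalPodAutoscaler", "autoscaling/v2beta2"): "autoscaling/v2",
    ("PodDisruptionBudget", "policy/v1beta1"): "policy/v1",
    ("RuntimeClass", "node.k8s.io/v1beta1"): "node.k8s.io/v1",
}


def report(manifests: list) -> list:
    """Return one finding per manifest that still uses a removed apiVersion."""
    findings = []
    for manifest in manifests:
        kind = manifest.get("kind")
        api_version = manifest.get("apiVersion")
        replacement = REPLACEMENTS.get((kind, api_version))
        if replacement is not None:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append(f"{kind}/{name}: {api_version} -> {replacement}")
    return findings


sample = [
    {"apiVersion": "policy/v1beta1", "kind": "PodDisruptionBudget", "metadata": {"name": "web"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
]
findings = report(sample)
```

An empty result for all rendered charts is one possible signal that the chart-compatibility item at the top of this list is done for these API groups.
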
Upgrade process

Details

Repo | Branch | Lines +/-
operations/deployment-charts | master | +839 -1 K
operations/deployment-charts | master | +1 -5
operations/cookbooks | master | +185 -91
operations/debs/helm3 | master | +12 -6
operations/puppet | production | +6 -23
operations/puppet | production | +23 -6
operations/puppet | production | +5 -0
operations/puppet | production | +6 -5
operations/puppet | production | +7 -0
operations/puppet | production | +4 -2
operations/puppet | production | +63 -18
operations/deployment-charts | master | +1 -3
operations/deployment-charts | master | +115 -0
operations/deployment-charts | master | +28 -4
operations/debs/helm3 | master | +21 -7
operations/debs/istioctl | master | +2 -2
operations/debs/istioctl | master | +20 -5
operations/docker-images/production-images | master | +85 -26
operations/docker-images/production-images | master | +41 -506
operations/docker-images/production-images | master | +15 -5
operations/deployment-charts | master | +123 -31
operations/deployment-charts | master | +2 -1
operations/deployment-charts | master | +2 -1
operations/deployment-charts | master | +12 -0
operations/deployment-charts | master | +214 -89
operations/deployment-charts | master | +1 K -116
operations/deployment-charts | master | +3 -1
operations/cookbooks | master | +1 -1
operations/puppet | production | +10 -2
operations/debs/kubernetes | v1.31 | +13 -8
operations/debs/calico | v3.29 | +25 -9
operations/debs/kubernetes | v1.23 | +132 -10
operations/puppet | production | +2 -0
operations/cookbooks | master | +25 -17
operations/cookbooks | master | +102 -77

Related Objects

Status | Subtype | Assigned
Resolved | - | JMeybohm
In Progress | - | None
Stalled | - | JMeybohm
Resolved | - | JMeybohm
Resolved | - | klausman
Resolved | - | JMeybohm
Resolved | - | CDanis
Resolved | - | brouberol
Open | - | klausman
Open | - | JMeybohm
Open | - | JMeybohm
Resolved | - | JMeybohm
Resolved | - | JMeybohm
Resolved | - | kamila
Open | - | kamila
Open | - | kamila
Open | - | kamila
Open | - | Stevemunene
Resolved | - | kamila
Declined | - | VRiley-WMF
Resolved | - | JMeybohm
Resolved | - | JMeybohm
Resolved | - | None
Resolved | - | VRiley-WMF
Resolved | - | Jclark-ctr
Resolved | - | Jclark-ctr
Resolved | - | Jclark-ctr
Resolved | - | Jclark-ctr
Resolved | - | akosiaris
Resolved | - | Jclark-ctr
Open | - | None
Resolved | - | None
Resolved | - | Jelto
Resolved | - | JMeybohm
Resolved | - | Jhancock.wm
Resolved | - | Jhancock.wm
Resolved | - | Jhancock.wm
Resolved | - | elukey
Resolved | Request | Jhancock.wm
Resolved | - | None
Resolved | - | Jhancock.wm
Resolved | - | Jhancock.wm
Resolved | - | Jhancock.wm
Resolved | - | Jhancock.wm
Duplicate | - | Jelto
Resolved | - | elukey
Resolved | - | ayounsi
Resolved | - | elukey
Resolved | - | JMeybohm
Resolved | - | Clement_Goubert
Open | - | None
Resolved | - | JMeybohm
Resolved | - | JMeybohm
Open | - | None
Open | - | None
Resolved | - | kamila
Duplicate | - | None
Duplicate | - | None
Resolved | Request | Jclark-ctr
Resolved | - | VRiley-WMF
Declined | - | None
Resolved | - | Jhancock.wm
Resolved | - | Jhancock.wm
Resolved | - | JMeybohm
Resolved | - | hnowlan
Duplicate | - | Clement_Goubert
Resolved | - | JMeybohm
Resolved | - | JMeybohm
Resolved | - | JMeybohm
Resolved | - | JMeybohm
Resolved | - | RobH
Resolved | Request | Papaul
Resolved | Request | VRiley-WMF
Open | - | None
Open | - | None
Resolved | - | Jelto
Resolved | - | JMeybohm
Open | - | klausman
Open | - | klausman
Stalled | - | None
Resolved | - | JMeybohm
Open | - | None
Open | - | None
In Progress | - | JMeybohm
Open | - | Clement_Goubert
Open | - | None

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change #1109379 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] aptrepo: Add bookworm components calico329 and kubernetes131

https://gerrit.wikimedia.org/r/1109379

Change #1109379 merged by JMeybohm:

[operations/puppet@production] aptrepo: Add bookworm components calico329 and kubernetes131

https://gerrit.wikimedia.org/r/1109379

Change #1109458 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/debs/kubernetes@v1.23] Support multiple kubernetes-client versions

https://gerrit.wikimedia.org/r/1109458

Change #1109671 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/debs/calico@v3.29] Update to calico v3.29.1

https://gerrit.wikimedia.org/r/1109671

Change #1109672 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/debs/kubernetes@v1.31] Update to kubernetes v1.31.4

https://gerrit.wikimedia.org/r/1109672

Change #1109704 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s::package: Install version specific kubernetes-client package

https://gerrit.wikimedia.org/r/1109704

Change #1110813 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Update staging-codfw to kubernetes 1.31, calico 3.29

https://gerrit.wikimedia.org/r/1110813

Change #1109458 merged by JMeybohm:

[operations/debs/kubernetes@v1.23] Support multiple kubernetes-client versions

https://gerrit.wikimedia.org/r/1109458

Change #1109671 merged by JMeybohm:

[operations/debs/calico@v3.29] Update to calico v3.29.1

https://gerrit.wikimedia.org/r/1109671

Mentioned in SAL (#wikimedia-operations) [2025-01-14T13:44:41Z] <jayme> imported kubernetes 1.23.14-5 to bullseye/bookworm-wikimedia - T341984

Mentioned in SAL (#wikimedia-operations) [2025-01-14T13:50:48Z] <jayme> imported calico 3.29.1-1 to bookworm-wikimedia - T341984

Change #1109672 merged by JMeybohm:

[operations/debs/kubernetes@v1.31] Update to kubernetes v1.31.4

https://gerrit.wikimedia.org/r/1109672

Mentioned in SAL (#wikimedia-operations) [2025-01-14T13:57:44Z] <jayme> imported kubernetes 1.31.4-1 to bookworm-wikimedia - T341984

Mentioned in SAL (#wikimedia-operations) [2025-01-15T09:54:45Z] <jayme> disabling puppet on 543 nodes using k8s::package resource - T341984

Change #1109704 merged by JMeybohm:

[operations/puppet@production] k8s::package: Install version specific kubernetes-client package

https://gerrit.wikimedia.org/r/1109704

Mentioned in SAL (#wikimedia-operations) [2025-01-15T10:08:02Z] <jayme> re-enabling puppet on nodes using k8s::package resource - T341984

Change #1111588 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/cookbooks@master] sre.k8s.renumber-node: change default os to bookworm

https://gerrit.wikimedia.org/r/1111588

Change #1111935 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Pin calico version on all clusters

https://gerrit.wikimedia.org/r/1111935

Change #1111943 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Update calico-crds to calico v3.29.1

https://gerrit.wikimedia.org/r/1111943

Change #1111588 merged by jenkins-bot:

[operations/cookbooks@master] sre.k8s.renumber-node: change default os to bookworm

https://gerrit.wikimedia.org/r/1111588

Change #1112058 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Update calico to v3.29.1

https://gerrit.wikimedia.org/r/1112058

Change #1112059 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Update staging-codfw to k8s 1.31, calico 3.29

https://gerrit.wikimedia.org/r/1112059

Change #1112183 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] admin_ng: Install VAPs instead of PSPs on k8s >= 1.24

https://gerrit.wikimedia.org/r/1112183

Change #1111935 merged by jenkins-bot:

[operations/deployment-charts@master] Pin calico version on all clusters

https://gerrit.wikimedia.org/r/1111935

Change #1111943 merged by jenkins-bot:

[operations/deployment-charts@master] Update calico-crds to calico v3.29.1

https://gerrit.wikimedia.org/r/1111943

Change #1112058 merged by jenkins-bot:

[operations/deployment-charts@master] Update calico to v3.29.1

https://gerrit.wikimedia.org/r/1112058

Change #1112183 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: Install VAPs instead of PSPs on k8s >= 1.24

https://gerrit.wikimedia.org/r/1112183

JMeybohm renamed this task from "Update Kubernetes clusters to >1.25" to "Update Kubernetes clusters to 1.31". Wed, Jan 22, 9:20 AM
JMeybohm updated the task description.

Change #1113445 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/docker-images/production-images@master] Update coredns to 1.11.3

https://gerrit.wikimedia.org/r/1113445

Change #1113453 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Pin coredns version on all clustes to 0.3.4

https://gerrit.wikimedia.org/r/1113453

Change #1113454 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Update coredns to 1.11.3 / coredns helm chart 1.37.3

https://gerrit.wikimedia.org/r/1113454

Change #1113460 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/debs/istioctl@master] Import upstream release 1.24.2

https://gerrit.wikimedia.org/r/1113460

Change #1113473 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Create a copy of the wikikube istio config

https://gerrit.wikimedia.org/r/1113473

Change #1113474 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Update wikikube istio 1.24.2 config

https://gerrit.wikimedia.org/r/1113474

Change #1113507 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/docker-images/production-images@master] Update istio to 1.24.2

https://gerrit.wikimedia.org/r/1113507

jijiki changed the task status from Open to In Progress. Wed, Jan 22, 5:30 PM
jijiki moved this task from ⎈Kubernetes to 🗄 Projects on the serviceops board.

Change #1113453 merged by jenkins-bot:

[operations/deployment-charts@master] Pin coredns version on all clustes to 0.3.4

https://gerrit.wikimedia.org/r/1113453

Change #1113752 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/docker-images/production-images@master] Update cert-manager to 1.16.3

https://gerrit.wikimedia.org/r/1113752

Change #1113800 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Pin cert-manager version on all clustes to 1.10.6

https://gerrit.wikimedia.org/r/1113800

Change #1113800 merged by jenkins-bot:

[operations/deployment-charts@master] Pin cert-manager version on all clustes to 1.10.6

https://gerrit.wikimedia.org/r/1113800

Change #1113454 merged by jenkins-bot:

[operations/deployment-charts@master] Update coredns to 1.11.3 / coredns helm chart 1.37.3

https://gerrit.wikimedia.org/r/1113454

Change #1113507 merged by JMeybohm:

[operations/docker-images/production-images@master] Update istio to 1.24.2

https://gerrit.wikimedia.org/r/1113507

Change #1113445 merged by JMeybohm:

[operations/docker-images/production-images@master] Update coredns to 1.11.3

https://gerrit.wikimedia.org/r/1113445

Change #1113752 merged by JMeybohm:

[operations/docker-images/production-images@master] Update cert-manager to 1.16.3

https://gerrit.wikimedia.org/r/1113752

Change #1113460 merged by JMeybohm:

[operations/debs/istioctl@master] Import upstream release 1.24.2

https://gerrit.wikimedia.org/r/1113460

Change #1114008 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/debs/istioctl@master] Add bash-completion to Build-Depends

https://gerrit.wikimedia.org/r/1114008

Change #1114008 merged by JMeybohm:

[operations/debs/istioctl@master] Add bash-completion to Build-Depends

https://gerrit.wikimedia.org/r/1114008

Mentioned in SAL (#wikimedia-operations) [2025-01-24T16:26:28Z] <jayme> imported istioctl 1.24.2-1 to bullseye/bookworm-wikimedia T341984

Change #1114666 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/debs/helm3@master] Support multiple helm versions

https://gerrit.wikimedia.org/r/1114666

Change #1114666 merged by Jelto:

[operations/debs/helm3@master] Support multiple helm versions

https://gerrit.wikimedia.org/r/1114666

Mentioned in SAL (#wikimedia-operations) [2025-01-28T14:21:08Z] <jelto> Imported helm311 | 3.11.3-3 to bookworm-wikimedia - T341984

Mentioned in SAL (#wikimedia-operations) [2025-01-28T14:29:14Z] <jelto> Imported helm311 | 3.11.3-3 to bullseye-wikimedia - T341984

Change #1114970 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Allow to install multiple kubectl versions

https://gerrit.wikimedia.org/r/1114970

Change #1115380 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/cookbooks@master] k8s.wipe-cluster: Allow to specify downtime length

https://gerrit.wikimedia.org/r/1115380

Change #1112059 merged by JMeybohm:

[operations/deployment-charts@master] Update staging-codfw to k8s 1.31

https://gerrit.wikimedia.org/r/1112059

Change #1113473 merged by JMeybohm:

[operations/deployment-charts@master] Create a copy of the wikikube istio config

https://gerrit.wikimedia.org/r/1113473

Change #1113474 merged by JMeybohm:

[operations/deployment-charts@master] Update wikikube istio 1.24.2 config

https://gerrit.wikimedia.org/r/1113474

Change #1110813 merged by JMeybohm:

[operations/puppet@production] Update staging-codfw to kubernetes 1.31, calico 3.29

https://gerrit.wikimedia.org/r/1110813

Change #1115388 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/debs/helm3@master] Build helm3.17 with new upstream version

https://gerrit.wikimedia.org/r/1115388

Change #1115393 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Explicitely cast string to integer

https://gerrit.wikimedia.org/r/1115393

Change #1115393 merged by JMeybohm:

[operations/puppet@production] Explicitely cast string to integer

https://gerrit.wikimedia.org/r/1115393

Change #1115433 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubernetes-publish-sa-cert: Don't fail when no certs in etcd

https://gerrit.wikimedia.org/r/1115433

Change #1115459 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] wikikube-staging-codfw: Disable PodSecurityPolicies

https://gerrit.wikimedia.org/r/1115459

Change #1115459 merged by JMeybohm:

[operations/puppet@production] wikikube-staging-codfw: Disable PodSecurityPolicies

https://gerrit.wikimedia.org/r/1115459

Change #1115778 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubernetes::master: Update kubectl alternative entry

https://gerrit.wikimedia.org/r/1115778

Change #1114970 merged by JMeybohm:

[operations/puppet@production] k8s::client: Allow for install of all kubectl versions

https://gerrit.wikimedia.org/r/1114970

Change #1115778 merged by JMeybohm:

[operations/puppet@production] kubernetes::master: Update kubectl alternative entry

https://gerrit.wikimedia.org/r/1115778

Change #1115433 merged by JMeybohm:

[operations/puppet@production] kubernetes-publish-sa-cert: Don't fail when no certs in etcd

https://gerrit.wikimedia.org/r/1115433

Change #1115799 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Revert "k8s::client: Allow for install of all kubectl versions"

https://gerrit.wikimedia.org/r/1115799

Change #1115799 merged by JMeybohm:

[operations/puppet@production] Revert "k8s::client: Allow for install of all kubectl versions"

https://gerrit.wikimedia.org/r/1115799

Change #1120193 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] cert-manager: Allow prometheus to scrape all components

https://gerrit.wikimedia.org/r/1120193

Change #1120628 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Update for k8s >=1.30

https://gerrit.wikimedia.org/r/1120628