| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| kyverno.copy_images_to_registry: update the versions | cloud/wmcs-cookbooks | main | +44 -20 |
Details
| Title | Reference | Author | Source Branch | Dest Branch |
|---|---|---|---|---|
| kyverno: upgrade to 3.3.9 | repos/cloud/toolforge/toolforge-deploy!889 | dcaro | upgrade_kyverno | main |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T408785 [infra,k8s] Upgrade Toolforge Kubernetes to version 1.33 |
| Open | | None | T379047 [infra,k8s] Upgrade Toolforge Kubernetes to version 1.32 |
| Open | | taavi | T372697 [infra,k8s] Upgrade Toolforge Kubernetes to version 1.31 |
| Open | | None | T335131 [infra,k8s] replace admission controllers with an existing policy admin project |
| Open | | None | T364293 [infra,k8s] Move to kubernetes VAPs and drop kyverno |
| Resolved | | dcaro | T362869 [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30) |
| Resolved | | dcaro | T394787 [kyverno] Upgrade to `3.3.9` chart (`1.13` app) for k8s 1.30 support |
| Resolved | | dcaro | T401681 [kyverno] policy countres stopped showing correctly in grafana |
| Resolved | | dcaro | T401684 [kyverno] upgrade to 3.3.9 in tools failed leaving a half-upgraded system |
Event Timeline
Change #1148352 had a related patch set uploaded (by David Caro; author: David Caro):
[cloud/wmcs-cookbooks@main] kyverno.copy_images_to_registry: update the versions
Uploaded the images to docker-registry.tools.wmflabs.org:
docker-registry.tools.wikimedia.cloud/toolforge-kyverno-kyverno:v1.13.6
docker-registry.tools.wikimedia.cloud/toolforge-kyverno-kyverno-cli:v1.13.6
docker-registry.tools.wikimedia.cloud/toolforge-kyverno-kyvernopre:v1.13.6
docker-registry.tools.wikimedia.cloud/toolforge-kyverno-background-controller:v1.13.6
docker-registry.tools.wikimedia.cloud/toolforge-kyverno-cleanup-controller:v1.13.6
docker-registry.tools.wikimedia.cloud/toolforge-kyverno-reports-controller:v1.13.6
docker-registry.tools.wikimedia.cloud/bitnami-kubectl:1.30.2
docker-registry.tools.wikimedia.cloud/busybox:1.35
Change #1148352 merged by jenkins-bot:
[cloud/wmcs-cookbooks@main] kyverno.copy_images_to_registry: update the versions
Mentioned in SAL (#wikimedia-cloud) [2025-08-11T08:14:16Z] <dcaro> deploying kyverno (T394787)
Upgraded toolsbeta. I was also running the tests during that time and they did not fail, though there were some weird events going through:
dcaro@toolsbeta-bastion-6:~$ kubectl-sudo get events -A --sort-by='{.lastTimestamp}' | grep Warning
maintain-harbor 5m34s Warning UnexpectedJob cronjob/mh--manage-image-retention-cron Saw a job that the controller did not create or forgot: test-15708
maintain-harbor 5m28s Warning UnexpectedJob cronjob/mh--manage-image-retention-cron Saw a job that the controller did not create or forgot: test-3375
kyverno 4m27s Warning Unhealthy pod/kyverno-admission-controller-688866b86f-zw7qf Startup probe failed: Get "https://192.168.208.200:9443/health/liveness": dial tcp 192.168.208.200:9443: connect: connection refused
maintain-harbor 4m1s Warning UnexpectedJob cronjob/mh--manage-image-retention-cron Saw a job that the controller did not create or forgot: test-30724
maintain-harbor 3m55s Warning UnexpectedJob cronjob/mh--manage-image-retention-cron Saw a job that the controller did not create or forgot: test-13268
maintain-harbor 3m48s Warning UnexpectedJob cronjob/mh--manage-harbor-projects-quotas-cron Saw a job that the controller did not create or forgot: test-676
kyverno 3m42s Warning FailedCreate replicaset/kyverno-admission-controller-688866b86f Error creating: Internal error occurred: failed calling webhook "mutate.kyverno.svc-fail": failed to call webhook: Post "https://kyverno-svc.kyverno.svc:443/mutate/fail?timeout=10s": dial tcp 10.100.206.174:443: connect: connection refused
maintain-harbor 3m42s Warning UnexpectedJob cronjob/mh--manage-harbor-projects-quotas-cron Saw a job that the controller did not create or forgot: test-28609
maintain-harbor 3m39s Warning UnexpectedJob cronjob/mh--manage-harbor-projects-quotas-cron Saw a job that the controller did not create or forgot: test-9136
maintain-harbor 3m27s Warning UnexpectedJob cronjob/mh--delete-empty-tool-projects-cron Saw a job that the controller did not create or forgot: test-28369
maintain-harbor 3m22s Warning UnexpectedJob cronjob/mh--delete-stale-toolforge-artifacts-cron Saw a job that the controller did not create or forgot: test-10809

It ended up working, but there's the "hiccup" on the kyverno side, and those UnexpectedJob ones.
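To see at a glance which warnings dominate, the event list can be grouped by namespace and reason. This is only a rough sketch: `summarize_warnings` is a made-up helper name, and it assumes the column order seen in the output above (namespace, last seen, type, reason, object, message).

```shell
# Hypothetical helper: read `kubectl get events -A` output on stdin and
# print "<count> <namespace> <reason>" for Warning events, most frequent first.
# Assumes column 1 is the namespace, column 3 the type, column 4 the reason.
summarize_warnings() {
  awk '$3 == "Warning" { n[$1 " " $4]++ }
       END { for (k in n) print n[k], k }' |
    sort -k1,1nr -k2
}
```

It would be fed the same way as the `grep` above, for example `kubectl-sudo get events -A --sort-by='{.lastTimestamp}' | summarize_warnings`.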
Sent notice for the upgrade window in tools https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/T5EHGWTGKBX6KPQYP2Q7F27D675TOIRY/
Mentioned in SAL (#wikimedia-cloud) [2025-08-12T10:01:31Z] <dcaro> starting upgrade for kyverno (T394787)
First try to upgrade on tools failed, error message:
root@tools-k8s-control-9:~/toolforge-deploy# ./deploy.sh kyverno
...
WARNING: top-level config key environments must be defined before releases in helmfile.yaml
Adding repo kyverno https://kyverno.github.io/kyverno
"kyverno" has been added to your repositories
Affected releases are:
kyverno (kyverno/kyverno) UPDATED
Do you really want to sync?
Helmfile will sync all your releases, as shown above.
[y/n]: y
Upgrading release=kyverno, chart=kyverno/kyverno, namespace=kyverno
FAILED RELEASES:
NAME NAMESPACE CHART VERSION DURATION
kyverno kyverno kyverno/kyverno 16s
in ./helmfile.yaml: failed processing release kyverno: command "/usr/sbin/helm" exited with non-zero status:
PATH:
/usr/sbin/helm
ARGS:
0: helm (4 bytes)
1: upgrade (7 bytes)
2: --install (9 bytes)
3: kyverno (7 bytes)
4: kyverno/kyverno (15 bytes)
5: --version (9 bytes)
6: 3.3.9 (5 bytes)
7: --create-namespace (18 bytes)
8: --namespace (11 bytes)
9: kyverno (7 bytes)
10: --values (8 bytes)
11: /tmp/helmfile3409313553/kyverno-kyverno-values-bcbd456c5 (56 bytes)
12: --values (8 bytes)
13: /tmp/helmfile3369218270/kyverno-kyverno-values-b78494bbf (56 bytes)
14: --reset-values (14 bytes)
15: --history-max (13 bytes)
16: 10 (2 bytes)
ERROR:
exit status 1
EXIT STATUS
1
STDERR:
Error: UPGRADE FAILED: cannot patch "cleanuppolicies.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "cleanuppolicies.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions && cannot patch "clustercleanuppolicies.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "clustercleanuppolicies.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions && cannot patch "policyexceptions.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "policyexceptions.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions
COMBINED OUTPUT:
Error: UPGRADE FAILED: cannot patch "cleanuppolicies.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "cleanuppolicies.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions && cannot patch "clustercleanuppolicies.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "clustercleanuppolicies.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions && cannot patch "policyexceptions.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "policyexceptions.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions
Tool functional tests are passing, and no outage was detected. All kyverno pods are running the new images now too.
All tests and policies are correctly in place. The issue seems to have affected only the migration to the newer CRD versions; it sounds like a race condition to me. Note that it worked out of the box in lima-kilo and toolsbeta.
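The helm error boils down to `status.storedVersions` still listing `v2alpha1` while the CRD being applied no longer declares it in `spec.versions`. A pre-flight check along these lines might catch that before an upgrade. It is a sketch under assumptions: `stale_stored_versions` and the `KUBECTL` override are hypothetical names, and the versions that the new chart declares have to be supplied by the caller (they are not looked up from the chart).

```shell
# Hypothetical helper, not part of toolforge-deploy.
# Usage: stale_stored_versions <crd-name> <version declared by the new chart>...
# Prints each version found in the live CRD's status.storedVersions that is
# NOT among the declared versions passed on the command line.
# KUBECTL is an assumed override, e.g. KUBECTL=kubectl-sudo.
stale_stored_versions() {
  crd="$1"; shift
  declared=" $* "
  stored=$(${KUBECTL:-kubectl} get crd "$crd" \
    -o jsonpath='{.status.storedVersions[*]}') || return 1
  for v in $stored; do
    case "$declared" in
      *" $v "*) ;;        # still declared by the new CRD, fine
      *) echo "$v" ;;     # stored but dropped: the patch would be rejected
    esac
  done
}
```

Any output for a CRD would mean its stored objects need migrating before the chart's CRD can be applied.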
There's an issue upstream that recommends using their cli, might try that: https://github.com/kyverno/kyverno/issues/12633
Mentioned in SAL (#wikimedia-cloud) [2025-08-12T12:49:43Z] <dcaro> manually migrate cleanuppolicies.kyverno.io and clustercleanuppolicies.kyverno.io (using kyverno cli) (T394787)
Mentioned in SAL (#wikimedia-cloud) [2025-08-12T12:50:27Z] <dcaro> redepoly kyverno (T394787)
I ran:
root@tools-k8s-control-9:~# wget https://github.com/kyverno/kyverno/releases/download/v1.13.6/kyverno-cli_v1.13.6_linux_x86_64.tar.gz
...
root@tools-k8s-control-9:~# tar xvzf kyverno-cli_v1.13.6_linux_x86_64.tar.gz
LICENSE
kyverno
root@tools-k8s-control-9:~# ./kyverno migrate --resource cleanuppolicies.kyverno.io
migrating resource: cleanuppolicies.kyverno.io ...
stored version: v2beta1
migrating resources...
patching status...
root@tools-k8s-control-9:~# ./kyverno migrate --resource clustercleanuppolicies.kyverno.io
migrating resource: clustercleanuppolicies.kyverno.io ...
stored version: v2beta1
migrating resources...
patching status...
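For future reference, the manual migration could be wrapped in a small loop. Rough sketch only: `migrate_crds` is a hypothetical helper that relies solely on the `kyverno migrate --resource` invocation from the session above, and it stops at the first failure so that a half-migrated state is noticed.

```shell
# Hypothetical helper: run `<kyverno-cli> migrate --resource <crd>` for each
# CRD name given, aborting on the first non-zero exit status.
# Usage: migrate_crds ./kyverno cleanuppolicies.kyverno.io clustercleanuppolicies.kyverno.io
migrate_crds() {
  kyverno_bin="$1"; shift
  for crd in "$@"; do
    "$kyverno_bin" migrate --resource "$crd" || return 1
  done
}
```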
re-deploying