
[kyverno] Upgrade to `3.3.9` chart (`1.13` app) for k8s 1.30 support
Closed, Resolved (Public)

Details

Related Changes in GitLab:
Title: kyverno: upgrade to 3.3.9
Reference: repos/cloud/toolforge/toolforge-deploy!889
Author: dcaro
Source Branch: upgrade_kyverno
Dest Branch: main

Event Timeline

Change #1148352 had a related patch set uploaded (by David Caro; author: David Caro):

[cloud/wmcs-cookbooks@main] kyverno.copy_images_to_registry: update the versions

https://gerrit.wikimedia.org/r/1148352

dcaro triaged this task as Medium priority. May 20 2025, 3:55 PM

Uploaded the images to docker-registry.tools.wmflabs.org:

docker-registry.tools.wikimedia.cloud/toolforge-kyverno-kyverno:v1.13.6
docker-registry.tools.wikimedia.cloud/toolforge-kyverno-kyverno-cli:v1.13.6
docker-registry.tools.wikimedia.cloud/toolforge-kyverno-kyvernopre:v1.13.6
docker-registry.tools.wikimedia.cloud/toolforge-kyverno-background-controller:v1.13.6
docker-registry.tools.wikimedia.cloud/toolforge-kyverno-cleanup-controller:v1.13.6
docker-registry.tools.wikimedia.cloud/toolforge-kyverno-reports-controller:v1.13.6
docker-registry.tools.wikimedia.cloud/bitnami-kubectl:1.30.2
docker-registry.tools.wikimedia.cloud/busybox:1.35

Change #1148352 merged by jenkins-bot:

[cloud/wmcs-cookbooks@main] kyverno.copy_images_to_registry: update the versions

https://gerrit.wikimedia.org/r/1148352

dcaro changed the task status from Open to In Progress. May 21 2025, 1:20 PM
dcaro moved this task from Next Up to In Progress on the Toolforge (Toolforge iteration 20) board.

Upgraded toolsbeta. I ran the tests during the upgrade and they did not fail, but some odd Warning events went through:

dcaro@toolsbeta-bastion-6:~$ kubectl-sudo get events -A  --sort-by='{.lastTimestamp}' | grep Warning
maintain-harbor           5m34s       Warning   UnexpectedJob          cronjob/mh--manage-image-retention-cron                                   Saw a job that the controller did not create or forgot: test-15708
maintain-harbor           5m28s       Warning   UnexpectedJob          cronjob/mh--manage-image-retention-cron                                   Saw a job that the controller did not create or forgot: test-3375
kyverno                   4m27s       Warning   Unhealthy              pod/kyverno-admission-controller-688866b86f-zw7qf                         Startup probe failed: Get "https://192.168.208.200:9443/health/liveness": dial tcp 192.168.208.200:9443: connect: connection refused
maintain-harbor           4m1s        Warning   UnexpectedJob          cronjob/mh--manage-image-retention-cron                                   Saw a job that the controller did not create or forgot: test-30724
maintain-harbor           3m55s       Warning   UnexpectedJob          cronjob/mh--manage-image-retention-cron                                   Saw a job that the controller did not create or forgot: test-13268
maintain-harbor           3m48s       Warning   UnexpectedJob          cronjob/mh--manage-harbor-projects-quotas-cron                            Saw a job that the controller did not create or forgot: test-676
kyverno                   3m42s       Warning   FailedCreate           replicaset/kyverno-admission-controller-688866b86f                        Error creating: Internal error occurred: failed calling webhook "mutate.kyverno.svc-fail": failed to call webhook: Post "https://kyverno-svc.kyverno.svc:443/mutate/fail?timeout=10s": dial tcp 10.100.206.174:443: connect: connection refused
maintain-harbor           3m42s       Warning   UnexpectedJob          cronjob/mh--manage-harbor-projects-quotas-cron                            Saw a job that the controller did not create or forgot: test-28609
maintain-harbor           3m39s       Warning   UnexpectedJob          cronjob/mh--manage-harbor-projects-quotas-cron                            Saw a job that the controller did not create or forgot: test-9136
maintain-harbor           3m27s       Warning   UnexpectedJob          cronjob/mh--delete-empty-tool-projects-cron                               Saw a job that the controller did not create or forgot: test-28369
maintain-harbor           3m22s       Warning   UnexpectedJob          cronjob/mh--delete-stale-toolforge-artifacts-cron                         Saw a job that the controller did not create or forgot: test-10809

The upgrade ended up working, but there was a "hiccup" on the kyverno side (a failed startup probe and a refused webhook call), plus those UnexpectedJob events.
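To see at a glance which Warning events dominate a listing like the one above, the lines can be grouped by namespace and reason. This is a minimal sketch, assuming the default column order of `kubectl get events -A` (NAMESPACE, LAST SEEN, TYPE, REASON, OBJECT, MESSAGE). The function name `count_warnings` is hypothetical, and the sample lines are taken from the output above with their messages shortened.

```python
from collections import Counter


def count_warnings(events_text):
    """Count Warning events per (namespace, reason) in `kubectl get events -A` text."""
    counts = Counter()
    for line in events_text.splitlines():
        # Split into at most 6 fields so the free-form MESSAGE stays in one piece.
        fields = line.split(None, 5)
        # Assumed columns: NAMESPACE, LAST SEEN, TYPE, REASON, OBJECT, MESSAGE.
        if len(fields) < 5 or fields[2] != "Warning":
            continue
        counts[(fields[0], fields[3])] += 1
    return counts


# Sample lines from the listing above; messages shortened for readability.
sample = """\
maintain-harbor  5m34s  Warning  UnexpectedJob  cronjob/mh--manage-image-retention-cron  Saw a job that the controller did not create or forgot: test-15708
kyverno  4m27s  Warning  Unhealthy  pod/kyverno-admission-controller-688866b86f-zw7qf  Startup probe failed
kyverno  3m42s  Warning  FailedCreate  replicaset/kyverno-admission-controller-688866b86f  Error creating
maintain-harbor  3m22s  Warning  UnexpectedJob  cronjob/mh--delete-stale-toolforge-artifacts-cron  Saw a job that the controller did not create or forgot: test-10809
"""

summary = count_warnings(sample)
for (namespace, reason), n in sorted(summary.items()):
    print(f"{namespace}\t{reason}\t{n}")
```

Grouping this way separates the two kyverno warnings, which are specific to the rollout, from the repeated UnexpectedJob events in maintain-harbor.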

Mentioned in SAL (#wikimedia-cloud) [2025-08-12T10:01:31Z] <dcaro> starting upgrade for kyverno (T394787)

The first attempt to upgrade on tools failed with this error message:

root@tools-k8s-control-9:~/toolforge-deploy# ./deploy.sh kyverno
...
WARNING: top-level config key environments must be defined before releases in helmfile.yaml
Adding repo kyverno https://kyverno.github.io/kyverno
"kyverno" has been added to your repositories

Affected releases are:
  kyverno (kyverno/kyverno) UPDATED

Do you really want to sync?
  Helmfile will sync all your releases, as shown above.

 [y/n]: y
Upgrading release=kyverno, chart=kyverno/kyverno, namespace=kyverno

FAILED RELEASES:
NAME      NAMESPACE   CHART             VERSION   DURATION
kyverno   kyverno     kyverno/kyverno                  16s

in ./helmfile.yaml: failed processing release kyverno: command "/usr/sbin/helm" exited with non-zero status:

PATH:
  /usr/sbin/helm

ARGS:
  0: helm (4 bytes)
  1: upgrade (7 bytes)
  2: --install (9 bytes)
  3: kyverno (7 bytes)
  4: kyverno/kyverno (15 bytes)
  5: --version (9 bytes)
  6: 3.3.9 (5 bytes)
  7: --create-namespace (18 bytes)
  8: --namespace (11 bytes)
  9: kyverno (7 bytes)
  10: --values (8 bytes)
  11: /tmp/helmfile3409313553/kyverno-kyverno-values-bcbd456c5 (56 bytes)
  12: --values (8 bytes)
  13: /tmp/helmfile3369218270/kyverno-kyverno-values-b78494bbf (56 bytes)
  14: --reset-values (14 bytes)
  15: --history-max (13 bytes)
  16: 10 (2 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

STDERR:
  Error: UPGRADE FAILED: cannot patch "cleanuppolicies.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "cleanuppolicies.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions && cannot patch "clustercleanuppolicies.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "clustercleanuppolicies.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions && cannot patch "policyexceptions.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "policyexceptions.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions

COMBINED OUTPUT:
  Error: UPGRADE FAILED: cannot patch "cleanuppolicies.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "cleanuppolicies.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions && cannot patch "clustercleanuppolicies.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "clustercleanuppolicies.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions && cannot patch "policyexceptions.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "policyexceptions.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions

Tool functional tests are passing, and no outage was detected. All kyverno pods are running the new images now too.

All tests and policies are correctly in place. The issue seems to have affected only the migration to the newer CRD versions, which looks like a race condition to me. Note that the upgrade worked out of the box in lima-kilo and toolsbeta.
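The rule behind the helm error is that every entry in a CRD's status.storedVersions must still be declared in spec.versions, so a CRD can be checked for this before upgrading. The following is a minimal sketch of that check on plain lists. The function name `missing_stored_versions` is hypothetical, and apart from "v2alpha1", which comes from the error message, the version lists are illustrative assumptions. On a live cluster the inputs could be read from `kubectl get crd <name> -o json`.

```python
def missing_stored_versions(stored_versions, new_spec_versions):
    """Return stored versions that the new CRD spec no longer declares.

    The API server rejects a CRD update when this list is non-empty.
    """
    declared = set(new_spec_versions)
    return [v for v in stored_versions if v not in declared]


# Illustrative values only: "v2alpha1" comes from the error message,
# the remaining version names are assumptions for the sake of the example.
stored = ["v2alpha1", "v2beta1"]  # status.storedVersions on the live CRD
incoming = ["v2beta1"]            # spec.versions[*].name in the new chart

print(missing_stored_versions(stored, incoming))
```

A non-empty result for any of the three CRDs named in the error would suggest migrating the stored objects before retrying the deploy.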

There's an upstream issue that recommends using their CLI; I might try that: https://github.com/kyverno/kyverno/issues/12633

Mentioned in SAL (#wikimedia-cloud) [2025-08-12T12:49:43Z] <dcaro> manually migrate cleanuppolicies.kyverno.io and clustercleanuppolicies.kyverno.io (using kyverno cli) (T394787)

I ran:

root@tools-k8s-control-9:~# wget https://github.com/kyverno/kyverno/releases/download/v1.13.6/kyverno-cli_v1.13.6_linux_x86_64.tar.gz
...
root@tools-k8s-control-9:~# tar xvzf kyverno-cli_v1.13.6_linux_x86_64.tar.gz 
LICENSE
kyverno

root@tools-k8s-control-9:~# ./kyverno migrate --resource cleanuppolicies.kyverno.io
migrating resource: cleanuppolicies.kyverno.io ...
stored version: v2beta1
migrating resources...
patching status...
root@tools-k8s-control-9:~# ./kyverno migrate --resource clustercleanuppolicies.kyverno.io 
migrating resource: clustercleanuppolicies.kyverno.io ...
stored version: v2beta1
migrating resources...
patching status...
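The "migrating resources" and "patching status" steps in that output match the general Kubernetes procedure for changing a CRD's stored version: rewrite every object at the current storage version, then drop the old entries from status.storedVersions. The sketch below only computes the target status for a CRD held in memory. It is not kyverno's implementation, and the function names and the example CRD are hypothetical.

```python
def storage_version(crd):
    """Return the name of the single version marked `storage: true`."""
    names = [v["name"] for v in crd["spec"]["versions"] if v.get("storage")]
    if len(names) != 1:
        raise ValueError(f"expected exactly one storage version, found {names}")
    return names[0]


def stored_versions_after_migration(crd):
    """status.storedVersions once every object has been rewritten at the storage version."""
    return [storage_version(crd)]


# Hypothetical, trimmed-down CRD; only the fields used here are present.
example_crd = {
    "spec": {
        "versions": [
            {"name": "v2alpha1", "served": True, "storage": False},
            {"name": "v2beta1", "served": True, "storage": True},
        ]
    },
    "status": {"storedVersions": ["v2alpha1", "v2beta1"]},
}

print(stored_versions_after_migration(example_crd))
```

Once the status lists only the storage version, a chart that no longer declares the old version can be applied without tripping the storedVersions validation.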

Re-deploying.

dcaro moved this task from In Progress to Done on the Toolforge (Toolforge iteration 23) board.