
[kyverno] upgrade to 3.3.9 in tools failed leaving a half-upgraded system
Closed, ResolvedPublic

Description

Things that were upgraded:

  • New kyverno images are in use
  • New settings are in use
  • New CRDs are defined

Things that failed:

  • Migrating from the old resources to the new ones (at least cleaning up and deleting the old ones)

There might be other issues. This task is to track, debug and try to fix it.

Note that we will be dropping kyverno after the next k8s upgrade T364293: [infra,k8s] Move to kubernetes VAPs and drop kyverno.

The error was:

root@tools-k8s-control-9:~/toolforge-deploy# ./deploy.sh kyverno
...
WARNING: top-level config key environments must be defined before releases in helmfile.yaml
Adding repo kyverno https://kyverno.github.io/kyverno
"kyverno" has been added to your repositories

Affected releases are:
  kyverno (kyverno/kyverno) UPDATED

Do you really want to sync?
  Helmfile will sync all your releases, as shown above.

 [y/n]: y
Upgrading release=kyverno, chart=kyverno/kyverno, namespace=kyverno

FAILED RELEASES:
NAME      NAMESPACE   CHART             VERSION   DURATION
kyverno   kyverno     kyverno/kyverno                  16s

in ./helmfile.yaml: failed processing release kyverno: command "/usr/sbin/helm" exited with non-zero status:

PATH:
  /usr/sbin/helm

ARGS:
  0: helm (4 bytes)
  1: upgrade (7 bytes)
  2: --install (9 bytes)
  3: kyverno (7 bytes)
  4: kyverno/kyverno (15 bytes)
  5: --version (9 bytes)
  6: 3.3.9 (5 bytes)
  7: --create-namespace (18 bytes)
  8: --namespace (11 bytes)
  9: kyverno (7 bytes)
  10: --values (8 bytes)
  11: /tmp/helmfile3409313553/kyverno-kyverno-values-bcbd456c5 (56 bytes)
  12: --values (8 bytes)
  13: /tmp/helmfile3369218270/kyverno-kyverno-values-b78494bbf (56 bytes)
  14: --reset-values (14 bytes)
  15: --history-max (13 bytes)
  16: 10 (2 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

STDERR:
  Error: UPGRADE FAILED: cannot patch "cleanuppolicies.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "cleanuppolicies.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions && cannot patch "clustercleanuppolicies.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "clustercleanuppolicies.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions && cannot patch "policyexceptions.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "policyexceptions.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions

COMBINED OUTPUT:
  Error: UPGRADE FAILED: cannot patch "cleanuppolicies.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "cleanuppolicies.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions && cannot patch "clustercleanuppolicies.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "clustercleanuppolicies.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions && cannot patch "policyexceptions.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "policyexceptions.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions

A potentially related issue (thanks @fnegri) and workaround: https://github.com/kyverno/kyverno/issues/12633
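Per the linked issue, the failure means the version recorded in each CRD's `status.storedVersions` (here `v2alpha1`) no longer appears in the new chart's `spec.versions`, so the apiserver refuses the patch. A way to inspect the stored versions with plain kubectl (a sketch; the upstream workaround's exact patch value depends on the versions actually stored in your cluster, and `--subresource` needs kubectl >= 1.24):

```shell
# Show which versions the apiserver has recorded as stored for one of the failing CRDs
kubectl get crd cleanuppolicies.kyverno.io \
  -o jsonpath='{.status.storedVersions}{"\n"}'

# Compare against the versions the new chart actually serves
kubectl get crd cleanuppolicies.kyverno.io \
  -o jsonpath='{range .spec.versions[*]}{.name}{"\n"}{end}'

# Upstream workaround sketch: drop the stale entry from status.storedVersions
# (only safe after the stored objects have been migrated to a served version)
kubectl patch crd cleanuppolicies.kyverno.io --subresource=status --type=merge \
  -p '{"status":{"storedVersions":["v2beta1"]}}'
```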

Event Timeline

dcaro updated the task description.
dcaro added a subscriber: fnegri.

Ran the migration from the linked workaround manually and it worked:

root@tools-k8s-control-9:~# ./kyverno migrate --resource cleanuppolicies.kyverno.io
migrating resource: cleanuppolicies.kyverno.io ...
stored version: v2beta1
migrating resources...
patching status...
root@tools-k8s-control-9:~# ./kyverno migrate --resource clustercleanuppolicies.kyverno.io 
migrating resource: clustercleanuppolicies.kyverno.io ...
stored version: v2beta1
migrating resources...
patching status...

will try redeploying next

Missed one: Error: UPGRADE FAILED: cannot patch "policyexceptions.kyverno.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "policyexceptions.kyverno.io" is invalid: status.storedVersions[0]: Invalid value: "v2alpha1": must appear in spec.versions. Redoing the migration and deploying again:

root@tools-k8s-control-9:~/toolforge-deploy# ../kyverno migrate --resource policyexceptions.kyverno.io
migrating resource: policyexceptions.kyverno.io ...
stored version: v2beta1
migrating resources...
patching status...
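The three migrations above can be run in one go (a sketch; assumes the `kyverno` CLI binary sits in the current directory, as in the runs above):

```shell
# The three CRDs that failed the upgrade with the storedVersions error
crds="cleanuppolicies.kyverno.io clustercleanuppolicies.kyverno.io policyexceptions.kyverno.io"

# Migrate each one's stored objects and patch its status
for crd in $crds; do
  ./kyverno migrate --resource "$crd"
done
```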

Redeploy timed out, just like the previous upgrade:

ARGS:
  0: helm (4 bytes)
  1: upgrade (7 bytes)
  2: --install (9 bytes)
  3: kyverno (7 bytes)
  4: kyverno/kyverno (15 bytes)
  5: --version (9 bytes)
  6: 3.3.9 (5 bytes)
  7: --create-namespace (18 bytes)
  8: --namespace (11 bytes)
  9: kyverno (7 bytes)
  10: --values (8 bytes)
  11: /tmp/helmfile4244151415/kyverno-kyverno-values-bcbd456c5 (56 bytes)
  12: --values (8 bytes)
  13: /tmp/helmfile2189531840/kyverno-kyverno-values-b78494bbf (56 bytes)
  14: --reset-values (14 bytes)
  15: --history-max (13 bytes)
  16: 10 (2 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

STDERR:
  Error: UPGRADE FAILED: post-upgrade hooks failed: 1 error occurred:
        * timed out waiting for the condition

COMBINED OUTPUT:
  Error: UPGRADE FAILED: post-upgrade hooks failed: 1 error occurred:
        * timed out waiting for the condition

Yep, the culprit seems to be the post-upgrade cleanup of reports: it iterates through all namespaces, checks whether each one has policyreports, and deletes them if so. A run that does not really delete anything (there was almost nothing to delete) only gets to the namespaces starting with "s" before being killed:

root@tools-k8s-control-9:~/toolforge-deploy# ./deploy.sh kyverno
...
COMBINED OUTPUT:
  Error: UPGRADE FAILED: post-upgrade hooks failed: 1 error occurred:
        * timed out waiting for the condition
root@tools-k8s-control-9:~# kubectl -n kyverno logs --timestamps -f kyverno-clean-reports-vjlkt
....
2025-08-12T13:12:49.533057072Z deleting 1 policyreports in namespace tool-sal-test
2025-08-12T13:12:49.784292379Z policyreport.wgpolicyk8s.io "pol-toolforge-kyverno-pod-policy" deleted
2025-08-12T13:12:49.904871803Z No resources found in tool-salebot namespace.
2025-08-12T13:12:49.909631792Z no policyreports in namespace tool-salebot
2025-08-12T13:12:50.025366725Z No resources found in tool-salixalbatool namespace.
2025-08-12T13:12:50.031120188Z no policyreports in namespace tool-salixalbatool
2025-08-12T13:12:50.149591445Z No resources found in tool-sam namespace.
2025-08-12T13:12:50.155524886Z no policyreports in namespace tool-sam
2025-08-12T13:12:50.264219793Z No resources found in tool-sam-2727bot namespace.
2025-08-12T13:12:50.269061872Z no policyreports in namespace tool-sam-2727bot
2025-08-12T13:12:50.392562045Z No resources found in tool-sammour namespace.
2025-08-12T13:12:50.397349230Z no policyreports in namespace tool-sammour
2025-08-12T13:12:50.520802074Z No resources found in tool-samoabot namespace.
2025-08-12T13:12:50.526982405Z no policyreports in namespace tool-samoabot
2025-08-12T13:12:50.647613043Z deleting 1 policyreports in namespace tool-sample-complex-app
2025-08-12T13:12:50.886706672Z policyreport.wgpolicyk8s.io "pol-toolforge-kyverno-pod-policy" deleted
2025-08-12T13:12:51.010229713Z deleting 1 policyreports in namespace tool-sample-dotnet-buildpack-app
2025-08-12T13:12:51.244279414Z policyreport.wgpolicyk8s.io "pol-toolforge-kyverno-pod-policy" deleted
2025-08-12T13:12:51.361479405Z No resources found in tool-sample-golang-buildpack-app namespace.
2025-08-12T13:12:51.365299356Z no policyreports in namespace tool-sample-golang-buildpack-app
### gets killed here

Let's see if I can run it manually, and skip it during the deploy.
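Running it by hand would be something like the following (a sketch approximating what the clean-reports job appears to do, judging by its log output above; this is not the hook's actual script):

```shell
# Walk every namespace and delete any policyreports found there,
# mirroring the kyverno-clean-reports post-upgrade hook.
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
  reports=$(kubectl -n "$ns" get policyreports.wgpolicyk8s.io -o name 2>/dev/null)
  if [ -n "$reports" ]; then
    echo "deleting policyreports in namespace $ns"
    kubectl -n "$ns" delete policyreports.wgpolicyk8s.io --all
  else
    echo "no policyreports in namespace $ns"
  fi
done
```

With the thousands of tool namespaces in the cluster, each iteration costs a couple of API round-trips, which explains why the hook blows past helm's default 5-minute timeout.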

I added the value helmDefaults.timeout: 1800 to the helmfile.yaml, and now it was able to get through it all:

root@tools-k8s-control-9:~/toolforge-deploy# ./deploy.sh kyverno
/root/toolforge-deploy
WARNING: top-level config key environments must be defined before releases in helmfile.yaml
Adding repo kyverno https://kyverno.github.io/kyverno
"kyverno" has been added to your repositories

Comparing release=kyverno, chart=kyverno/kyverno, namespace=kyverno
WARNING: top-level config key environments must be defined before releases in helmfile.yaml
Adding repo kyverno https://kyverno.github.io/kyverno
"kyverno" has been added to your repositories

Affected releases are:
  kyverno (kyverno/kyverno) UPDATED

Do you really want to sync?
  Helmfile will sync all your releases, as shown above.

 [y/n]: y
Upgrading release=kyverno, chart=kyverno/kyverno, namespace=kyverno
Release "kyverno" has been upgraded. Happy Helming!
NAME: kyverno
LAST DEPLOYED: Tue Aug 12 13:28:48 2025
NAMESPACE: kyverno
STATUS: deployed
REVISION: 16
NOTES:
Chart version: 3.3.9
Kyverno version: v1.13.6

Thank you for installing kyverno! Your release is named kyverno.

The following components have been installed in your cluster:
- CRDs
- Admission controller
- Reports controller
- Cleanup controller
- Background controller




⚠️  WARNING: PolicyExceptions are disabled by default. To enable them, set '--enablePolicyException' to true.

💡 Note: There is a trade-off when deciding which approach to take regarding Namespace exclusions. Please see the documentation at https://kyverno.io/docs/installation/#security-vs-operability to understand the risks.

Listing releases matching ^kyverno$
kyverno kyverno         16              2025-08-12 13:28:48.178385591 +0000 UTC deployed        kyverno-3.3.9   v1.13.6    


UPDATED RELEASES:
NAME      NAMESPACE   CHART             VERSION   DURATION
kyverno   kyverno     kyverno/kyverno   3.3.9       11m42s
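For reference, the timeout change is a one-line default in helmfile.yaml (a sketch; the exact placement depends on the existing file layout):

```yaml
# helmfile.yaml
helmDefaults:
  timeout: 1800   # seconds; helm's default of 300 is too short for the clean-reports hook
```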

Sending patch

I'll resolve this for now; the patch is linked to the parent task.