
toolforge: kyverno: change policies to Enforce
Closed, Resolved, Public

Description

We initially deployed Toolforge Kyverno pod security policies in Audit mode to evaluate how they operate in our cluster.

We need to change them to Enforce if we want to move forward with T279110: [infra] Replace PodSecurityPolicy in Toolforge Kubernetes.

Event Timeline

aborrero changed the task status from Open to In Progress. Jun 21 2024, 11:41 AM
aborrero triaged this task as High priority.
aborrero moved this task from Backlog to Doing on the User-aborrero board.

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/342

maintain-kubeusers: bump to 0.0.153-20240624092755-fd0244da

Question: how will setting the policies to Enforce affect existing deployments that were created prior to the changes in T362050: toolforge: review pod templates for PSP replacement?

Well, per upstream docs:

Background scanning, enabled by default in a Policy or ClusterPolicy object with the spec.background field, allows Kyverno to periodically scan existing resources and find if they match any validate or verifyImages rules. If existing resources are found which would violate an existing policy, the background scan notes them in a ClusterPolicyReport or a PolicyReport object, depending on if the resource is namespaced or not. It does not block any existing resources that match a rule, even in Enforce mode. It has no effect on either generate or mutate rules for the purposes of reporting.

source: https://kyverno.io/docs/policy-reports/background/

Also:

Background scanning is enabled by default unless spec.background is set to false in a policy.

Things I've checked:

  1. Re-read the upstream docs on what happens when policies are set to Enforce while offending resources already exist in the cluster. This is the case for webservices defined by older versions of the CLI (same for jobs). See the previous comment: https://phabricator.wikimedia.org/T368141#9921404
  2. Performance impact of changing thousands of policies from Audit to Enforce. I tested this in lima-kilo on my laptop and did not see any relevant impact.
  3. Templates generated by newer jobs/webservices are valid for the new policies. Tested as part of T362050: toolforge: review pod templates for PSP replacement
  4. Functional tests pass pre/post changes, in lima-kilo.
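
As a complement to the checks above, the number of policies in each mode can be tallied before and after the change. This is only a sketch: it assumes the policies are namespaced Kyverno Policy objects exported with something like `kubectl get policies.kyverno.io -A -o json`, and that the mode is stored in `spec.validationFailureAction`. The function name is hypothetical.

```python
import json
from collections import Counter


def count_policy_modes(policy_list_json: str) -> Counter:
    """Tally Kyverno policies by spec.validationFailureAction."""
    items = json.loads(policy_list_json).get("items", [])
    modes = Counter()
    for policy in items:
        # Assumption: a policy without the field behaves as Audit.
        mode = policy.get("spec", {}).get("validationFailureAction", "Audit")
        # Normalize "audit"/"Audit" and "enforce"/"Enforce" spellings.
        modes[mode.capitalize()] += 1
    return modes
```

After the switch, the tally should contain only Enforce entries; any leftover Audit entry points at a namespace that was not updated.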

I plan to perform this change tomorrow, Wednesday 2024-06-26, at 08:30Z.

Before setting policies to Enforce, I checked the policy reports again.

There are a bunch of policy violations:

aborrero@tools-k8s-control-7:~$ sudo -i kubectl get event -A | grep PolicyViolation  | wc -l
11760

All of them contain the same content, an autogen rule complaining about the runAsGroup parameter:

tool-zumraband                              33m         Warning   PolicyViolation           deployment/zumraband                                             policy toolforge-kyverno-pod-policy/autogen-toolforge-validate-pod-policy fail: validation error: pod security configuration must be correct. rule autogen-toolforge-validate-pod-policy failed at path /spec/template/spec/securityContext/runAsGroup/
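
To double-check that every violation really comes from the same rule and path, the event lines can be grouped rather than just counted. The sketch below parses the plain-text output of `kubectl get event -A`, assuming every message ends with the `rule ... failed at path ...` wording seen in the event above. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches the "rule <name> failed at path <path>" tail of the event message.
RULE_PATH_RE = re.compile(r"rule (\S+) failed at path (\S+)")


def tally_violations(event_lines):
    """Count PolicyViolation events per (rule, path) pair."""
    counts = Counter()
    for line in event_lines:
        if "PolicyViolation" not in line:
            continue
        match = RULE_PATH_RE.search(line)
        # Keep unparseable violations visible instead of silently dropping them.
        key = match.groups() if match else ("unparsed", "unparsed")
        counts[key] += 1
    return counts
```

Passing `sys.stdin` as `event_lines` lets it sit at the end of the same pipeline in place of `grep | wc -l`. A single key in the result would confirm that all violations share one rule and one path.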

I think this happens because Deployments and other pod-generators that were created by old versions of jobs-cli and webservice don't inject the securityContext. They were relying on PSP mutating the Pod resources.

All these PolicyViolations are from autogen rules. They are not for Pod resources, but for Pod-generators, like Deployments. Pod resources have a good securityContext, either by the PSP mutation, or by the new Kyverno mutation.

This means that if we set policies to Enforce, Pods being created will all pass the validations.
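
The difference between the two cases can be illustrated with a small check. An autogen rule inspects the pod template under `spec.template.spec` of the pod-generator, which is stored as the old CLI wrote it, while the Pod itself is validated after mutation. This is a rough sketch: it assumes the rule only requires `runAsGroup` to be set (the real policy is not shown in this task and likely checks the value too), and the function names are hypothetical.

```python
def run_as_group(pod_spec: dict):
    """Return the pod-level securityContext.runAsGroup, or None if unset."""
    return (pod_spec.get("securityContext") or {}).get("runAsGroup")


def template_would_violate(workload: dict) -> bool:
    """True if a pod-generator (e.g. a Deployment) lacks runAsGroup in its pod template.

    Assumption: presence of the field is all that is checked here.
    """
    pod_spec = workload.get("spec", {}).get("template", {}).get("spec", {})
    return run_as_group(pod_spec) is None
```

An old Deployment with no `securityContext` in its template would be reported by this check, while the Pod created from it would not once the mutation has filled in the field.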

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/351

maintain-kubeusers: bump to 0.0.155-20240626090852-f6b198f9

Mentioned in SAL (#wikimedia-cloud) [2024-06-26T09:15:41Z] <arturo> setting kyverno policies to Enforce (T368141)

Mentioned in SAL (#wikimedia-cloud) [2024-06-26T09:18:19Z] <arturo> setting kyverno policies to Enforce (T368141)

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/353

maintain-kubeusers: bump to 0.0.156-20240626103707-3aa9727d