Problem
Regardless of which policy agent we finally choose for Toolforge (see T362233: Decision Request - Toolforge policy agent), we also need to decide how we want to enforce the different resource security policies. The available options differ in the semantics and behavior they give the platform.
Both Kyverno and OPA Gatekeeper can work in different modes:
- enforcement via validation: reject resource definitions that don't meet the policies.
- enforcement via mutation: mutate resource definitions so they conform to the policies. This is how PodSecurityPolicy has been working so far.
- no enforcement, only audit: all resources will be evaluated against the policies, and an audit record will be created.
Example of validation:
- given a policy that requires every Pod resource to have allowPrivilegeEscalation: false
- if somebody tries to create a Pod resource with allowPrivilegeEscalation: true, reject it. An error message will be produced.
Example of mutation:
- given a policy that requires every Pod resource to have allowPrivilegeEscalation: false
- every time a Pod resource is created, mutate it (modify it) to add allowPrivilegeEscalation: false. No error message will be produced.
- this is how PodSecurityPolicy has been working so far
Example of audit:
- given a policy that requires every Pod resource to have allowPrivilegeEscalation: false
- if a Pod resource doesn't conform to the policy, emit an audit record (but otherwise do nothing else).
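The three examples above can be sketched as a toy admission function. This is only an illustration of the semantics, not how Kyverno or OPA Gatekeeper are implemented or configured: the `admit` function, the `PolicyViolation` exception, the mode names and the audit log structure are all assumptions made up for this sketch.

```python
import copy

# Hypothetical policy: every container must set allowPrivilegeEscalation to False.
FIELD = "allowPrivilegeEscalation"
REQUIRED = False


class PolicyViolation(Exception):
    """Raised in validation mode when a resource breaks the policy."""


def violations(pod):
    """Return the names of containers that do not satisfy the policy."""
    return [
        c["name"]
        for c in pod["spec"]["containers"]
        if c.get("securityContext", {}).get(FIELD) is not REQUIRED
    ]


def admit(pod, mode, audit_log):
    """Simulate an admission decision for `pod` in the given mode."""
    bad = violations(pod)
    if mode == "validate":
        # Reject: the caller sees an explicit error.
        if bad:
            raise PolicyViolation(f"{FIELD} must be {REQUIRED} in: {bad}")
        return pod
    if mode == "mutate":
        # Silently rewrite a copy so it conforms; no error is produced.
        patched = copy.deepcopy(pod)
        for c in patched["spec"]["containers"]:
            c.setdefault("securityContext", {})[FIELD] = REQUIRED
        return patched
    if mode == "audit":
        # Accept unchanged, but leave a record of the violation.
        if bad:
            audit_log.append({"pod": pod["metadata"]["name"], "containers": bad})
        return pod
    raise ValueError(f"unknown mode: {mode}")


pod = {
    "metadata": {"name": "example"},
    "spec": {"containers": [{"name": "main", "securityContext": {FIELD: True}}]},
}
log = []
mutated = admit(pod, "mutate", log)   # accepted, with the field flipped to False
audited = admit(pod, "audit", log)    # accepted as-is, one audit record added
```

Calling `admit(pod, "validate", log)` with the same Pod raises `PolicyViolation`, which is the only one of the three modes where the submitter gets direct feedback.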
Constraints and risks
- this affects both ourselves, in the different -api components we have, and tool developers who have direct access to the k8s API.
- the semantics are different and require a different level of commitment, especially from direct users of the k8s API.
Decision record
Options
Option 1
Enforcement via validation.
This makes everyone explicitly aware of the different policies we have in Toolforge kubernetes, given that they have to manually adapt their code to conform to them.
Pros:
- possibly the simplest
- the semantics are explicit: if a policy violation happens, a visible error will be produced.
Cons:
- may require code updates to conform to the policies.
- given policies can change, these code updates may be required on a continuous basis
- not how PSP has been working so far
Option 2
Enforcement via mutation.
This doesn't make everyone explicitly aware of the different policies we have in Toolforge kubernetes, because mutation takes care of updating the resources to conform to the policies.
Pros:
- this is how PodSecurityPolicy has been working so far
- transparent enforcement for everyone, no error messages to decode
- fewer code updates to track policy changes
Cons:
- people are less aware of the different policies we have in Toolforge kubernetes
- a piece of software arbitrarily updating resources sounds a bit scary.
- it is not clear how mutation would work for policy changes and already present resources. For example, a given Pod was mutated to conform to the policy on date X, but the policy has now changed. What do we do with the already defined Pod?
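The last concern can be illustrated with a small sketch. It assumes that mutation only runs when a resource is admitted, which may or may not hold for a given policy agent; the `conforms` and `mutate` helpers and the `policy_v2` contents are hypothetical and exist only for this example.

```python
def conforms(pod, policy):
    """True if every container carries all the fields the policy requires."""
    return all(
        c.get("securityContext", {}).get(key) == value
        for c in pod["spec"]["containers"]
        for key, value in policy.items()
    )


def mutate(pod, policy):
    """Admission-time mutation: write the required fields into each container."""
    for c in pod["spec"]["containers"]:
        c.setdefault("securityContext", {}).update(policy)
    return pod


# Policy as it was on date X, and a hypothetical later revision of it.
policy_v1 = {"allowPrivilegeEscalation": False}
policy_v2 = {"allowPrivilegeEscalation": False, "runAsNonRoot": True}

# The Pod is mutated once, at creation time, under the old policy.
stored = mutate({"spec": {"containers": [{"name": "main"}]}}, policy_v1)

ok_then = conforms(stored, policy_v1)  # conformed when it was created
ok_now = conforms(stored, policy_v2)   # no longer conforms after the change
```

Under that assumption the stored Pod silently drifts out of policy, and nobody is told about it unless something re-evaluates existing resources.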
Option 3
Combination:
- validation for optional policies
- mutation for mandatory policies
Given that some resource attributes could be optional, we could introduce some kind of mixed approach.
Pros:
- maybe the most flexible approach?
Cons:
- perhaps the most confusing semantics, as some things happen automagically while others require explicit code changes.