Replace PodSecurityPolicy in Toolforge Kubernetes
Open, Medium, Public

Description

As of Kubernetes 1.21, pod security policies (PSPs), which many considered an essential feature, are marked deprecated and set for removal in 1.25 (some sources say 1.22, but we'll see). That leaves enough time to work properly on replacements. We use PSPs as an integral part of the Toolforge security model in order to prevent some of the more egregious security failures of Kubernetes itself. Since we rely on a very particular set of restrictions, it is worthwhile to enumerate them, because there may be more than one way to implement each one. The current policy:

  • Drops all Docker capabilities
  • Prevents running privileged containers (effectively like having all capabilities, if not worse) and privilege escalation
  • Restricts you to running only as your "own" LDAP user inside the container, which is essential because of NFS
  • Disallows the root group in supplemental groups
  • Allows only your LDAP primary group
  • Allows the following volume mount types (not a big restriction here):
    • configMap
    • downwardAPI
    • emptyDir
    • projected
    • secret
    • hostPath
    • persistentVolumeClaim
  • Applies system default seccomp rules
  • Allows only the following hostPath mounts (some of them read-only):
allowedHostPaths:
 - pathPrefix: /var/lib/sss/pipes
 - pathPrefix: /data/project
 - pathPrefix: /data/scratch
 - pathPrefix: /public/dumps
   readOnly: true
 - pathPrefix: /mnt/nfs
   readOnly: true
 - pathPrefix: /etc/wmcs-project
   readOnly: true
 - pathPrefix: /etc/ldap.yaml
   readOnly: true
 - pathPrefix: /etc/novaobserver.yaml
   readOnly: true
 - pathPrefix: /etc/ldap.conf
   readOnly: true
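
Whatever replaces the PSP has to reproduce the hostPath rules above. The following is a minimal sketch in Python of that check, assuming the matching semantics documented for PSP `allowedHostPaths`: a path matches a prefix if it equals the prefix or lies below it, and a mount may be read-write only if at least one matching rule is not read-only. The function names (`prefix_matches`, `host_path_violations`) are hypothetical and exist only for illustration; this is not existing Toolforge code.

```python
from posixpath import normpath

# Allowlist copied from the allowedHostPaths fragment above.
ALLOWED_HOST_PATHS = [
    {"pathPrefix": "/var/lib/sss/pipes", "readOnly": False},
    {"pathPrefix": "/data/project", "readOnly": False},
    {"pathPrefix": "/data/scratch", "readOnly": False},
    {"pathPrefix": "/public/dumps", "readOnly": True},
    {"pathPrefix": "/mnt/nfs", "readOnly": True},
    {"pathPrefix": "/etc/wmcs-project", "readOnly": True},
    {"pathPrefix": "/etc/ldap.yaml", "readOnly": True},
    {"pathPrefix": "/etc/novaobserver.yaml", "readOnly": True},
    {"pathPrefix": "/etc/ldap.conf", "readOnly": True},
]


def prefix_matches(prefix, path):
    """True if path is the prefix itself or lies below it.

    "/foo" matches "/foo" and "/foo/bar", but not "/food".
    """
    prefix = normpath(prefix)
    path = normpath(path)  # collapses "..", so "/data/project/../../etc" becomes "/etc"
    return path == prefix or path.startswith(prefix + "/")


def host_path_violations(pod_spec, allowed=ALLOWED_HOST_PATHS):
    """Return human-readable violations for the hostPath volumes in a pod spec (hypothetical helper)."""
    violations = []
    read_only_volumes = set()
    for volume in pod_spec.get("volumes", []):
        if "hostPath" not in volume:
            continue
        path = volume["hostPath"]["path"]
        matches = [rule for rule in allowed if prefix_matches(rule["pathPrefix"], path)]
        if not matches:
            violations.append(f"volume {volume['name']}: hostPath {path} is not allowed")
        elif all(rule.get("readOnly", False) for rule in matches):
            # Every matching rule is read-only, so all mounts of this volume must be too.
            read_only_volumes.add(volume["name"])

    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        for mount in container.get("volumeMounts", []):
            if mount["name"] in read_only_volumes and not mount.get("readOnly", False):
                violations.append(
                    f"container {container['name']}: mount of {mount['name']} must be readOnly"
                )
    return violations
```

Normalizing the path before comparing matters here: without it, a path such as `/data/project/../../etc` would pass a naive string-prefix test.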

Possible solutions:

  1. Tighter validating admission control webhooks
  2. Implementing Open Policy Agent (OPA), which is basically a validating and mutating webhook on steroids with its own domain-specific policy language. This is honestly where we and other orgs are likely to move. It is also the only alternative mentioned in the current k8s doc (as of April 1, 2021): https://kubernetes.io/docs/concepts/security/pod-security-standards/#what-s-the-difference-between-a-security-policy-and-a-security-context For that matter, sig-auth, the group that decided to nix PSP, basically called out OPA Gatekeeper as the way people should move forward in the middle of that discussion: https://docs.google.com/presentation/d/1Kv6BSBNyLCyglMbK7e6tVOaDYe89LV2aHL2Hlb-9HX8/edit#slide=id.p
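
To make option 1 concrete, here is a rough Python sketch of the decision logic a validating webhook would need for the capability, privilege, and user restrictions listed in the description. It assumes the `admission.k8s.io/v1` `AdmissionReview` request and response shape. The function names and the `expected_uid` parameter (the tool's LDAP UID, however it is looked up) are hypothetical. Because a validating webhook cannot default fields the way a PSP can, this sketch requires pods to set them explicitly.

```python
def security_context_violations(pod_spec, expected_uid):
    """Check each container's securityContext against the PSP-style rules (hypothetical helper)."""
    violations = []
    pod_sc = pod_spec.get("securityContext") or {}
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}
        if sc.get("privileged", False):
            violations.append(f"{name}: privileged containers are not allowed")
        # Assumption: require an explicit false rather than relying on defaults.
        if sc.get("allowPrivilegeEscalation") is not False:
            violations.append(f"{name}: allowPrivilegeEscalation must be false")
        caps = sc.get("capabilities") or {}
        if "ALL" not in (caps.get("drop") or []):
            violations.append(f"{name}: capabilities.drop must include ALL")
        # Assumption: no capability may be added back after dropping all of them.
        if caps.get("add"):
            violations.append(f"{name}: capabilities.add is not allowed")
        # A container-level runAsUser overrides the pod-level one.
        run_as_user = sc.get("runAsUser", pod_sc.get("runAsUser"))
        if run_as_user != expected_uid:
            violations.append(f"{name}: runAsUser must be {expected_uid}")
    return violations


def review_response(admission_review, expected_uid):
    """Build the AdmissionReview response for a Pod admission request."""
    request = admission_review["request"]
    violations = security_context_violations(request["object"]["spec"], expected_uid)
    response = {"uid": request["uid"], "allowed": not violations}
    if violations:
        response["status"] = {"message": "; ".join(violations)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Each rule here is a few lines of hand-written code that has to be maintained and served over TLS, which is part of why a policy engine such as OPA Gatekeeper is the more likely direction.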

Event Timeline

Andrew triaged this task as Medium priority. Apr 13 2021, 4:14 PM

Prod task for the same issue is here: T273507. For reference, we are likely to want to use OPA Gatekeeper. There's a fair bit to document around that, but it's entirely possible to translate PSPs directly to its policy language.