
[infra] Replace PodSecurityPolicy in Toolforge Kubernetes
Closed, Resolved (Public)

Description

As of Kubernetes 1.21, pod security policies (PSP), which many considered an essential feature, are marked deprecated and set for removal in 1.25 (some sources say 1.22, but we'll see). That leaves enough time to put a fair bit of work into replacements. PSP is an integral part of the Toolforge security model: we use it to prevent some of the more egregious security failures of Kubernetes itself. Because we rely on a very particular set of restrictions, it is worthwhile to enumerate them, since there may be more than one way to implement each one.

  • Drops all Docker capabilities
  • Prevents privileged containers (which effectively have all capabilities, if not worse) and privilege escalation
  • Restricts you to only running as your "own" LDAP user inside the container, which is essential because of NFS
  • Excludes the root group from the allowed supplemental groups
  • Allows only your LDAP primary group
  • Allows the following volume types (not a big restriction here):
    • configMap
    • downwardAPI
    • emptyDir
    • projected
    • secret
    • hostPath
    • persistentVolumeClaim
  • Applies the system default seccomp rules
  • Allows only the following hostPath mounts (and forces some of them to be mounted read-only):
allowedHostPaths:
 - pathPrefix: /var/lib/sss/pipes
 - pathPrefix: /data/project
 - pathPrefix: /data/scratch
 - pathPrefix: /public/dumps
   readOnly: true
 - pathPrefix: /mnt/nfs
   readOnly: true
 - pathPrefix: /etc/wmcs-project
   readOnly: true
 - pathPrefix: /etc/ldap.yaml
   readOnly: true
 - pathPrefix: /etc/novaobserver.yaml
   readOnly: true
 - pathPrefix: /etc/ldap.conf
   readOnly: true
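
For illustration, the hostPath rules above could be checked by a validating webhook roughly as follows. This is a minimal sketch, not the Toolforge implementation: the names `AllowedHostPath`, `ALLOWED_HOST_PATHS` and `host_path_violations` are hypothetical, the pod spec is assumed to arrive as a plain dict decoded from JSON, and prefixes are assumed to match on whole path components (so `/data/project` does not allow `/data/projectx`).

```python
import posixpath
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AllowedHostPath:
    """One entry of the allowedHostPaths list (hypothetical helper type)."""
    path_prefix: str
    read_only: bool = False


# Mirrors the allowedHostPaths list in the task description.
ALLOWED_HOST_PATHS = [
    AllowedHostPath("/var/lib/sss/pipes"),
    AllowedHostPath("/data/project"),
    AllowedHostPath("/data/scratch"),
    AllowedHostPath("/public/dumps", read_only=True),
    AllowedHostPath("/mnt/nfs", read_only=True),
    AllowedHostPath("/etc/wmcs-project", read_only=True),
    AllowedHostPath("/etc/ldap.yaml", read_only=True),
    AllowedHostPath("/etc/novaobserver.yaml", read_only=True),
    AllowedHostPath("/etc/ldap.conf", read_only=True),
]


def find_rule(path: str) -> Optional[AllowedHostPath]:
    """Return the first rule whose prefix covers `path`, or None."""
    # Normalize first so that ".." segments cannot escape an allowed prefix.
    normalized = posixpath.normpath(path)
    for rule in ALLOWED_HOST_PATHS:
        prefix = rule.path_prefix.rstrip("/")
        if normalized == prefix or normalized.startswith(prefix + "/"):
            return rule
    return None


def host_path_violations(pod_spec: dict) -> List[str]:
    """List every hostPath rule that `pod_spec` breaks (empty list means OK)."""
    violations = []
    must_be_read_only = set()

    for volume in pod_spec.get("volumes", []):
        host_path = volume.get("hostPath")
        if host_path is None:
            continue  # not a hostPath volume, nothing to check here
        path = host_path.get("path", "")
        rule = find_rule(path)
        if rule is None:
            violations.append(f"volume {volume['name']}: hostPath {path!r} is not allowed")
        elif rule.read_only:
            must_be_read_only.add(volume["name"])

    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        for mount in container.get("volumeMounts", []):
            if mount["name"] in must_be_read_only and not mount.get("readOnly", False):
                violations.append(
                    f"container {container['name']}: volume {mount['name']} must be mounted readOnly"
                )
    return violations
```

Returning a list of messages rather than a boolean lets the caller report every problem in a single rejection. Ephemeral containers are not covered by this sketch.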

Possible solutions:

  1. Migrate to PodSecurityAdmission https://kubernetes.io/docs/tasks/configure-pod-container/migrate-from-psp/
  2. Tighter validating admission control webhooks
  3. Implement Open Policy Agent (OPA), which is basically a validating and mutating webhook on steroids with its own domain-specific policy language. This is honestly where we and other orgs are likely to move. It is also the only alternative mentioned in the current k8s doc (as of April 1, 2021): https://kubernetes.io/docs/concepts/security/pod-security-standards/#what-s-the-difference-between-a-security-policy-and-a-security-context For that matter, sig-auth, the group that decided to nix PSP, basically called out OPA Gatekeeper as the way people should move forward in the middle of that discussion: https://docs.google.com/presentation/d/1Kv6BSBNyLCyglMbK7e6tVOaDYe89LV2aHL2Hlb-9HX8/edit#slide=id.p
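
To make option 2 more concrete, here is a rough sketch of the decision logic a validating admission webhook could apply for the capability, privilege and user restrictions listed in the description. It is only an illustration under stated assumptions: the function names are hypothetical, `expected_uid` stands in for however the webhook would look up the tool's LDAP uid (not specified in this task), and the HTTP/TLS serving part is left out. The `AdmissionReview` request and response shapes follow the `admission.k8s.io/v1` API.

```python
from typing import List


def security_context_violations(pod: dict, expected_uid: int) -> List[str]:
    """Check a Pod object (as a dict) against the PSP-like restrictions."""
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext") or {}
    violations = []

    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}

        if sc.get("privileged", False):
            violations.append(f"{name}: privileged containers are not allowed")

        # Assumption: require an explicit False, since a validating webhook
        # cannot fill in a default the way a mutating one could.
        if sc.get("allowPrivilegeEscalation") is not False:
            violations.append(f"{name}: allowPrivilegeEscalation must be false")

        dropped = (sc.get("capabilities") or {}).get("drop") or []
        if "ALL" not in dropped:
            violations.append(f"{name}: must drop ALL capabilities")

        # A container-level runAsUser overrides the pod-level one.
        run_as_user = sc.get("runAsUser", pod_sc.get("runAsUser"))
        if run_as_user != expected_uid:
            violations.append(f"{name}: must run as uid {expected_uid}")

    return violations


def build_review_response(review: dict, expected_uid: int) -> dict:
    """Turn an incoming AdmissionReview into the AdmissionReview reply."""
    request = review["request"]
    violations = security_context_violations(request["object"], expected_uid)

    response = {"uid": request["uid"], "allowed": not violations}
    if violations:
        response["status"] = {"code": 403, "message": "; ".join(violations)}

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

The response must echo the request `uid`, and the `status.message` is what the user sees when the pod is rejected.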

Related Objects

Status | Assigned
Open | None
Open | Slst2020
Open | aborrero
Open | fnegri
Open | Raymond_Ndibe
In Progress | Raymond_Ndibe
Resolved | Slst2020
Resolved | aborrero
Resolved | aborrero
Open | None
Resolved | aborrero
Resolved | aborrero
Resolved | aborrero
Resolved | aborrero
Resolved | aborrero
Open | aborrero
Resolved | aborrero
Resolved | aborrero
Resolved | aborrero
Resolved | aborrero
Resolved | aborrero
Resolved | aborrero
Resolved | aborrero
Resolved | Andrew
Duplicate | dcaro
Resolved | aborrero
Resolved | aborrero
Resolved | aborrero
Resolved | aborrero
Declined | aborrero
Declined | aborrero
Declined | aborrero
Resolved | aborrero
Resolved | aborrero
Resolved | aborrero
Open | aborrero
Duplicate | None
Resolved | aborrero

Event Timeline

Andrew triaged this task as Medium priority. Apr 13 2021, 4:14 PM

Prod task for the same issue is here: T273507. For reference, we are likely to want to use OPA Gatekeeper. There's a fair bit to document around that, but it's entirely possible to translate PSPs directly to its policy language.

dcaro raised the priority of this task from Medium to High. Apr 2 2024, 2:05 PM
dcaro moved this task from Backlog to Ready to be worked on on the Toolforge board.

It could be useful to have T357977: [toolforge.infra] create fullstack tests in place before this change, to ease the migration.

dcaro renamed this task from Replace PodSecurityPolicy in Toolforge Kubernetes to [infra] Replace PodSecurityPolicy in Toolforge Kubernetes. Apr 3 2024, 1:27 PM
aborrero changed the task status from Open to In Progress. Apr 19 2024, 10:32 AM
aborrero claimed this task.

Change #1036640 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] toolforge: drop toolforge-tfb-psp

https://gerrit.wikimedia.org/r/1036640

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/326

maintain-kubeusers: bump to 0.0.148-20240612113501-fa8bd88a