Set up PodSecurityPolicies in clusters
Closed, Resolved (Public)

Description

Currently there are no security restrictions in our clusters. Setting up PodSecurityPolicies will help us ensure that containers are not run as root, privileged, or with special capabilities.

Cluster services like coredns should still be allowed to run with those capabilities and privileges, since they often need them.

Event Timeline

Change 525281 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/deployment-charts@master] k8s: adding PodSecurityPolicies

https://gerrit.wikimedia.org/r/525281

fsero triaged this task as Medium priority. Jul 25 2019, 9:26 AM
fsero moved this task from Incoming 🐫 to Doing 😎 on the serviceops board.

Change 525281 merged by Fsero:
[operations/deployment-charts@master] k8s: adding PodSecurityPolicies

https://gerrit.wikimedia.org/r/525281

Change 525553 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/puppet@production] k8s: enabling PodSecurityPolicy admission controller in staging

https://gerrit.wikimedia.org/r/525553

Change 525553 merged by Fsero:
[operations/puppet@production] k8s: enabling PodSecurityPolicy admission controller in staging

https://gerrit.wikimedia.org/r/525553

Change 649629 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/deployment-charts@master] admin_ng Update/Fix PodSecurityPolicies

https://gerrit.wikimedia.org/r/649629

Change 650469 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/puppet@production] Allow the kube-controller-manager to run without superuser permissions

https://gerrit.wikimedia.org/r/650469

Change 650473 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[labs/private@master] k8s_infrastructure_users: Add system:kube-controller-manager

https://gerrit.wikimedia.org/r/650473

Change 650473 merged by JMeybohm:
[labs/private@master] k8s_infrastructure_users: Add system:kube-controller-manager

https://gerrit.wikimedia.org/r/650473

Change 649629 merged by jenkins-bot:
[operations/deployment-charts@master] admin_ng Update/Fix PodSecurityPolicies

https://gerrit.wikimedia.org/r/649629

Change 650469 merged by JMeybohm:
[operations/puppet@production] Allow the kube-controller-manager to run without superuser permissions

https://gerrit.wikimedia.org/r/650469

Change 660379 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/puppet@production] k8s::kubelet: Ensure apparmor is installed

https://gerrit.wikimedia.org/r/660379

With kube-controller-manager running with service accounts, pods could no longer be created because validation against apparmor profiles failed (as we did not have apparmor installed on the nodes). With that installed, the deploy users (and tillers) in normal namespaces are PSP restricted like they should be (with proper error messages etc.).
In "kube-system", the unrestricted profile applies and I'm still able to launch privileged containers, run stuff as root etc.

I played with this a bit, and we will definitely run into a bunch of issues/work here. For example, from api-gateway:

Warning  Failed     33s (x2 over 33s)  kubelet, kubestage2002.codfw.wmnet  Error: container has runAsNonRoot and image has non-numeric user (runuser), cannot verify user is non-root
Warning  Failed     33s (x2 over 33s)  kubelet, kubestage2002.codfw.wmnet  Error: container has runAsNonRoot and image has non-numeric user (nutcracker), cannot verify user is non-root
Warning  Failed     32s (x3 over 33s)  kubelet, kubestage2002.codfw.wmnet  Error: container has runAsNonRoot and image has non-numeric user (envoy), cannot verify user is non-root
Warning  Failed     32s (x3 over 33s)  kubelet, kubestage2002.codfw.wmnet  Error: container has runAsNonRoot and image has non-numeric user (prometheus-statsd-exporter), cannot verify user is non-root

So the verification can only be done when either securityContext.runAsUser: xxx is set *correctly* for each container, or the "USER" statement in the Dockerfile uses the numeric notation. While the second option is probably easier to do and get right (especially in combination with helm scaffolding), it will require patching some tooling and rebuilding most/all containers.

https://github.com/kubernetes/kubernetes/blob/v1.16.15/pkg/kubelet/kuberuntime/kuberuntime_container.go#L207
https://github.com/kubernetes/kubernetes/blob/v1.16.15/pkg/kubelet/kuberuntime/security_context.go#L80
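
To make the two options concrete, here is a minimal sketch of the decision the kubelet appears to make in the code linked above. It is a simplified Python paraphrase, not the actual Go implementation: the function name and parameters are made up for illustration, and the error strings other than the one quoted in the events above are approximations.

```python
from typing import Optional


def verify_run_as_non_root(
    run_as_non_root: Optional[bool],
    run_as_user: Optional[int],
    image_uid: Optional[int],
    image_username: str,
) -> Optional[str]:
    """Return an error message if non-root cannot be guaranteed, else None.

    Hypothetical helper: a paraphrase of the kubelet check, not its real API.
    """
    # Nothing to verify unless the container asks for runAsNonRoot.
    if not run_as_non_root:
        return None
    # An explicit securityContext.runAsUser takes precedence over the image's USER.
    if run_as_user is not None:
        if run_as_user == 0:
            return "container's runAsUser breaks non-root policy"
        return None
    # A named image user cannot be mapped to a UID without inspecting the image.
    if image_username:
        return (
            "container has runAsNonRoot and image has non-numeric user "
            f"({image_username}), cannot verify user is non-root"
        )
    # No USER at all, or USER 0, means the image runs as root.
    if image_uid is None or image_uid == 0:
        return "container has runAsNonRoot and image will run as root"
    return None


# Named USER in the image, no runAsUser: rejected, as in the events above.
print(verify_run_as_non_root(True, None, None, "envoy"))
# Same image with securityContext.runAsUser set to a non-zero UID: accepted.
print(verify_run_as_non_root(True, 101, None, "envoy"))
# Image built with a numeric USER: accepted without any runAsUser.
print(verify_run_as_non_root(True, None, 101, ""))
```

The sketch shows why both fixes work: a non-zero runAsUser short-circuits the check, while a numeric USER gives the kubelet a UID it can compare against 0.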

Change 660771 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[blubber@master] Use the UID instead of username in USER instructions

https://gerrit.wikimedia.org/r/660771

Change 660784 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/docker-images/docker-pkg@master] Add an check for numeric USER instruction in Dockerfile

https://gerrit.wikimedia.org/r/660784
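
The docker-pkg change itself is not shown in this task. As a rough idea of what a check for a numeric USER instruction could look like, here is a hypothetical sketch: the function name, the regular expression and the sample Dockerfile (including the `example-base` image name) are assumptions for illustration only.

```python
import re
from typing import List

# USER <user>[:<group>]; only the user part matters for runAsNonRoot.
USER_RE = re.compile(r"^\s*USER\s+(\S+)", re.IGNORECASE)


def non_numeric_users(dockerfile_text: str) -> List[str]:
    """Return the user part of every USER instruction that is not a numeric UID."""
    offenders = []
    for line in dockerfile_text.splitlines():
        match = USER_RE.match(line)
        if not match:
            continue
        # Drop an optional ":group" suffix before testing the user part.
        user = match.group(1).split(":", 1)[0]
        if not re.fullmatch(r"[0-9]+", user):
            offenders.append(user)
    return offenders


sample = """FROM example-base
USER nutcracker
USER 65534
USER 900:900
"""
print(non_numeric_users(sample))  # prints ['nutcracker']
```

This sketch treats variable references such as `USER $UID` as non-numeric, since it does not expand build arguments.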

Change 660851 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/docker-images/docker-pkg@master] Add the 'uid' template helper

https://gerrit.wikimedia.org/r/660851

Change 660379 merged by JMeybohm:
[operations/puppet@production] k8s::kubelet: Ensure apparmor is installed

https://gerrit.wikimedia.org/r/660379

Change 661083 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/puppet@production] k8s::kubelet: Ensure apparmor is purged on old k8s nodes

https://gerrit.wikimedia.org/r/661083

Change 661083 merged by JMeybohm:
[operations/puppet@production] k8s::kubelet: Ensure apparmor is purged on old k8s nodes

https://gerrit.wikimedia.org/r/661083

Change 660771 merged by jenkins-bot:
[blubber@master] Use the UID instead of username in USER instructions

https://gerrit.wikimedia.org/r/660771

Change to blubber(oid) (https://gerrit.wikimedia.org/r/c/blubber/+/660771) has been deployed.

Thanks for taking care!

Change 660784 merged by jenkins-bot:
[operations/docker-images/docker-pkg@master] Add an check for numeric USER instruction in Dockerfile

https://gerrit.wikimedia.org/r/660784

Change 660851 merged by jenkins-bot:
[operations/docker-images/docker-pkg@master] Add the 'uid' template helper

https://gerrit.wikimedia.org/r/660851

This is active in all clusters now

But wait, it's currently still not fully active and blocked by: T274262

This can be closed when https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/667986 has been reverted.

Thanks to @akosiaris this was reverted in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/676512

So PSPs can be considered active now.