toolforge: review pod templates for PSP replacement
Closed, Resolved, Public

Description

In both jobs-api and webservices we need to review the pod templates. Previously, the PodSecurityPolicy mechanism would mutate Pod resources to meet the security criteria.

With T279110: [infra] Replace PodSecurityPolicy in Toolforge Kubernetes, this will no longer be the case, so we need to create Pod resources natively with a configuration that satisfies both PSP and its replacement, a Kyverno policy, at least during the migration period.

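For example, we could make sure our templates include the following. Note that hostNetwork, hostIPC and hostPID are fields of the Pod spec itself, not of securityContext, and that privileged, allowPrivilegeEscalation and capabilities only exist in the container-level securityContext:

spec:
  hostNetwork: false
  hostIPC: false
  hostPID: false
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - securityContext:
      allowPrivilegeEscalation: false
      privileged: false
      capabilities:
        drop:
        - ALL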

In other words, Pods created both while PSP is still in place and after it is removed need to validate against the same Kyverno policies.
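To make the requirement concrete, here is a minimal Python sketch that checks a Pod manifest, represented as a dict, against the settings listed in this task. The function name baseline_violations and its exact rule set are hypothetical; the real enforcement lives in the Kyverno policy, whose rules may differ. The sample data is copied from the "before" Pod quoted later in this task.

```python
from typing import Any


def baseline_violations(pod: dict[str, Any]) -> list[str]:
    """Return human-readable violations; an empty list means the Pod passes.

    Hypothetical rule set modelled on the settings listed in this task.
    """
    spec = pod.get("spec", {})
    problems: list[str] = []

    # Host namespace switches are PodSpec fields, not securityContext fields.
    for field in ("hostNetwork", "hostIPC", "hostPID"):
        if spec.get(field, False):
            problems.append(f"spec.{field} must be false")

    pod_sc = spec.get("securityContext", {})
    for i, container in enumerate(spec.get("containers", [])):
        sc = container.get("securityContext", {})
        where = f"spec.containers[{i}].securityContext"
        if sc.get("privileged", False):
            problems.append(f"{where}.privileged must be false")
        # Require an explicit false: leaving it unset does not block escalation.
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{where}.allowPrivilegeEscalation must be false")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{where}.capabilities.drop must include ALL")
        # These two may be set at either level; the container value wins.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{where}: runAsNonRoot must be true")
        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile", {}))
        if seccomp.get("type") != "RuntimeDefault":
            problems.append(f"{where}: seccompProfile.type must be RuntimeDefault")
    return problems


# Values as PSP used to leave them: runAsNonRoot is never set.
before_psp_only = {
    "spec": {
        "securityContext": {
            "fsGroup": 54005,
            "seccompProfile": {"type": "RuntimeDefault"},
            "supplementalGroups": [1],
        },
        "containers": [{
            "securityContext": {
                "allowPrivilegeEscalation": False,
                "capabilities": {"drop": ["ALL"]},
                "runAsGroup": 54005,
                "runAsUser": 54005,
            }
        }],
    }
}
print(baseline_violations(before_psp_only))
```

Under these assumptions the PSP-only Pod fails only on runAsNonRoot, which is the kind of gap a template has to close natively once PSP no longer mutates Pods.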

See pod-level securityContext documentation here: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#podsecuritycontext-v1-core
See container-level securityContext documentation here: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#securitycontext-v1-core
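The two documents above describe overlapping objects: some fields exist at both levels, and in that case the API reference states that the container-level value takes precedence. A minimal Python sketch of that resolution follows; the field sets are a hand-picked subset, not the full API, and the helper name effective_value and the override value 1000 are hypothetical.

```python
from typing import Any

# Subset of fields defined in both PodSecurityContext and SecurityContext.
SHARED_FIELDS = {"runAsUser", "runAsGroup", "runAsNonRoot",
                 "seccompProfile", "seLinuxOptions"}
# Subset of fields that exist only in PodSecurityContext.
POD_ONLY_FIELDS = {"fsGroup", "supplementalGroups", "sysctls"}


def effective_value(pod_sc: dict[str, Any], container_sc: dict[str, Any],
                    field: str) -> Any:
    """Value that applies to one container, or None if unset at both levels."""
    if field in POD_ONLY_FIELDS:
        return pod_sc.get(field)
    if field in SHARED_FIELDS and field not in container_sc:
        # Not overridden by the container: inherit the pod-level value.
        return pod_sc.get(field)
    # Container-only fields (privileged, capabilities, ...) or overrides.
    return container_sc.get(field)


pod_sc = {"runAsUser": 54005, "fsGroup": 54005,
          "seccompProfile": {"type": "RuntimeDefault"}}
container_sc = {"runAsUser": 1000, "allowPrivilegeEscalation": False}
for name in ("runAsUser", "fsGroup", "seccompProfile", "allowPrivilegeEscalation"):
    print(name, effective_value(pod_sc, container_sc, name))
```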

Details

| Title | Reference | Author | Source Branch | Dest Branch |
| ----- | --------- | ------ | ------------- | ----------- |
| maintain-kubeusers: bump to 0.0.156-20240626103707-3aa9727d | repos/cloud/toolforge/toolforge-deploy!353 | project_1317_bot_df3177307bed93c3f34e421e26c86e38 | bump_maintain-kubeusers | main |
| d/changelog: bump to 0.103.9 | repos/cloud/toolforge/tools-webservice!47 | aborrero | arturo-114-d-changelog-bump-to | main |
| kyverno_pod_policy: don't autogenerate validation rules | repos/cloud/toolforge/maintain-kubeusers!51 | aborrero | arturo-101-kyverno_pod_policy | main |
| maintain-kubeusers: bump to 0.0.154-20240625155114-8428f7d3 | repos/cloud/toolforge/toolforge-deploy!350 | project_1317_bot_df3177307bed93c3f34e421e26c86e38 | bump_maintain-kubeusers | main |
| kyverno_pod_policy: validate fsGroup setting only if present | repos/cloud/toolforge/maintain-kubeusers!50 | aborrero | arturo-202-kyverno_pod_policy | main |
| d/changelog: bump to 0.103.8 | repos/cloud/toolforge/tools-webservice!44 | aborrero | arturo-182-d-changelog-bump-to | main |
| jobs-api: bump to 0.0.310-20240625090205-108e6a0f | repos/cloud/toolforge/toolforge-deploy!348 | project_1317_bot_df3177307bed93c3f34e421e26c86e38 | bump_jobs-api | main |
| jobs: drop ProcMount from securityContext | repos/cloud/toolforge/jobs-api!97 | aborrero | arturo-277-jobs-drop-procmount | main |
| kubernetes: drop ProcMount from securityContext | repos/cloud/toolforge/tools-webservice!43 | aborrero | arturo-190-kubernetes-drop-pro | main |
| functional-tests,webservice: add shell test | repos/cloud/toolforge/toolforge-deploy!347 | dcaro | add_shell_test | main |
| d/changelog: bump to 0.103.7 | repos/cloud/toolforge/tools-webservice!42 | aborrero | arturo-274-d-changelog-bump-to | main |
| kubernetes: introduce securityContext in the pod template | repos/cloud/toolforge/tools-webservice!37 | aborrero | arturo-104-kubernetes-introduc | main |
| jobs-api: bump to 0.0.300-20240507123100-0371f944 | repos/cloud/toolforge/toolforge-deploy!279 | project_1317_bot_df3177307bed93c3f34e421e26c86e38 | bump_jobs-api | main |
| jobs: fix securityContext readOnlyRootFilesystem | repos/cloud/toolforge/jobs-api!83 | aborrero | arturo-107-jobs-fix-securityco | main |
| jobs-api: bump to 0.0.299-20240507120229-eb816a7d | repos/cloud/toolforge/toolforge-deploy!278 | project_1317_bot_df3177307bed93c3f34e421e26c86e38 | bump_jobs-api | main |
| jobs-api: introduce securityContext in the pod template | repos/cloud/toolforge/jobs-api!75 | aborrero | arturo-58-jobs-api-introduce-s | main |
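The merge request "kyverno_pod_policy: validate fsGroup setting only if present" describes a conditional check. Kyverno expresses "if the key is present, its value must match" with an equality anchor such as `=(fsGroup)`. The Python below is only an emulation of those semantics; whether the actual policy uses this anchor, and the rule that fsGroup must equal the tool's GID, are assumptions, and both helper names are hypothetical.

```python
from typing import Any

_MISSING = object()  # sentinel so that an explicit None still counts as present


def equality_anchor(obj: dict[str, Any], key: str, expected: Any) -> bool:
    """Kyverno-style `=(key)` semantics: pass if absent, else require a match."""
    value = obj.get(key, _MISSING)
    return value is _MISSING or value == expected


def fsgroup_ok(pod: dict[str, Any], tool_gid: int) -> bool:
    """Hypothetical rule: if fsGroup is set, it must equal the tool's GID."""
    pod_sc = pod.get("spec", {}).get("securityContext", {})
    return equality_anchor(pod_sc, "fsGroup", tool_gid)


print(fsgroup_ok({"spec": {}}, 54005))                                     # True
print(fsgroup_ok({"spec": {"securityContext": {"fsGroup": 54005}}}, 54005))  # True
print(fsgroup_ok({"spec": {"securityContext": {"fsGroup": 0}}}, 54005))      # False
```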

Event Timeline

readOnlyRootFilesystem: true

We probably don't want to enforce this, so that people can create temporary files and similar without needing to mount volumes.

I got that one wrong. Indeed, the PSP we have today doesn't force this to true; it leaves it as false, which is the default anyway: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/

aborrero changed the task status from Open to In Progress. Apr 17 2024, 9:20 AM
aborrero triaged this task as Medium priority.
aborrero moved this task from Next to Doing on the User-aborrero board.
aborrero renamed this task from "review pod templates for stricter security" to "toolforge: review pod templates for PSP replacement". May 6 2024, 12:56 PM
aborrero updated the task description.

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/278

jobs-api: bump to 0.0.299-20240507120229-eb816a7d

Before the patch https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/278, with only PSP in place, a Pod resource would have:

  • at container level:
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  runAsGroup: 54005
  runAsUser: 54005
  • at pod level:
securityContext:
  fsGroup: 54005
  seccompProfile:
    type: RuntimeDefault
  supplementalGroups:
  - 1

With the above patch deployed:

  • at container level:
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  privileged: false
  procMount: Default
  readOnlyRootFilesystem: true
  runAsGroup: 54005
  runAsNonRoot: true
  runAsUser: 54005
  • at pod level:
securityContext:
  fsGroup: 54005
  runAsGroup: 54005
  runAsNonRoot: true
  runAsUser: 54005
  seccompProfile:
    type: RuntimeDefault
  supplementalGroups:
  - 1

The supplementalGroups field is added by PSP, not by the jobs-api template.
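A quick way to see what the patch changed is to diff the securityContext values quoted above. The minimal Python sketch below copies those values verbatim; the helper added_or_changed is hypothetical and only compares top-level keys.

```python
from typing import Any


def added_or_changed(before: dict[str, Any], after: dict[str, Any]) -> dict[str, Any]:
    """Keys in `after` that are missing from `before` or hold a different value."""
    return {k: v for k, v in after.items() if k not in before or before[k] != v}


# Values copied from the Pod resources quoted above.
container_before = {"allowPrivilegeEscalation": False, "capabilities": {"drop": ["ALL"]},
                    "runAsGroup": 54005, "runAsUser": 54005}
container_after = {**container_before, "privileged": False, "procMount": "Default",
                   "readOnlyRootFilesystem": True, "runAsNonRoot": True}
pod_before = {"fsGroup": 54005, "seccompProfile": {"type": "RuntimeDefault"},
              "supplementalGroups": [1]}
pod_after = {**pod_before, "runAsGroup": 54005, "runAsNonRoot": True, "runAsUser": 54005}

print("container:", sorted(added_or_changed(container_before, container_after)))
print("pod:", sorted(added_or_changed(pod_before, pod_after)))
```

supplementalGroups appears in neither diff because it is present, unchanged, on both sides. Two of the container-level additions were revisited later: readOnlyRootFilesystem (jobs-api!83) and procMount (jobs-api!97).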

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/279

jobs-api: bump to 0.0.300-20240507123100-0371f944

Mentioned in SAL (#wikimedia-cloud) [2024-06-24T15:44:57Z] <arturo> deploy toolforge-webservice 0.103.7 (T362050)

Mentioned in SAL (#wikimedia-cloud) [2024-06-24T15:45:01Z] <arturo> deploy toolforge-webservice 0.103.7 (T362050)

$ webservice perl5.36 shell --mount=all
Error from server (Forbidden): pods "shell-1719270590" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.containers[0].securityContext.procMount: Invalid value: "DefaultProcMount": ProcMountType is not allowed]
$ webservice php8.2 shell --mount=all
Error from server (Forbidden): pods "shell-1719270759" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.containers[0].securityContext.procMount: Invalid value: "DefaultProcMount": ProcMountType is not allowed]
$ webservice python3.11 shell --mount=all
Error from server (Forbidden): pods "shell-1719270775" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.containers[0].securityContext.procMount: Invalid value: "DefaultProcMount": ProcMountType is not allowed]
$ dpkg -l | grep toolforge-webservice
ii  toolforge-webservice                 0.103.7                              all          Infrastructure for running webservices on Toolforge
[23:14]  <   anomie> Looks like it still works from login-buster.toolforge.org 🤷
$ ssh login-buster.toolforge.org
$ dpkg -l | grep toolforge-webservice
ii  toolforge-webservice                                        0.103.6                            all          Infrastructure for running webservices on Toolforge

Until webservice shell is fixed generally, hacky workarounds are:

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/348

jobs-api: bump to 0.0.310-20240625090205-108e6a0f

This turned out to be an interesting bug, in several stages:

Thanks to @dcaro for assisting with this weird bug and for helping me make sense of this mystery and explain it.

We have decided to drop the procMount entry entirely, as it depends on the ProcMountType feature gate, which we don't even enable in our clusters.
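A minimal Python sketch of the admission rule behind the errors quoted earlier and of the fix chosen here. It assumes the documented PodSecurityPolicy behaviour, where an unset or empty allowedProcMountTypes list permits only "Default" (valid procMount values are "Default" and "Unmasked"). It does not reproduce kube-apiserver code or the exact stages of this bug; psp_proc_mount_errors and strip_proc_mount are hypothetical helpers, and the sample Pod is made up except for the "DefaultProcMount" value taken from the error output.

```python
import copy
from typing import Any, Optional


def psp_proc_mount_errors(pod: dict[str, Any],
                          allowed: Optional[list[str]] = None) -> list[str]:
    """Mimic the documented PSP rule: an unset or empty
    allowedProcMountTypes permits only the "Default" proc mount type."""
    allowed = allowed or ["Default"]
    errors = []
    for i, container in enumerate(pod.get("spec", {}).get("containers", [])):
        value = container.get("securityContext", {}).get("procMount")
        if value is not None and value not in allowed:
            errors.append(f"spec.containers[{i}].securityContext.procMount: "
                          f"{value!r} is not allowed")
    return errors


def strip_proc_mount(pod: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the Pod with procMount removed from every container."""
    fixed = copy.deepcopy(pod)
    for container in fixed.get("spec", {}).get("containers", []):
        container.get("securityContext", {}).pop("procMount", None)
    return fixed


shell_pod = {"spec": {"containers": [
    {"securityContext": {"procMount": "DefaultProcMount", "privileged": False}}]}}
print(psp_proc_mount_errors(shell_pod))
print(psp_proc_mount_errors(strip_proc_mount(shell_pod)))
```

Dropping the field sidesteps the question of which value is admissible, since an absent procMount is never checked.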

Mentioned in SAL (#wikimedia-cloud) [2024-06-25T09:42:17Z] <arturo> deploy toolforge-webservice 0.103.8 (T362050)

Mentioned in SAL (#wikimedia-cloud) [2024-06-25T09:44:01Z] <arturo> deploy toolforge-webservice 0.103.8 (T362050)

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/350

maintain-kubeusers: bump to 0.0.154-20240625155114-8428f7d3

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/353

maintain-kubeusers: bump to 0.0.156-20240626103707-3aa9727d