
Implement Pod Security Policies for Istio/Knative/Kfserving
Closed, ResolvedPublic

Description

We had to manually apply the following RoleBinding configs to allow Kubernetes to spawn pods in the Istio/Knative/KFServing namespaces:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-restricted-psp
  namespace: istio-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-restricted-psp
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:serviceaccounts:istio-system
    namespace: istio-system


apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-restricted-psp
  namespace: knative-serving
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-restricted-psp
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:serviceaccounts:knative-serving
    namespace: knative-serving


apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-restricted-psp
  namespace: kfserving-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-restricted-psp
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:serviceaccounts:kfserving-system
    namespace: kfserving-system


apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-privileged-psp
  namespace: elukey-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-privileged-psp
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:serviceaccounts:elukey-test
    namespace: elukey-test

We have a helmfile_psp.yaml config in deployment-charts, but it only covers the main k8s clusters. We should figure out whether the above configs are correct/sufficient, and add them to deployment-charts (to avoid having to manually kubectl apply -f them every time).
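As a rough illustration of what moving this into deployment-charts could look like (purely a sketch; the key names below are hypothetical and not the actual deployment-charts schema), the four RoleBindings above could be templated once and driven by a per-namespace values map:

```yaml
# Hypothetical helmfile values sketch (NOT the real deployment-charts layout):
# one entry per namespace, each mapped to the PSP ClusterRole it should bind.
# A single RoleBinding template would then be rendered for every entry.
pspBindings:
  istio-system: allow-restricted-psp
  knative-serving: allow-restricted-psp
  kfserving-system: allow-restricted-psp
  elukey-test: allow-privileged-psp
```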

Event Timeline

Thanks to @JMeybohm's refactoring in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/719295 we'll be able to solve the knative/istio/kfserving configs with a few lines of Helmfile config.

The part related to the custom/inference namespaces should be solved as part of T286791.

Will keep this task open to track the config changes needed but we should be able to close it very soon.

Change 720233 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] Add istio-system namespace config for ml-serve

https://gerrit.wikimedia.org/r/720233

Change 720233 merged by Elukey:

[operations/deployment-charts@master] Add istio-system namespace config for ml-serve

https://gerrit.wikimedia.org/r/720233

istio and knative are now managed by helmfile. For posterity, this was the error that happened for both when trying to sync:

STDERR:
  Error: UPGRADE FAILED: rendered manifests contain a resource that already exists. Unable to continue with update: Namespace "istio-system" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "namespaces"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "kube-system"

After a chat with Janis we decided to manually add (i.e. via kubectl edit namespace etc.) the following bits:

metadata:
+  annotations:
+    meta.helm.sh/release-name: namespaces
+    meta.helm.sh/release-namespace: kube-system
+    net.beta.kubernetes.io/network-policy: '{"ingress":{"isolation":"DefaultDeny"}}'
...
  labels:
    app: raw
+   app.kubernetes.io/managed-by: Helm

The above (for both the knative-serving and istio-system namespaces) allowed the sync to succeed. The assumption is that this happened because the namespaces already existed; in theory it shouldn't occur when bootstrapping a cluster from scratch (famous last words).
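For reference, after the edit the namespace metadata should end up looking roughly like this (a sketch of only the relevant fields, combining the diff above with the pre-existing labels):

```yaml
# Sketch of the adopted Namespace metadata after the manual edit
# (same shape for both istio-system and knative-serving).
apiVersion: v1
kind: Namespace
metadata:
  name: istio-system
  annotations:
    # These two annotations let Helm adopt the pre-existing namespace
    # into the "namespaces" release instead of failing the sync.
    meta.helm.sh/release-name: namespaces
    meta.helm.sh/release-namespace: kube-system
    net.beta.kubernetes.io/network-policy: '{"ingress":{"isolation":"DefaultDeny"}}'
  labels:
    app: raw
    app.kubernetes.io/managed-by: Helm
```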

The kfserving-system namespace is currently created by the kubeflow-kfserving chart, and carries the following labels:

labels:
  control-plane: kfserving-controller-manager
  controller-tools.k8s.io: "1.0"
  istio-injection: disabled

We should probably add some code to our helmfile logic to allow defining custom labels like the above. Once done, we will add kfserving-system to the "managed" namespaces and remove the Namespace creation from the chart.
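A possible shape for that helmfile extension (purely illustrative; the key names are hypothetical and not the actual deployment-charts schema):

```yaml
# Hypothetical values sketch: per-namespace custom labels that the
# namespaces chart would merge into each rendered Namespace manifest.
namespaces:
  kfserving-system:
    labels:
      control-plane: kfserving-controller-manager
      controller-tools.k8s.io: "1.0"
      istio-injection: disabled
```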

Change 721268 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] kubeflow-kfserving: move Namespace creation to helmfile

https://gerrit.wikimedia.org/r/721268

Update: in theory https://gerrit.wikimedia.org/r/721268 (plus its related changes) should fix the remaining kfserving-system settings, and take care of the per-service namespaces as well.

Change 721268 merged by Elukey:

[operations/deployment-charts@master] kubeflow-kfserving: move Namespace creation to helmfile

https://gerrit.wikimedia.org/r/721268

elukey claimed this task.

I was finally able to add kfserving-system to the list of managed namespaces in helmfile, rather than relying on the config in the Chart (provided by upstream). This allows us to manage the PSP/RBAC permissions properly.

We are working in T286791 on the permissions for the custom model namespaces (which will be deployed via helm3), so I think this task can be closed!