
Define necessary RBAC rules for spark on dse-k8s cluster
Closed, Resolved · Public

Description

In order to be able to use the functionality of the spark-operator, we will need to tune the RBAC (role based access control) policies somewhat.

Specifically, we know the following:

  • The spark-operator pod(s) will need to be able to launch spark-driver pods in the spark namespace.
  • The spark-driver pods will need to launch spark-executor pods in the spark namespace.
  • The webhook of the spark-operator will require permissions to modify pods in the spark namespace.

We will also need to look closely at the way that the upstream helm chart works, and ensure that we are not exceeding the minimum permissions required for our purposes.
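
Once deployed, one way to sanity-check these rules is kubectl auth can-i with ServiceAccount impersonation. This is only a sketch; it assumes the ServiceAccount names used in the manifests discussed below (spark-operator in the spark-operator namespace, spark in the spark namespace):

# Can the operator's ServiceAccount create driver pods in the spark namespace?
kubectl auth can-i create pods --namespace spark --as system:serviceaccount:spark-operator:spark-operator

# Can the spark ServiceAccount (used by driver pods) create executor pods in the spark namespace?
kubectl auth can-i create pods --namespace spark --as system:serviceaccount:spark:spark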

Event Timeline

BTullis triaged this task as High priority. Nov 8 2022, 2:00 PM
BTullis created this task.
BTullis added subscribers: JMeybohm, akosiaris.

I believe that this is now ready for review. The spark-operator deployment has been specified as part of the admin_ng section, so it may only be deployed by an SRE.

Firstly, a ServiceAccount for spark is created.

+ # Source: spark-operator/templates/spark-serviceaccount.yaml
+ apiVersion: v1
+ kind: ServiceAccount
+ metadata:
+   name: spark
+   namespace: spark
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm

A Role is created for spark in the spark namespace.

+ # Source: spark-operator/templates/spark-rbac.yaml
+ apiVersion: rbac.authorization.k8s.io/v1
+ kind: Role
+ metadata:
+   name: spark-role
+   namespace: spark
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm
+ rules:
+ - apiGroups: [""]
+   resources: ["pods"]
+   verbs: ["*"]
+ - apiGroups: [""]
+   resources: ["services"]
+   verbs: ["*"]
+ - apiGroups: [""]
+   resources: ["configmaps"]
+   verbs: ["*"]
+ - apiGroups: [""]
+   resources: ["persistentvolumeclaims"]
+   verbs: ["*"]

A RoleBinding is created in the spark namespace, tying the spark serviceaccount to the spark role.

+ # Source: spark-operator/templates/spark-rbac.yaml
+ apiVersion: rbac.authorization.k8s.io/v1
+ kind: RoleBinding
+ metadata:
+   name: spark
+   namespace: spark
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm
+ subjects:
+ - kind: ServiceAccount
+   name: spark
+   namespace: spark
+ roleRef:
+   kind: Role
+   name: spark-role
+   apiGroup: rbac.authorization.k8s.io

These three objects are all in the spark namespace.
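
For context, these are the objects that Spark driver pods will use at runtime: a SparkApplication submitted to the spark namespace points its driver at the spark ServiceAccount. A minimal sketch, assuming the upstream v1beta2 API and placeholder image/jar values:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi                  # hypothetical example application
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: docker-registry.example/spark:3.3.0                                # placeholder image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar  # placeholder path
  sparkVersion: "3.3.0"
  driver:
    serviceAccount: spark         # the ServiceAccount bound to spark-role above
    cores: 1
    memory: 512m
  executor:
    instances: 1
    cores: 1
    memory: 512m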

Next, a ServiceAccount for the spark-operator is created.

+ ---
+ # Source: spark-operator/templates/serviceaccount.yaml
+ apiVersion: v1
+ kind: ServiceAccount
+ metadata:
+   name: spark-operator
+   annotations:
+     "helm.sh/hook": pre-install, pre-upgrade
+     "helm.sh/hook-delete-policy": hook-failed, before-hook-creation
+     "helm.sh/hook-weight": "-10"
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm

Then a ClusterRole for the spark-operator is created, with the following permissions:

+ # Source: spark-operator/templates/rbac.yaml
+ apiVersion: rbac.authorization.k8s.io/v1
+ kind: ClusterRole
+ metadata:
+   name: spark-operator-spark-data-engineering
+   annotations:
+     "helm.sh/hook": pre-install, pre-upgrade
+     "helm.sh/hook-delete-policy": hook-failed, before-hook-creation
+     "helm.sh/hook-weight": "-10"
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm
+ rules:
+ - apiGroups: [""]
+   resources: ["pods"]
+   verbs: ["*"]
+ - apiGroups: [""]
+   resources: ["services", "configmaps", "secrets"]
+   verbs: ["create", "get", "delete", "update"]
+ - apiGroups: ["extensions", "networking.k8s.io"]
+   resources: ["ingresses"]
+   verbs: ["create", "get", "delete"]
+ - apiGroups: [""]
+   resources: ["nodes"]
+   verbs: ["get"]
+ - apiGroups: [""]
+   resources: ["events"]
+   verbs: ["create", "update", "patch"]
+ - apiGroups: [""]
+   resources: ["resourcequotas"]
+   verbs: ["get", "list", "watch"]
+ - apiGroups: ["apiextensions.k8s.io"]
+   resources: ["customresourcedefinitions"]
+   verbs: ["create", "get", "update", "delete"]
+ - apiGroups: ["admissionregistration.k8s.io"]
+   resources: ["mutatingwebhookconfigurations", "validatingwebhookconfigurations"]
+   verbs: ["create", "get", "update", "delete"]
+ - apiGroups: ["sparkoperator.k8s.io"]
+   resources: ["sparkapplications", "sparkapplications/status", "scheduledsparkapplications", "scheduledsparkapplications/status"]
+   verbs: ["*"]
+ - apiGroups: ["scheduling.incubator.k8s.io", "scheduling.sigs.dev", "scheduling.volcano.sh"]
+   resources: ["podgroups"]
+   verbs: ["*"]
+ - apiGroups: ["batch"]
+   resources: ["jobs"]
+   verbs: ["delete"]

Lastly, in terms of these RBAC rules, a ClusterRoleBinding is created. This specifies the namespace spark-operator, so it limits the spark-operator to working within its own namespace. (Edit: this is incorrect; see the comment below.)

+ # Source: spark-operator/templates/rbac.yaml
+ apiVersion: rbac.authorization.k8s.io/v1
+ kind: ClusterRoleBinding
+ metadata:
+   name: spark-operator-spark-data-engineering
+   annotations:
+     "helm.sh/hook": pre-install, pre-upgrade
+     "helm.sh/hook-delete-policy": hook-failed, before-hook-creation
+     "helm.sh/hook-weight": "-10"
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm
+ subjects:
+   - kind: ServiceAccount
+     name: spark-operator
+     namespace: spark-operator
+ roleRef:
+   kind: ClusterRole
+   name: spark-operator-spark-data-engineering
+   apiGroup: rbac.authorization.k8s.io

I will discuss the other resources in the parent ticket.

[...]
Lastly, in terms of these RBAC rules, a ClusterRoleBinding is created. This specifies the namespace spark-operator, so it limits the spark-operator to working within its own namespace.

ClusterRoleBindings cannot be bound/restricted to a namespace (RoleBindings can, and in fact must, be), so this binding will grant all of the permissions defined in the ClusterRole spark-operator-spark-data-engineering to the ServiceAccount spark-operator in the Namespace spark-operator.
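
One way to see the effect is to list the effective permissions of that ServiceAccount in an arbitrary namespace; because the grant comes from a ClusterRoleBinding, the same list comes back for every namespace:

# Effective permissions of the operator's ServiceAccount, checked against the spark namespace
kubectl auth can-i --list --namespace spark --as system:serviceaccount:spark-operator:spark-operator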

Oh, I see now, thanks @JMeybohm for explaining. Right, so I understand now why limiting the power of the spark-operator is going to be harder than I first thought.

OK, so there are several types of resources mentioned in the spark-operator-spark-data-engineering ClusterRole that are cluster-wide, like nodes, customresourcedefinitions, mutatingwebhookconfigurations etc.
These permissions will have to remain in a ClusterRole definition and be bound with a ClusterRoleBinding, but the role can be made less permissive.

For the other types of resources, such as pods, services, events etc., which are all namespaced, I could remove them from this ClusterRole and ClusterRoleBinding and specify additional Role and RoleBinding definitions within the two namespaces in which I'm interested. Would this approach be workable, do you think?

I've had a go at this approach, so now the ClusterRole for spark-operator is defined like this: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/855674/8/charts/spark-operator/templates/rbac-cluster.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-operator-spark-data-engineering-operator-cluster-role
  annotations:
    "helm.sh/hook": pre-install, pre-upgrade
    "helm.sh/hook-delete-policy": hook-failed, before-hook-creation
    "helm.sh/hook-weight": "-10"
  labels:
    app: spark-operator
    chart: spark-operator-0.0.1
    release: spark-data-engineering
    heritage: Helm
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["create", "get", "update", "delete"]
  - apiGroups: ["admissionregistration.k8s.io"]
    resources: ["mutatingwebhookconfigurations", "validatingwebhookconfigurations"]
    verbs: ["create", "get", "update", "delete"]

All of the remaining privileges are those that are namespaced, I believe, so they are defined in a Role like this: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/855674/8/charts/spark-operator/templates/rbac-namespaced.yml

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-operator-spark-data-engineering-operator-role
  namespace: spark-operator
  annotations:
    "helm.sh/hook": pre-install, pre-upgrade
    "helm.sh/hook-delete-policy": hook-failed, before-hook-creation
    "helm.sh/hook-weight": "-10"
  labels:
    app: spark-operator
    chart: spark-operator-0.0.1
    release: spark-data-engineering
    heritage: Helm
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["*"]
  - apiGroups: [""]
    resources: ["services", "configmaps", "secrets"]
    verbs: ["create", "get", "delete", "update"]
  - apiGroups: ["extensions", "networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["create", "get", "delete"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
  - apiGroups: [""]
    resources: ["resourcequotas"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["sparkoperator.k8s.io"]
    resources: ["sparkapplications", "sparkapplications/status", "scheduledsparkapplications", "scheduledsparkapplications/status"]
    verbs: ["*"]
  - apiGroups: ["scheduling.incubator.k8s.io", "scheduling.sigs.dev", "scheduling.volcano.sh"]
    resources: ["podgroups"]
    verbs: ["*"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["delete"]

That sounds about right. Although you absolutely can specify the latter rules as a ClusterRole object and then bind it to just a namespace using a RoleBinding (instead of a ClusterRoleBinding). That way you don't have to declare the Role in every Namespace in which you want to use it (just the RoleBinding).
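
For illustration, that pattern looks something like this: the namespaced rules are declared once in a ClusterRole, and each namespace that needs them gets only a RoleBinding whose roleRef points at that ClusterRole (the names here are illustrative, not from the patch):

# Namespaced rules declared once, cluster-wide (illustrative name)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-operator-namespaced-rules
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["*"]
---
# A RoleBinding scopes those rules to a single namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-operator-namespaced-rules
  namespace: spark-operator
subjects:
  - kind: ServiceAccount
    name: spark-operator
    namespace: spark-operator
roleRef:
  kind: ClusterRole
  name: spark-operator-namespaced-rules
  apiGroup: rbac.authorization.k8s.io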

Change 855674 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Add a spark-operator chart and helmfile configuration

https://gerrit.wikimedia.org/r/855674

Ah, thanks. Good to know that we can bind a ClusterRole with a RoleBinding - I hadn't thought of that.
I think that I'll leave it with this Role setup for the next phase of testing, given that it's only the one namespace for now. If it looks like we will want to run Spark jobs in more namespaces in future, I'll definitely come back to this approach.

I've updated the latest patchset on https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/855674 with this approach.

I'm rewriting these to take account of the improvements brought about by the new organization of the helm chart templates.

The spark-operator pod(s) will need to be able to launch spark-driver pods in the spark namespace.

Hm, interesting. I'm thinking of how this might work for Flink, where I'd expect each Flink application to be launched in its own namespace, just as we do for any other k8s application. If we were to deploy a 'production' Spark Streaming application, I'd expect the same. Does this mean that this will not be possible, and that all Spark jobs will run in the same namespace?

Or, is this just the way it will work for ad-hoc Spark queries and batch jobs?

Change 887994 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Update the kubectl config files generated for the dse-k8s cluster

https://gerrit.wikimedia.org/r/887994

Change 887994 merged by Btullis:

[operations/puppet@production] Update the kubectl config files generated for the dse-k8s cluster

https://gerrit.wikimedia.org/r/887994

We think that this is now complete, although testing will be done imminently as part of T318926: Deploy spark-operator to the dse-k8s cluster.

Change 855674 merged by jenkins-bot:

[operations/deployment-charts@master] Add a spark-operator chart and helmfile configuration

https://gerrit.wikimedia.org/r/855674

This is now done, although I am sure that we will revisit the permissions over time.