
Define necessary RBAC rules for spark on dse-k8s cluster
Closed, Resolved · Public

Description

In order to be able to use the functionality of the spark-operator, we will need to tune the RBAC (role based access control) policies somewhat.

Specifically, we know the following:

  • The spark-operator pod(s) will need to be able to launch spark-driver pods in the spark namespace.
  • The spark-driver pods will need to launch spark-executor pods in the spark namespace.
  • The webhook of the spark-operator will require permissions to modify pods in the spark namespace.

We will also need to look closely at the way that the upstream helm chart works, and ensure that we are not exceeding the minimum permissions required for our purposes.
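
Once deployed, one way to sanity-check these rules is kubectl auth can-i with ServiceAccount impersonation. This is only a sketch; it assumes the ServiceAccount names used in the manifests discussed below (spark-operator in the spark-operator namespace, spark in the spark namespace):

# Can the operator's ServiceAccount create driver pods in the spark namespace?
kubectl auth can-i create pods --namespace spark --as system:serviceaccount:spark-operator:spark-operator

# Can the spark ServiceAccount (used by driver pods) create executor pods in the spark namespace?
kubectl auth can-i create pods --namespace spark --as system:serviceaccount:spark:spark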

Event Timeline

BTullis triaged this task as High priority. Nov 8 2022, 2:00 PM
BTullis created this task.
BTullis added subscribers: JMeybohm, akosiaris.

I believe that this is now ready for review. The spark-operator deployment has been specified as part of the admin_ng section, so it may only be deployed by an SRE.

Firstly, a ServiceAccount for spark is created.

+ # Source: spark-operator/templates/spark-serviceaccount.yaml
+ apiVersion: v1
+ kind: ServiceAccount
+ metadata:
+   name: spark
+   namespace: spark
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm

A Role is created for spark in the spark namespace.

+ # Source: spark-operator/templates/spark-rbac.yaml
+ apiVersion: rbac.authorization.k8s.io/v1
+ kind: Role
+ metadata:
+   name: spark-role
+   namespace: spark
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm
+ rules:
+ - apiGroups: [""]
+   resources: ["pods"]
+   verbs: ["*"]
+ - apiGroups: [""]
+   resources: ["services"]
+   verbs: ["*"]
+ - apiGroups: [""]
+   resources: ["configmaps"]
+   verbs: ["*"]
+ - apiGroups: [""]
+   resources: ["persistentvolumeclaims"]
+   verbs: ["*"]

A RoleBinding is created in the spark namespace, tying the spark serviceaccount to the spark role.

+ # Source: spark-operator/templates/spark-rbac.yaml
+ apiVersion: rbac.authorization.k8s.io/v1
+ kind: RoleBinding
+ metadata:
+   name: spark
+   namespace: spark
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm
+ subjects:
+ - kind: ServiceAccount
+   name: spark
+   namespace: spark
+ roleRef:
+   kind: Role
+   name: spark-role
+   apiGroup: rbac.authorization.k8s.io

These three objects are all in the spark namespace.
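
For context, these are the objects that Spark driver pods will use at runtime: a SparkApplication submitted to the spark namespace points its driver at the spark ServiceAccount. A minimal sketch, assuming the upstream v1beta2 API and placeholder image/jar values:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi                  # hypothetical example application
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: docker-registry.example/spark:3.3.0                                # placeholder image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar  # placeholder path
  sparkVersion: "3.3.0"
  driver:
    serviceAccount: spark         # the ServiceAccount bound to spark-role above
    cores: 1
    memory: 512m
  executor:
    instances: 1
    cores: 1
    memory: 512m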

Next, a ServiceAccount for the spark-operator is created.

+ ---
+ # Source: spark-operator/templates/serviceaccount.yaml
+ apiVersion: v1
+ kind: ServiceAccount
+ metadata:
+   name: spark-operator
+   annotations:
+     "helm.sh/hook": pre-install, pre-upgrade
+     "helm.sh/hook-delete-policy": hook-failed, before-hook-creation
+     "helm.sh/hook-weight": "-10"
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm

Then a ClusterRole for the spark-operator is created, with the following permissions:

+ # Source: spark-operator/templates/rbac.yaml
+ apiVersion: rbac.authorization.k8s.io/v1
+ kind: ClusterRole
+ metadata:
+   name: spark-operator-spark-data-engineering
+   annotations:
+     "helm.sh/hook": pre-install, pre-upgrade
+     "helm.sh/hook-delete-policy": hook-failed, before-hook-creation
+     "helm.sh/hook-weight": "-10"
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm
+ rules:
+ - apiGroups: [""]
+   resources: ["pods"]
+   verbs: ["*"]
+ - apiGroups: [""]
+   resources: ["services", "configmaps", "secrets"]
+   verbs: ["create", "get", "delete", "update"]
+ - apiGroups: ["extensions", "networking.k8s.io"]
+   resources: ["ingresses"]
+   verbs: ["create", "get", "delete"]
+ - apiGroups: [""]
+   resources: ["nodes"]
+   verbs: ["get"]
+ - apiGroups: [""]
+   resources: ["events"]
+   verbs: ["create", "update", "patch"]
+ - apiGroups: [""]
+   resources: ["resourcequotas"]
+   verbs: ["get", "list", "watch"]
+ - apiGroups: ["apiextensions.k8s.io"]
+   resources: ["customresourcedefinitions"]
+   verbs: ["create", "get", "update", "delete"]
+ - apiGroups: ["admissionregistration.k8s.io"]
+   resources: ["mutatingwebhookconfigurations", "validatingwebhookconfigurations"]
+   verbs: ["create", "get", "update", "delete"]
+ - apiGroups: ["sparkoperator.k8s.io"]
+   resources: ["sparkapplications", "sparkapplications/status", "scheduledsparkapplications", "scheduledsparkapplications/status"]
+   verbs: ["*"]
+ - apiGroups: ["scheduling.incubator.k8s.io", "scheduling.sigs.dev", "scheduling.volcano.sh"]
+   resources: ["podgroups"]
+   verbs: ["*"]
+ - apiGroups: ["batch"]
+   resources: ["jobs"]
+   verbs: ["delete"]

Lastly, in terms of these RBAC rules, a ClusterRoleBinding is created. This specifies the namespace spark-operator, so it limits the spark-operator to working within its own namespace. (Edit: this is incorrect; see the comment below.)

+ # Source: spark-operator/templates/rbac.yaml
+ apiVersion: rbac.authorization.k8s.io/v1
+ kind: ClusterRoleBinding
+ metadata:
+   name: spark-operator-spark-data-engineering
+   annotations:
+     "helm.sh/hook": pre-install, pre-upgrade
+     "helm.sh/hook-delete-policy": hook-failed, before-hook-creation
+     "helm.sh/hook-weight": "-10"
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm
+ subjects:
+   - kind: ServiceAccount
+     name: spark-operator
+     namespace: spark-operator
+ roleRef:
+   kind: ClusterRole
+   name: spark-operator-spark-data-engineering
+   apiGroup: rbac.authorization.k8s.io

I will discuss the other resources in the parent ticket.

[...]
Lastly, in terms of these RBAC rules, a ClusterRoleBinding is created. This specifies the namespace spark-operator, so it limits the spark-operator to working within its own namespace.

ClusterRoleBindings cannot be bound/restricted to a namespace (RoleBindings can, and in fact must, be), so this binding will grant all of the permissions defined in the ClusterRole spark-operator-spark-data-engineering to the ServiceAccount spark-operator in the Namespace spark-operator.
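
One way to see the effect is to list the effective permissions of that ServiceAccount in an arbitrary namespace; because the grant comes from a ClusterRoleBinding, the same list comes back for every namespace:

# Effective permissions of the operator's ServiceAccount, checked against the spark namespace
kubectl auth can-i --list --namespace spark --as system:serviceaccount:spark-operator:spark-operator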

Oh, I see now, thanks @JMeybohm for explaining. Right, so I understand now why limiting the power of the spark-operator is going to be harder than I first thought.

OK, so there are several types of resources mentioned in the spark-operator-spark-data-engineering ClusterRole that are cluster-wide, like nodes, customresourcedefinitions, mutatingwebhookconfigurations etc.
These permissions will have to remain in a ClusterRole definition and be bound with a ClusterRoleBinding, but the role can be made less permissive.

For the other types of resources, such as pods, services, events etc., which are all namespaced, I could remove them from this ClusterRole and ClusterRoleBinding and specify additional Role and RoleBinding definitions within the two namespaces in which I'm interested. Would this approach be workable, do you think?

I've had a go at this approach, so now the ClusterRole for spark-operator is defined like this: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/855674/8/charts/spark-operator/templates/rbac-cluster.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-operator-spark-data-engineering-operator-cluster-role
  annotations:
    "helm.sh/hook": pre-install, pre-upgrade
    "helm.sh/hook-delete-policy": hook-failed, before-hook-creation
    "helm.sh/hook-weight": "-10"
  labels:
    app: spark-operator
    chart: spark-operator-0.0.1
    release: spark-data-engineering
    heritage: Helm
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["create", "get", "update", "delete"]
  - apiGroups: ["admissionregistration.k8s.io"]
    resources: ["mutatingwebhookconfigurations", "validatingwebhookconfigurations"]
    verbs: ["create", "get", "update", "delete"]

All of the remaining privileges are those that are namespaced, I believe, so they are defined in a Role like this: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/855674/8/charts/spark-operator/templates/rbac-namespaced.yml

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-operator-spark-data-engineering-operator-role
  namespace: spark-operator
  annotations:
    "helm.sh/hook": pre-install, pre-upgrade
    "helm.sh/hook-delete-policy": hook-failed, before-hook-creation
    "helm.sh/hook-weight": "-10"
  labels:
    app: spark-operator
    chart: spark-operator-0.0.1
    release: spark-data-engineering
    heritage: Helm
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["*"]
  - apiGroups: [""]
    resources: ["services", "configmaps", "secrets"]
    verbs: ["create", "get", "delete", "update"]
  - apiGroups: ["extensions", "networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["create", "get", "delete"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
  - apiGroups: [""]
    resources: ["resourcequotas"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["sparkoperator.k8s.io"]
    resources: ["sparkapplications", "sparkapplications/status", "scheduledsparkapplications", "scheduledsparkapplications/status"]
    verbs: ["*"]
  - apiGroups: ["scheduling.incubator.k8s.io", "scheduling.sigs.dev", "scheduling.volcano.sh"]
    resources: ["podgroups"]
    verbs: ["*"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["delete"]

That sounds about right. Although you absolutely can specify the latter rules as a ClusterRole object and then bind it to just a namespace using a RoleBinding (instead of a ClusterRoleBinding). That way you don't have to declare the Role in every Namespace in which you want to use it (just the RoleBinding).
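
For illustration, that pattern looks something like this: the namespaced rules are declared once in a ClusterRole, and each namespace that needs them gets only a RoleBinding whose roleRef points at that ClusterRole (the names here are illustrative, not from the patch):

# Namespaced rules declared once, cluster-wide (illustrative name)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-operator-namespaced-rules
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["*"]
---
# A RoleBinding scopes those rules to a single namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-operator-namespaced-rules
  namespace: spark-operator
subjects:
  - kind: ServiceAccount
    name: spark-operator
    namespace: spark-operator
roleRef:
  kind: ClusterRole
  name: spark-operator-namespaced-rules
  apiGroup: rbac.authorization.k8s.io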

Change 855674 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Add a spark-operator chart and helmfile configuration

https://gerrit.wikimedia.org/r/855674

Ah, thanks. Good to know that we can bind a ClusterRole with a RoleBinding - I hadn't thought of that.
I think that I'll leave it with this Role setup for the next phase of testing, given that it's only the one namespace for now. If it looks like we will want to run Spark jobs in more namespaces in future, I'll definitely come back to this approach.

I've updated the latest patchset on https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/855674 with this approach.

I'm rewriting these to take account of the improvements brought about by the new organization of the helm chart templates.

The spark-operator pod(s) will need to be able to launch spark-driver pods in the spark namespace.

Hm, interesting. I'm thinking of how this might work for Flink, where I'd expect each Flink application to be launched in its own namespace, just as we do for any other k8s application. If we were to deploy a 'production' Spark Streaming application, I'd expect the same. Does this mean that this will not be possible, and that all Spark jobs will run in the same namespace?

Or, is this just the way it will work for ad-hoc Spark queries and batch jobs?

Change 887994 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Update the kubectl config files generated for the dse-k8s cluster

https://gerrit.wikimedia.org/r/887994

Change 887994 merged by Btullis:

[operations/puppet@production] Update the kubectl config files generated for the dse-k8s cluster

https://gerrit.wikimedia.org/r/887994

We think that this is now complete, although testing will be done imminently as part of T318926: Deploy spark-operator to the dse-k8s cluster.

Change 855674 merged by jenkins-bot:

[operations/deployment-charts@master] Add a spark-operator chart and helmfile configuration

https://gerrit.wikimedia.org/r/855674

This is now done, although I am sure that we will revisit the permissions over time.