As part of the testing process for running Spark jobs on Kubernetes, we need to be able to deploy the spark-operator into its own namespace.
Once the operator is running, it is configured to monitor one or more namespaces for SparkApplication requests.
This setting can also be omitted, in which case the operator watches all namespaces.
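For example, with the upstream spark-operator Helm chart the watched namespaces can be set through chart values; the exact key name varies between chart versions, so `sparkJobNamespaces` below is an assumption:

```yaml
# Hypothetical Helm values for the spark-operator chart.
# The key name (sparkJobNamespaces) depends on the chart version in use.
sparkJobNamespaces:
  - spark   # watch only the spark namespace for SparkApplication objects
# Leaving this unset would make the operator watch all namespaces.
```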
At this point in the process, I believe that we should initially create the following (a sketch of these objects follows the list):
- a spark-operator namespace where we run the operator
- a spark namespace where we run the driver
- a spark-operator user
- a spark user
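A minimal sketch of these objects, assuming that the two "users" are Kubernetes service accounts (all names are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: spark-operator
---
apiVersion: v1
kind: Namespace
metadata:
  name: spark
---
# Service account under which the operator itself runs.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-operator
  namespace: spark-operator
---
# Service account used by driver pods; it would also need RBAC rules
# allowing it to create executor pods in the spark namespace.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark
```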
The spark-operator will be built via the standard deployment-pipeline and managed with helmfile by SREs on the Data Engineering and ML teams.
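As a sketch, the helmfile entry might look like the following; the repository URL, chart, and release names are assumptions rather than the actual deployment-pipeline configuration:

```yaml
# Hypothetical helmfile.yaml entry for the operator; not the actual
# deployment-pipeline configuration.
repositories:
  - name: spark-operator
    url: https://kubeflow.github.io/spark-operator   # assumed upstream chart repo
releases:
  - name: spark-operator
    namespace: spark-operator
    chart: spark-operator/spark-operator
    values:
      - spark-operator-values.yaml   # e.g. the watched-namespace settings above
```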
The spark jobs will be submitted by members of analytics-privatedata-users.
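For illustration, a submitted job would be a SparkApplication object in the spark namespace along these lines; the image, main class, and jar path below are placeholders:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi          # illustrative job name
  namespace: spark        # the namespace the operator watches
spec:
  type: Scala
  mode: cluster
  image: apache/spark:3.4.1                      # placeholder image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar
  sparkVersion: "3.4.1"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark   # the spark service account created above
  executor:
    cores: 1
    instances: 2
    memory: 512m
```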