As part of the testing process for running Spark jobs on Kubernetes, we need to be able to deploy the spark-operator into its own namespace.
Once the operator is running, it is configured to monitor one or more namespaces for SparkApplication requests.
This setting can also be omitted, in which case the operator watches all namespaces.
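For example, with the upstream spark-operator Helm chart the watched namespaces can be set through chart values; the exact key name varies between chart versions, so `sparkJobNamespaces` below is an assumption:

```yaml
# Hypothetical Helm values for the spark-operator chart.
# The key name (sparkJobNamespaces) depends on the chart version in use.
sparkJobNamespaces:
  - spark   # watch only the spark namespace for SparkApplication objects
# Leaving this unset would make the operator watch all namespaces.
```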
At this point in the process, I believe that we should initially create the following (a sketch of these objects follows the list):
- a spark-operator namespace where we run the operator
- a spark namespace where we run the driver
- a spark-operator user
- a spark user
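A minimal sketch of these objects, assuming that the two "users" are Kubernetes service accounts (all names are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: spark-operator
---
apiVersion: v1
kind: Namespace
metadata:
  name: spark
---
# Service account under which the operator itself runs.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-operator
  namespace: spark-operator
---
# Service account used by driver pods; it would also need RBAC rules
# allowing it to create executor pods in the spark namespace.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark
```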
The spark-operator will be built via the standard deployment-pipeline and managed with helmfile by SREs on the Data Engineering and ML teams.
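As a sketch, the helmfile entry might look like the following; the repository URL, chart, and release names are assumptions rather than the actual deployment-pipeline configuration:

```yaml
# Hypothetical helmfile.yaml entry for the operator; not the actual
# deployment-pipeline configuration.
repositories:
  - name: spark-operator
    url: https://kubeflow.github.io/spark-operator   # assumed upstream chart repo
releases:
  - name: spark-operator
    namespace: spark-operator
    chart: spark-operator/spark-operator
    values:
      - spark-operator-values.yaml   # e.g. the watched-namespace settings above
```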
The spark jobs will be submitted by members of analytics-privatedata-users.
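For illustration, a submitted job would be a SparkApplication object in the spark namespace along these lines; the image, main class, and jar path below are placeholders:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi          # illustrative job name
  namespace: spark        # the namespace the operator watches
spec:
  type: Scala
  mode: cluster
  image: apache/spark:3.4.1                      # placeholder image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar
  sparkVersion: "3.4.1"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark   # the spark service account created above
  executor:
    cores: 1
    instances: 2
    memory: 512m
```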