We have been experimenting with Apache Kyuubi to let analytics users create ephemeral Spark clusters on Kubernetes and to provide a Thrift-compatible JDBC interface for use with dbt.
See T410017: Provide a Spark-on-k8s access for sql tools (dbt) for the work to date.
However, we have decided that our best course of action would be to set up a central Kyuubi service for each analytics-enabled namespace on the Kubernetes cluster.
The reason for this is that we need to enable Kerberos for Kyuubi itself when working with a Kerberized Hadoop cluster. From this page...
If you are deploying Kyuubi with a kerberized Hadoop cluster, it is strongly recommended that kyuubi.authentication should be set to KERBEROS too.
We did experiment with deployment models in which Kyuubi was not kerberized while Hive Metastore and HDFS were, but we could not get this configuration to work.
In addition, our Kerberos implementation is not configured to work with arbitrarily named, ephemeral pods that provide a kerberized service. This is because forward and reverse DNS must both match the FQDN part of the Kerberos principal.
Therefore, we have decided to create a single Kyuubi deployment in each of the spark-enabled namespaces, which will have:
- a static pod name.
- a service name matching the pod name.
- a Kerberos principal with a hostname component that matches the pod name.
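The naming constraint above can be sketched in Python. This is only an illustration under stated assumptions: the `kyuubi` primary, the `analytics` namespace, the `EXAMPLE.ORG` realm and the `svc.cluster.local` suffix are all hypothetical placeholders, and the real FQDN layout depends on how the pod and service are exposed in DNS.

```python
import socket


def kyuubi_principal(service: str, namespace: str, realm: str,
                     cluster_domain: str = "svc.cluster.local",
                     primary: str = "kyuubi") -> str:
    """Build 'primary/fqdn@REALM' for a host reachable under a stable name.

    All argument values used in this sketch are hypothetical.
    """
    fqdn = f"{service}.{namespace}.{cluster_domain}".lower()
    return f"{primary}/{fqdn}@{realm.upper()}"


def principal_host(principal: str) -> str:
    """Return the hostname component of a 'primary/host@REALM' principal."""
    without_realm = principal.split("@", 1)[0]
    return without_realm.split("/", 1)[1]


def dns_round_trips(fqdn: str) -> bool:
    """Check that the forward lookup, then the reverse lookup, yields the same FQDN."""
    try:
        address = socket.gethostbyname(fqdn)
        reverse_name, _aliases, _addresses = socket.gethostbyaddr(address)
    except OSError:
        # Covers both failed forward (gaierror) and failed reverse (herror) lookups.
        return False
    return reverse_name.rstrip(".").lower() == fqdn.rstrip(".").lower()
```

Running `dns_round_trips(principal_host(...))` from inside the cluster would be one way to confirm that a chosen static name satisfies the forward and reverse DNS requirement before a principal is issued for it.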
We wish to use the CONNECTION share level: https://kyuubi.readthedocs.io/en/master/deployment/engine_share_level.html#connection
This will create a Spark cluster for each incoming Thrift server connection and destroy the cluster when the connection is terminated.
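As a rough sketch of what the server-side settings might look like, the Python below renders a few Kyuubi properties as `key=value` lines. The property names (`kyuubi.authentication`, `kyuubi.kinit.principal`, `kyuubi.kinit.keytab`, `kyuubi.engine.share.level`) are taken from the Kyuubi configuration documentation; the principal and keytab path are hypothetical placeholders, and how the result would be wired into our chart is not decided here.

```python
from typing import Dict


def kyuubi_settings(principal: str, keytab: str) -> Dict[str, str]:
    """Collect the Kerberos and share-level settings discussed in this task."""
    return {
        "kyuubi.authentication": "KERBEROS",
        "kyuubi.kinit.principal": principal,   # hypothetical value supplied by caller
        "kyuubi.kinit.keytab": keytab,         # hypothetical path supplied by caller
        "kyuubi.engine.share.level": "CONNECTION",
    }


def render_properties(settings: Dict[str, str]) -> str:
    """Render settings as sorted 'key=value' lines, one per line."""
    return "".join(f"{key}={value}\n" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    print(render_properties(kyuubi_settings(
        "kyuubi/kyuubi.analytics.svc.cluster.local@EXAMPLE.ORG",  # placeholder
        "/etc/kyuubi/kyuubi.keytab",                              # placeholder
    )), end="")
```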
We will likely also want to use the Hadoop impersonation mechanism: https://kyuubi.readthedocs.io/en/master/security/kerberos.html#enable-hadoop-impersonation
...although to begin with the Kyuubi server may also be running as the same (Kerberos) user that we want to impersonate.
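From the client side, a connection to a kerberized Kyuubi server might be described by a Hive-style JDBC URL. The sketch below assumes the Hive JDBC driver's `principal` session variable and the `hive.server2.proxy.user` variable for impersonation; the host, principal and user names are hypothetical, port 10009 is Kyuubi's default Thrift binary port, and whether our clients would use this exact URL form is an assumption.

```python
from typing import Optional


def kyuubi_jdbc_url(host: str, server_principal: str, port: int = 10009,
                    database: str = "default",
                    proxy_user: Optional[str] = None) -> str:
    """Build a Hive-style JDBC URL for a Kerberos-authenticated connection."""
    url = f"jdbc:hive2://{host}:{port}/{database};principal={server_principal}"
    if proxy_user:
        # Only meaningful if the server's Kerberos user is allowed to impersonate.
        url += f";hive.server2.proxy.user={proxy_user}"
    return url


if __name__ == "__main__":
    print(kyuubi_jdbc_url(
        "kyuubi.analytics.svc.cluster.local",                     # placeholder host
        "kyuubi/kyuubi.analytics.svc.cluster.local@EXAMPLE.ORG",  # placeholder
        proxy_user="analyst",                                     # placeholder user
    ))
```

With the CONNECTION share level, each connection opened with such a URL would get its own Spark cluster, so the proxy user would only affect which identity that cluster uses against Hadoop.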
There is an upstream chart for Kyuubi here: https://github.com/apache/kyuubi/tree/master/charts/kyuubi
...but we do not expect that this would bring any benefit over adding Kyuubi server support to our existing spark-support chart.
Nevertheless, this may be useful as a reference point.