
Deploy Kyuubi to enable dynamic spark-sql clusters in dse-k8s
Open, HighPublic

Description

We have been experimenting with Apache Kyuubi to allow analytics users to create ephemeral Spark clusters on Kubernetes, providing a Thrift-compatible JDBC interface for use with dbt.

See: T410017: Provide a Spark-on-k8s access for sql tools (dbt) for this work to-date.

However, we have decided that our best course of action would be to set up a central Kyuubi service for each analytics-enabled namespace on the Kubernetes cluster.
The reason for this is that we need to enable Kerberos for Kyuubi itself when working with a Kerberized Hadoop cluster. From this page:

If you are deploying Kyuubi with a kerberized Hadoop cluster, it is strongly recommended that kyuubi.authentication should be set to KERBEROS too.

We did experiment with deployment models where Kyuubi was not kerberized, although Hive Metastore and HDFS were, but could not get this configuration to work.
In addition to this, our Kerberos implementation is not configured in such a way that it will work with arbitrarily named and ephemeral pods that provide a kerberized service. This is due to the requirement for forward and reverse DNS to match the FQDN part of the Kerberos principal.

Therefore, we have decided to create a single Kyuubi deployment in each of the spark-enabled namespaces, which will have:

  • a static pod name.
  • a service name matching the pod name.
  • a kerberos principal with a hostname component that matches the pod name.

We wish to use the CONNECTION share level: https://kyuubi.readthedocs.io/en/master/deployment/engine_share_level.html#connection
This will create a spark cluster for each incoming thrift server connection and destroy the cluster when the connection is terminated.
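The share level is selected in the Kyuubi server configuration. A sketch of the relevant `kyuubi-defaults.conf` entry (the property name comes from the Kyuubi docs linked above; the surrounding file contents are omitted):

```properties
# kyuubi-defaults.conf (sketch): one Spark engine per incoming JDBC connection,
# created when the client connects and torn down when the connection closes
kyuubi.engine.share.level=CONNECTION
```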

We will likely also want to use the Hadoop impersonation mechanism: https://kyuubi.readthedocs.io/en/master/security/kerberos.html#enable-hadoop-impersonation
...although to begin with the kyuubi server may also be running as the same (Kerberos) user that we want to impersonate.
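For impersonation to work, Hadoop itself must trust the Kyuubi service user as a proxy user. A hedged sketch of the `core-site.xml` side of this, assuming the server runs as the `analytics` user; the group name and host wildcard below are placeholders, not our actual configuration:

```xml
<!-- core-site.xml (sketch): allow the Kyuubi service user ("analytics" assumed here)
     to impersonate other users. Values are illustrative placeholders. -->
<property>
  <name>hadoop.proxyuser.analytics.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.analytics.groups</name>
  <value>analytics-privatedata-users</value>
</property>
```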

There is an upstream chart for Kyuubi here: https://github.com/apache/kyuubi/tree/master/charts/kyuubi
...but we do not expect that this would bring any benefit over adding Kyuubi server support to our existing spark-support chart.
Nevertheless, this may be useful as a reference point.

Event Timeline

Change #1224145 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Add a kyuubi service to the spark-support chart

https://gerrit.wikimedia.org/r/1224145

Change #1224145 merged by jenkins-bot:

[operations/deployment-charts@master] Add a kyuubi service to the spark-support chart

https://gerrit.wikimedia.org/r/1224145

I have started working on the modifications to the spark-support chart to add a Kyuubi service, using the upstream chart for inspiration.

One interesting point is that the upstream chart uses a StatefulSet instead of a Deployment for the main kyuubi service. The benefit of this is that the pod names no longer have a random string as a suffix, but instead use a stable ordinal suffix (kyuubi-0, kyuubi-1, ...). This is helpful because I can create a kerberos principal that uses this fixed name. i.e.

btullis@krb1002:~$ sudo kadmin.local addprinc -randkey analytics/kyuubi-0.kyuubi-headless.analytics-test.svc.cluster.local@WIKIMEDIA
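For reference, a sketch of the StatefulSet fields that produce this stable DNS name (this is illustrative, not our actual chart; the image and labels are placeholders):

```yaml
# Sketch only: the minimal fields that yield the stable pod DNS name
# kyuubi-0.kyuubi-headless.<namespace>.svc.cluster.local
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kyuubi
spec:
  serviceName: kyuubi-headless   # must reference a headless Service (clusterIP: None)
  replicas: 1
  selector:
    matchLabels:
      app: kyuubi
  template:
    metadata:
      labels:
        app: kyuubi
    spec:
      containers:
        - name: kyuubi
          image: kyuubi-placeholder:latest   # placeholder image, not a real reference
```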

I can then create a keytab for this principal.

btullis@krb1002:~$ sudo mkdir -p /srv/kerberos/keytabs/kyuubi-headless.analytics-test.svc.cluster.local/analytics

btullis@krb1002:~$ sudo kadmin.local ktadd -norandkey -k /srv/kerberos/keytabs/kyuubi-headless.analytics-test.svc.cluster.local/analytics/analytics.keytab analytics/analytics-test.discovery.wmnet@WIKIMEDIA
Entry for principal analytics/analytics-test.discovery.wmnet@WIKIMEDIA with kvno 1, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:/srv/kerberos/keytabs/kyuubi-headless.analytics-test.svc.cluster.local/analytics/analytics.keytab.
btullis@krb1002:~$ sudo kadmin.local ktadd -norandkey -k /srv/kerberos/keytabs/kyuubi-headless.analytics-test.svc.cluster.local/analytics/analytics.keytab analytics/kyuubi-0.kyuubi-headless.analytics-test.svc.cluster.local@WIKIMEDIA
Entry for principal analytics/kyuubi-0.kyuubi-headless.analytics-test.svc.cluster.local@WIKIMEDIA with kvno 1, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:/srv/kerberos/keytabs/kyuubi-headless.analytics-test.svc.cluster.local/analytics/analytics.keytab.

I can then check that both of the principals that I wish to use are contained within this keytab.

btullis@krb1002:~$ sudo klist -kt /srv/kerberos/keytabs/kyuubi-headless.analytics-test.svc.cluster.local/analytics/analytics.keytab
Keytab name: FILE:/srv/kerberos/keytabs/kyuubi-headless.analytics-test.svc.cluster.local/analytics/analytics.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 01/08/2026 11:20:07 analytics/analytics-test.discovery.wmnet@WIKIMEDIA
   1 01/08/2026 11:20:20 analytics/kyuubi-0.kyuubi-headless.analytics-test.svc.cluster.local@WIKIMEDIA

If we want to run several kyuubi instances in high-availability mode, then we can create further principals and add them to the same keytab.
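The naming scheme is mechanical, so the full set of per-replica principals can be derived from the replica count. A small hypothetical helper illustrating this (the function and its defaults are mine, not part of any tooling; the defaults match the analytics-test names used above):

```python
def ha_principals(replicas, statefulset="kyuubi", service="kyuubi-headless",
                  namespace="analytics-test", user="analytics", realm="WIKIMEDIA"):
    """Return the host-based Kerberos principals for an N-replica Kyuubi StatefulSet.

    StatefulSet pods get ordinal names (kyuubi-0, kyuubi-1, ...), so each
    replica needs its own principal in the shared keytab.
    """
    return [
        f"{user}/{statefulset}-{i}.{service}.{namespace}.svc.cluster.local@{realm}"
        for i in range(replicas)
    ]

# The single-replica case matches the principal created above:
print(ha_principals(1)[0])
# analytics/kyuubi-0.kyuubi-headless.analytics-test.svc.cluster.local@WIKIMEDIA
```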

However, in order to make use of this feature, we would have to provide a production JDBC database for the Kyuubi metadata, instead of a local sqlite directory.
The value of kyuubi.metadata.store.jdbc.url is currently set to jdbc:sqlite:/tmp/kyuubi_state_store.db, but the docs for that parameter state:

The JDBC url for server JDBC metadata store. By default, it is a SQLite database url, and the state information is not shared across Kyuubi instances. To enable high availability for multiple kyuubi instances, please specify a production JDBC url. Note: this value support the variables substitution: <KYUUBI_HOME>.
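Switching to a shared store would look roughly like the following `kyuubi-defaults.conf` fragment (the property names come from the Kyuubi configuration docs; the hostname, database, and credentials are placeholders, not real endpoints):

```properties
# kyuubi-defaults.conf (sketch): shared metadata store for HA.
# All values below are illustrative placeholders.
kyuubi.metadata.store.jdbc.database.type=MYSQL
kyuubi.metadata.store.jdbc.url=jdbc:mysql://db-placeholder.example.org:3306/kyuubi_metadata
kyuubi.metadata.store.jdbc.user=kyuubi
kyuubi.metadata.store.jdbc.password=********
```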

Change #1224636 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Update the zookeeper address for analytics-test

https://gerrit.wikimedia.org/r/1224636

Change #1224636 merged by jenkins-bot:

[operations/deployment-charts@master] Update the zookeeper address for analytics-test

https://gerrit.wikimedia.org/r/1224636

Change #1224661 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Add a networkpolicy to spark-support permitting access to kyuubi

https://gerrit.wikimedia.org/r/1224661

Change #1224661 merged by jenkins-bot:

[operations/deployment-charts@master] Add a networkpolicy to spark-support permitting access to kyuubi

https://gerrit.wikimedia.org/r/1224661

Change #1224682 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Fix the kyuubi-headless service definition

https://gerrit.wikimedia.org/r/1224682

Change #1224682 merged by jenkins-bot:

[operations/deployment-charts@master] Fix the kyuubi-headless service definition

https://gerrit.wikimedia.org/r/1224682

Change #1224694 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Fix malformed networkpolicy for spark-support and kyuubi

https://gerrit.wikimedia.org/r/1224694

Change #1224984 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Configure the kyuubi-defaults.conf file with kerberos details

https://gerrit.wikimedia.org/r/1224984

Change #1224694 merged by jenkins-bot:

[operations/deployment-charts@master] Fix malformed networkpolicy for spark-support and kyuubi

https://gerrit.wikimedia.org/r/1224694

Change #1224984 merged by jenkins-bot:

[operations/deployment-charts@master] Configure the kyuubi-defaults.conf file with kerberos details

https://gerrit.wikimedia.org/r/1224984

BTullis removed BTullis as the assignee of this task. (Edited: Tue, Feb 24, 2:28 PM)

Moving this back to the Data-Platform-SRE backlog, because although we have put significant time into this, the kyuubi deployment is not complete.
We have a kyuubi deployment that we can continue to test, but it does not yet facilitate the creation of on-demand spark-sql clusters.