
DSE Experiment - User Story 1 (Address Kerberos)
Closed, Resolved (Public)

Description

User Story

As a data scientist, data engineer, or machine learning analyst, I want to be able to access an HDFS cluster that uses Kerberos from a Kubernetes cluster so that I can easily access and analyse large datasets stored in HDFS without having to manually configure authentication and authorisation.

Tasks

Acceptance Criteria

  • The user should be able to access and load data stored in HDFS from a Kubernetes Pod.
  • The user should be able to do this securely and with minimal configuration.

Outstanding Questions:

  • Can a user use their existing credentials or will it be a shared key?
  • Is there a sufficiently robust audit trail?

Event Timeline

Side note for my comprehension of our current setup:

BTullis claimed this task.
BTullis subscribed.

I'm closing this ticket, since the work has now been done.

We have proven that we can access Kerberised services such as HDFS and Presto from Kubernetes.
We have the Spark History Server deployed to production, which reads from HDFS. In addition, we have Superset, which reads from Presto.

Both of these make use of kerberos-kinit as a sidecar container, which uses a keytab to keep the Kerberos ticket renewed.
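
For illustration only, the sidecar approach can be sketched as a small loop that periodically obtains a fresh ticket from a keytab and writes it to a credential cache shared with the main container. This is not the actual kerberos-kinit implementation: the function names, principal, paths, and interval are all hypothetical, and the sketch assumes the MIT Kerberos `kinit` binary (with its `-k`, `-t`, and `-c` options) is available in the image.

```python
import subprocess
import time
from typing import Callable, List, Optional


def build_kinit_command(
    principal: str, keytab_path: str, cache_path: Optional[str] = None
) -> List[str]:
    """Build a kinit invocation that obtains a ticket from a keytab.

    -k / -t select keytab-based authentication; -c (optional) names the
    credential cache file that the main container would read.
    """
    cmd = ["kinit", "-k", "-t", keytab_path]
    if cache_path is not None:
        cmd += ["-c", cache_path]
    cmd.append(principal)
    return cmd


def renew_tickets(
    principal: str,
    keytab_path: str,
    cache_path: Optional[str] = None,
    interval_seconds: float = 3600,
    run: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
    max_iterations: Optional[int] = None,
) -> int:
    """Re-run kinit every interval_seconds; return the number of runs.

    run and sleep are injectable so the loop can be exercised without a
    real KDC. max_iterations=None loops forever, as a sidecar would.
    """
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        # check=True raises if kinit fails, so the container exits and
        # the orchestrator can restart it.
        run(build_kinit_command(principal, keytab_path, cache_path), check=True)
        iterations += 1
        sleep(interval_seconds)
    return iterations


if __name__ == "__main__":
    # Hypothetical principal and mount paths, for illustration only.
    renew_tickets(
        "example-service/host.example.org@EXAMPLE.ORG",
        "/etc/kerberos/keytabs/example.keytab",
        "/tmp/krb5cc_shared",
    )
```

In this arrangement the keytab and the credential cache would typically be mounted as volumes shared between the sidecar and the application container, so the application only ever reads the cache.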