
Write a design document relating to superset on dse-k8s
Closed, Resolved, Public

Description

We wish to migrate our Superset instances from bare-metal and VMs to the dse-k8s cluster.

This task is about writing a lightweight design document that describes the deployment and sets out the specific elements that must be in place by the end of the project.

Note that we are not planning to migrate the metadata database into Kubernetes at this time.

Acceptance criteria
  • A Google Doc describing the Superset on k8s deployments exists and has been shared with all stakeholders for review.

Event Timeline

BTullis claimed this task.
BTullis moved this task from Incoming to Quarterly Goals on the Data-Platform-SRE board.
BTullis triaged this task as Medium priority. Oct 25 2023, 3:29 PM
BTullis moved this task from Quarterly Goals to In Progress on the Data-Platform-SRE board.

It's also worth bearing in mind this ticket: T309622: Create Airflow Pipeline for Ingesting/Updating Superset data into DataHub, which covers how we intend to ingest metadata from Superset into DataHub.

Our Superset instances currently use a reverse-proxy configuration with CAS/SSO enabled:
See: https://github.com/wikimedia/operations-puppet/blob/production/modules/profile/manifests/superset/proxy.pp

When attempting to ingest metadata from Superset into DataHub (about charts, dashboards, lineage, etc.) we found that this SSO configuration made that ingestion impossible (or at least, very difficult).
To work around that problem, we launched a temporary instance of Superset in a conda environment on a stat box, gave it the superset_config.py from production, then ingested from that instance.
That worked, but we have not yet automated the setup.
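
For reference, the ingestion against such a temporary instance could be described with a recipe along these lines. This is only a minimal sketch: the helper name build_superset_recipe, the URLs, and the credentials are hypothetical placeholders, and it assumes the temporary instance allows a plain database login (no SSO in front of it), which is what made the workaround possible.

```python
# Minimal sketch of a DataHub ingestion recipe for a Superset source.
# All URLs and credentials below are placeholders, not real endpoints.


def build_superset_recipe(superset_url, username, password, datahub_gms_url):
    """Return a DataHub ingestion recipe (as a plain dict) for Superset."""
    return {
        "source": {
            "type": "superset",
            "config": {
                # Base URL of the Superset instance to read from.
                "connect_uri": superset_url,
                "username": username,
                "password": password,
                # Assumption: the temporary instance uses database auth.
                "provider": "db",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": datahub_gms_url},
        },
    }


if __name__ == "__main__":
    recipe = build_superset_recipe(
        "http://localhost:8088", "ingest-user", "change-me", "http://localhost:8080"
    )
    print(recipe["source"]["type"])
    # With the acryl-datahub package installed, the recipe could be run as:
    #   from datahub.ingestion.run.pipeline import Pipeline
    #   pipeline = Pipeline.create(recipe)
    #   pipeline.run()
    #   pipeline.raise_from_status()
```

Keeping the recipe as a plain dict means the same structure could later be produced by an Airflow task, as T309622 proposes, without changing its shape.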

We should bear in mind:

  1. How will the CAS/SSO system work under dse-k8s?
  2. How should we best configure this automated Superset -> DataHub ingestion pipeline?

One option for getting the CAS/SSO integration working would be to use Superset's OAuth2 integration with CAS/SSO, which would remove the need for a reverse proxy.
https://superset.apache.org/docs/installation/configuring-superset/#custom-oauth2-configuration
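
As a rough illustration of what that option might look like in superset_config.py, here is a minimal sketch following the shape of the OAUTH_PROVIDERS entry in the Superset documentation linked above. The helper name build_cas_oauth_provider, the provider name, the scope, the example host, and the /oauth2.0/ endpoint paths are all assumptions to be confirmed with Infrastructure-Foundations, not the actual IDP configuration.

```python
# Minimal sketch of an OAUTH_PROVIDERS entry for Superset (Flask-AppBuilder).
# Endpoint paths, scope, and host names are assumptions, not verified values.


def build_cas_oauth_provider(idp_base_url, client_id, client_secret):
    """Return one OAUTH_PROVIDERS entry pointing at a CAS-style OAuth2 server."""
    base = idp_base_url.rstrip("/")
    return {
        "name": "cas",  # hypothetical provider name shown on the login page
        "icon": "fa-address-card",
        "token_key": "access_token",
        "remote_app": {
            "client_id": client_id,
            "client_secret": client_secret,
            "client_kwargs": {"scope": "profile"},  # placeholder scope
            "access_token_method": "POST",
            # Assumed endpoint layout; check against the real IDP.
            "api_base_url": f"{base}/oauth2.0/",
            "access_token_url": f"{base}/oauth2.0/accessToken",
            "authorize_url": f"{base}/oauth2.0/authorize",
        },
    }


if __name__ == "__main__":
    provider = build_cas_oauth_provider(
        "https://idp.example.org/", "superset", "change-me"
    )
    print(provider["remote_app"]["authorize_url"])
    # In superset_config.py this would be wired up roughly as:
    #   from flask_appbuilder.security.manager import AUTH_OAUTH
    #   AUTH_TYPE = AUTH_OAUTH
    #   OAUTH_PROVIDERS = [provider]
    #   AUTH_USER_REGISTRATION = True
```

A custom security manager is typically also needed to map the IDP's user profile onto Superset users, so this entry alone would probably not be sufficient.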

I'm checking with Infrastructure-Foundations whether this is likely to work.
Another possible option is to use OIDC, which has been shown to work but might require a little more effort.

If those aren't feasible, we can still use a reverse proxy configuration in k8s, but it would be nice to be able to do without it.

I've made a start on this:

https://docs.google.com/document/d/1PT9cRVFtN23GlWfYo-_bTUzVcK12-dSSJcX-SV4rtqs/edit

It's taking longer than I expected to write down all the details of our current superset deployments, but I think it will be useful.

I feel that I have now finished this Superset on Kubernetes design document, so it is ready for sharing.

BTullis moved this task from Needs Review to Done on the Data-Platform-SRE board.
BTullis updated the task description.