
Migrate the Analytics Superset instances to our DSE Kubernetes cluster
Closed, Resolved, Public

Description

This is a proposal to migrate our Superset services to the DSE kubernetes cluster.

Two instances of Superset are within scope. Currently:

  • superset is hosted on bare-metal on a single host: an-tool1010.eqiad.wmnet
  • superset-next is hosted on a VM on a single host: an-tool1005.eqiad.wmnet

They each use a discrete database on the analytics_meta MariaDB instance on an-coord1001 for storing state.
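For illustration, the metadata database connection for one instance could be expressed in superset_config.py roughly as follows. This is only a sketch: SQLALCHEMY_DATABASE_URI is the standard Superset setting for the metadata database, but the environment variable names, the user name, the database name and the fallback password below are placeholders, not our real settings.

```python
import os
from urllib.parse import quote_plus


def build_metadata_uri(user, password, host, database, port=3306):
    """Build a SQLAlchemy URI for a MariaDB/MySQL metadata database."""
    # Percent-encode the credentials so that characters such as '@' or '/'
    # in a password cannot break the URI.
    return f"mysql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# Hypothetical environment variable names and defaults, for illustration only.
SQLALCHEMY_DATABASE_URI = build_metadata_uri(
    user=os.environ.get("SUPERSET_DB_USER", "superset"),
    password=os.environ.get("SUPERSET_DB_PASSWORD", "changeme"),
    host="an-coord1001.eqiad.wmnet",
    database=os.environ.get("SUPERSET_DB_NAME", "superset_example"),
)
```

Reading the password from the environment means it can be supplied as a secret at deployment time rather than being committed alongside the configuration.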

They each have an instance of memcached local to the host, which is used for various metadata caching, but not for query results caching.
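As a rough sketch of what this split looks like in superset_config.py: Superset reads CACHE_CONFIG for its metadata cache and DATA_CACHE_CONFIG for chart query results, and both are passed to Flask-Caching. The server address, timeout and key prefix below are assumptions for illustration, not values copied from our hosts, and older Flask-Caching releases spell the backend names differently.

```python
# Metadata cache backed by the memcached instance local to the host.
CACHE_CONFIG = {
    "CACHE_TYPE": "MemcachedCache",
    # Assumed address: memcached's default port on localhost.
    "CACHE_MEMCACHED_SERVERS": ["localhost:11211"],
    # Illustrative values, not our production settings.
    "CACHE_DEFAULT_TIMEOUT": 300,
    "CACHE_KEY_PREFIX": "superset_",
}

# Query results are not cached: NullCache is a no-op backend.
DATA_CACHE_CONFIG = {"CACHE_TYPE": "NullCache"}
```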

The purpose of this ticket is to try to achieve consensus on the benefits, costs, and potential risks of moving Superset to the DSE Kubernetes cluster.

At this stage, we believe that the following steps will be required:

  • Write a lightweight design document describing how the Superset services are intended to work on Kubernetes T349396
  • Create a Superset container image using GitLab-CI and the Blubber/Kokkuri framework. T352165
  • Apply our patches not yet merged upstream to the superset codebase in our Docker image - T356477
  • Create a helm chart for Superset T352166
  • Create two namespaces for superset and superset-next: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/983718
  • Add kubeadm config files for the two new namespaces: https://gerrit.wikimedia.org/r/c/operations/puppet/+/983720
  • Create two helmfile deployments for superset and superset-next T353790
  • Ensure that we are running an up-to-date version of Superset, facilitating the migration T335356
  • Ensure necessary firewall rules are open between the DSE worker nodes and external services - T356623
  • Create a keytab for each Superset deployment and make this available to the pods
  • Make configuration secrets available to helmfile - T356480
  • Configure ingress internal DNS records - T356481
  • Add entries to the puppet service catalog - T356483
  • Update public domain DNS records to make them point to the DSE Kubernetes ingress - T356482
  • Configure OIDC authentication for superset on dse-k8s - T353794
  • Write a migration plan for Superset to K8S - including what to do about the legacy instances.
  • Monitor the availability of the superset deployments - T356484
  • Create saved views for the superset deployment logs - T356485
  • Update the wikitech page with our production readiness checklist - T356486
  • Find a solution for the requestctl-generator html page - T356490
  • Serve Superset static assets from an optimised container - T357890
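For the availability monitoring step, a minimal probe could look something like the sketch below. It relies on Superset's /health endpoint, which answers with the plain-text body OK. The function names are hypothetical, and this is not how our production monitoring is implemented; it only illustrates the kind of check involved.

```python
from urllib.request import urlopen


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code and decoded body for a GET request."""
    with urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


def is_healthy(base_url, fetch=fetch_status):
    """Return True if the /health endpoint answers 200 with the body 'OK'."""
    try:
        status, body = fetch(base_url.rstrip("/") + "/health")
    except OSError:
        # URLError, HTTPError and socket timeouts are all subclasses of OSError.
        return False
    return status == 200 and body.strip() == "OK"
```

The fetch parameter exists only so that the check can be exercised without network access, for example is_healthy("https://superset.wikimedia.org", fetch=lambda url: (200, "OK")).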

n.b. At present, we are not planning to move the metadata database (which is MariaDB in our case) to Kubernetes.
The upstream helm charts declare a dependency on PostgreSQL, which they tend to run with persistent volume claims, but for now we are not planning to use this.

We do have an option to migrate to PostgreSQL running on an-db100[1-2] (which is what Airflow uses) but we are not necessarily planning to take this option either. We have decided to stick with MariaDB for now.


Event Timeline


Change 983718 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Add superset namespaces to the dse-k8s cluster

https://gerrit.wikimedia.org/r/983718

Evaluate the upstream chart and if appropriate use our policy review to decide whether or not we should use it.

I had a good look at this chart, but I don't believe that there is enough of value in it to make it worth our while to use it.

There are some interesting points if we wished to use celery for asynchronous queries and websockets for background chart updates, but these features are not part of our current roadmap. Therefore, I think that we will be better served by using our own in-house deployment chart templates that are managed with sextant.

Change 983720 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add kubeadm files for superset namespaces

https://gerrit.wikimedia.org/r/983720

Change 983718 merged by jenkins-bot:

[operations/deployment-charts@master] Add superset namespaces to the dse-k8s cluster

https://gerrit.wikimedia.org/r/983718

Mentioned in SAL (#wikimedia-analytics) [2023-12-21T15:36:11Z] <btullis> creating superset and superset-next namespace on dse-k8s for T347710

Change 983720 merged by Btullis:

[operations/puppet@production] Add kubeadm files for superset namespaces

https://gerrit.wikimedia.org/r/983720

brouberol closed subtask Restricted Task as Resolved. (Feb 1 2024, 10:57 AM)

Change 995187 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Upgrade the platform_eng instance of airflow to puppet 7

https://gerrit.wikimedia.org/r/995187

Change 995187 merged by Muehlenhoff:

[operations/puppet@production] Upgrade the platform_eng instance of airflow to puppet 7

https://gerrit.wikimedia.org/r/995187

Mentioned in SAL (#wikimedia-analytics) [2024-02-09T13:47:03Z] <brouberol> deploying superset/superset-next services in dse-k8s-eqiad - T347710

Mentioned in SAL (#wikimedia-analytics) [2024-02-09T14:01:12Z] <brouberol> superset was successfully deployed once the MySQL password was updated - T347710

brouberol updated the task description.

Added the new kerberos principals.

root@krb1001:~# kadmin.local addprinc -randkey superset/superset-next.svc.eqiad.wmnet@WIKIMEDIA
root@krb1001:~# kadmin.local addprinc -randkey superset/superset.svc.eqiad.wmnet@WIKIMEDIA

Created keytab files:

root@krb1001:/srv/kerberos/keytabs# mkdir -p superset-next.svc.eqiad.wmnet/superset
root@krb1001:/srv/kerberos/keytabs# mkdir -p superset.svc.eqiad.wmnet/superset

root@krb1001:/srv/kerberos/keytabs# kadmin.local ktadd -norandkey -k /srv/kerberos/keytabs/superset-next.svc.eqiad.wmnet/superset/superset.keytab superset/superset-next.svc.eqiad.wmnet@WIKIMEDIA
Entry for principal superset/superset-next.svc.eqiad.wmnet@WIKIMEDIA with kvno 1, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:/srv/kerberos/keytabs/superset-next.svc.eqiad.wmnet/superset/superset.keytab.

root@krb1001:/srv/kerberos/keytabs# kadmin.local ktadd -norandkey -k /srv/kerberos/keytabs/superset.svc.eqiad.wmnet/superset/superset.keytab superset/superset.svc.eqiad.wmnet@WIKIMEDIA
Entry for principal superset/superset.svc.eqiad.wmnet@WIKIMEDIA with kvno 1, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:/srv/kerberos/keytabs/superset.svc.eqiad.wmnet/superset/superset.keytab.
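The principal names and keytab locations above follow a regular pattern, which can be summarised with a small helper. This is only a sketch derived from the commands shown; the function names are made up for illustration and nothing like this exists in our tooling as far as this task is concerned.

```python
from pathlib import PurePosixPath

# Values taken from the kadmin.local commands above.
KEYTAB_ROOT = PurePosixPath("/srv/kerberos/keytabs")
REALM = "WIKIMEDIA"


def service_fqdn(instance):
    """Service hostname for a deployment, e.g. superset-next.svc.eqiad.wmnet."""
    return f"{instance}.svc.eqiad.wmnet"


def principal(instance, service="superset"):
    """Kerberos principal in the form service/host@REALM."""
    return f"{service}/{service_fqdn(instance)}@{REALM}"


def keytab_path(instance, service="superset"):
    """Location of the keytab file on the KDC host for a deployment."""
    return KEYTAB_ROOT / service_fqdn(instance) / service / f"{service}.keytab"
```

For example, principal("superset-next") gives superset/superset-next.svc.eqiad.wmnet@WIKIMEDIA, matching the first addprinc command.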

We rolled this out to superset-next and then we updated the database settings in the UI as shown.

image.png (859×546 px, 68 KB)

https://superset.wikimedia.org is now served by a service running in dse-k8s-eqiad. We only have a couple of cleanup tasks to perform, to remove resources from Puppet. As far as users are concerned, this is done!


I agree. This epic is almost ready to be closed. Great work!