
Remove support for the (deprecated) Druid datasources (in favor of Druid Tables) on Superset
Open, Medium, Public


In Superset there are two ways of using Druid datasources in charts:

  1. Via Druid datasource: Menu -> Sources -> Druid datasources, etc.
  2. Via SQLAlchemy table definition

The former is no longer supported upstream, and bugs in the interface between Superset and Druid are unlikely to get fixed. This task should:

  1. Review the last charts using Druid datasources, and move them to Druid tables (or simply follow up with their owners). This is something that the Product Analytics team has done in the past: T251857
  2. Remove DRUID_IS_ACTIVE = True in (puppet) to remove support for creating/using Druid datasources (will only leave the possibility to use tables).
  3. Refresh the documentation if needed.
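
For step 1, one way to find the charts that still point at a Druid datasource is to query Superset's metadata database. The sketch below runs against an in-memory SQLite stand-in so that it is self-contained. The table and column names (slices, slice_name, datasource_type) and the 'druid' type value are assumptions about the metadata schema and should be checked against the deployed Superset version. The chart names are made up.

```python
import sqlite3


def charts_on_druid_datasources(conn):
    """Return (id, slice_name) for charts whose datasource type is 'druid'.

    Assumption: charts live in a `slices` table with a `datasource_type`
    column set to 'druid' for legacy datasources and 'table' for
    SQLAlchemy-backed tables.
    """
    rows = conn.execute(
        "SELECT id, slice_name FROM slices "
        "WHERE datasource_type = ? ORDER BY id",
        ("druid",),
    )
    return rows.fetchall()


def build_demo_db():
    """Build a tiny in-memory stand-in for the metadata database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slices ("
        "id INTEGER PRIMARY KEY, slice_name TEXT, datasource_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO slices VALUES (?, ?, ?)",
        [
            (1, "Edits per day", "druid"),
            (2, "Edits per day (table)", "table"),
            (3, "Pageviews by country", "druid"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    for chart_id, name in charts_on_druid_datasources(build_demo_db()):
        print(chart_id, name)
```

Against the real metadata database the same SELECT would be issued through whatever client that database uses. The resulting list is what would be handed to chart owners for migration.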

Event Timeline

Today I ran a small test in our staging instance, namely commenting out DRUID_IS_ACTIVE = True in the file. This caused the following:

  1. Druid datasources disappeared from the main dropdown menu, leaving only the "Table" ones (that is, the ones defined via SQLAlchemy).
  2. The charts using the Druid datasources are still visible, but clicking on the datasource itself leads to a 404 (since a migration to the Druid tables is needed).

So in theory we could start this migration by simply turning off the Druid datasource support, to prevent new charts from being created with the old settings, and then ask people to migrate their charts gradually. I'll follow up with Product Analytics for some tests, but it looks promising!
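
As a rough illustration of the change tested above, the relevant fragment of a Superset Python config file might look like the sketch below. Only the DRUID_IS_ACTIVE flag comes from this task. The actual file is rendered by puppet and its layout is not shown here. Setting the flag explicitly to False, rather than commenting it out as in the staging test, is an assumption about style.

```python
# Sketch of a Superset config fragment (the real file is managed by puppet).
#
# Previously:
#   DRUID_IS_ACTIVE = True
#
# Disabling the flag hides the legacy "Druid datasources" menu entries.
# Druid remains reachable through tables defined via SQLAlchemy.
DRUID_IS_ACTIVE = False
```

Since commenting the line out was enough in the staging test, the explicit False mainly serves as documentation for whoever reads the config later.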

Change 658228 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] superset: disable old druid datasouce panels

Change 658228 merged by Elukey:
[operations/puppet@production] superset: disable old druid datasouce panels

Mentioned in SAL (#wikimedia-analytics) [2021-01-25T10:18:48Z] <elukey> restart superset to remove druid datasources support - T263972

@odimitrijevic it is not; last time I checked, some Druid datasources were still in use. We should do the following:

  • review what dashboards are still using Druid datasources
  • move to Druid tables

The main motivation is that upstream no longer supports the old Druid datasource approach, which makes upgrading painful: if we hit a bug, a dashboard might break and upstream will not fix it.