It would be very useful to have SQL Templating enabled in Superset, as it would allow for faster chart generation in Presto-based dashboards and allow us to share personalized links to dashboards set to specific values of filters.
Description
Details
Subject | Repo | Branch | Lines +/- | |
---|---|---|---|---|
superset: Turn template processing back on | operations/puppet | production | +4 -0 |
Related Objects
Event Timeline
Suppose we have a dashboard with a chart powered by Presto. For simplicity, imagine transformations have been taken care of. The underlying query is then:
SELECT * FROM tbl
Because Presto needs to query all of the data and then any subsetting (filtering, specific date range, etc.) has to happen after the fact on the retrieved results. This can lead to timeouts and slow performance in general.
From https://preset.io/blog/intro-jinja-templating-apache-superset/:
For example, to add a time range to a virtual dataset, you can write the following:
SELECT * FROM tbl WHERE dttm_col > '{{ from_dttm }}' AND dttm_col < '{{ to_dttm }}'
If I understand correctly, the subsetting happens at query time rather than post hoc on the entirety of the result set. That should speed things up!
Change 811766 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):
[operations/puppet@production] superset: Turn template processing back on
I would also love to see this come back, I have old dashboards that no longer work because templating was turned off. Looking things over, that was turned off for the 1.0 release here (without justification): https://github.com/apache/superset/pull/11172
Patch turns it back on. I would like to have seen what supersets justification was for turning it off, maybe there is something dangerous here? I suppose jinja is allowing users to achieve a minimal form of remote code execution on the superset instance, but in our systems that is plausibly acceptable. Jinja appears to make it difficult to run arbitrary python code, as far as i can tell any functions to be called have to be specially registered.
Thank you @EBernhardson!! And good find!
Also, https://apache.github.io/superset/sqllab.html has this as an example:
SELECT * FROM some_table WHERE partition_key = '{{ presto.latest_partition('some_table') }}'
and I don't know if presto.latest_partition() is something that would be available to us or if we'd have to implement it ourselves but either way enabling templating would allow us to have a really useful function (especially when working with mediawiki history!)
I suspect the presto.latest_partition function will come for free, here is a query that i'm hoping will start working again with templating enabled. Here the table name is parameterized (probably not necessary, but DRY's things up a little) and it sources filter variables from another chart in the dashboard. This used to work pre-superset 1.0.
SELECT norm_query, num_sessions, queries FROM {{ queries_table }} WHERE date='{{ presto.first_latest_partition(queries_table) }}' AND wiki IN ('{{ "','".join(filter_values('wiki', 'mediawikiwiki')) }}') AND cardinality(filter(queries, q -> q.num_sessions > 2)) > 1 AND norm_query <> '' ORDER BY num_sessions DESC LIMIT {{ filter_values('limit', 500)[0] }}
I'm happy with this and since you've already done the work in writing the patch, I'll merge and test it today.
Change 811766 merged by Btullis:
[operations/puppet@production] superset: Turn template processing back on
OK, @EBernhardson, @mpopov - The patch has been applied and superset has been restarted with the feature flag enabled. It looks OK to me, but could you test the functionality please when it's convenient? Thanks.
@BTullis thanks! Loading my old dashboard it looks to be working same as it did before. Looks complete to me.