Support CSV uploads in Superset
Closed, Resolved · Public

Description

Superset has a CSV upload capability, which could be very useful for building dashboards from static external datasets. We have an analytics MySQL 'staging' db that is used for custom user tables. We should allow Superset users to upload smallish CSVs to the MySQL staging db, which creates new tables there.
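
A minimal sketch of what such an upload does conceptually: read a CSV whose first row is the header and create a new table from it. To stay self-contained it uses Python's built-in sqlite3 as a stand-in for the MySQL staging db, stores every column as TEXT, and uses a hypothetical helper name, `csv_to_table`, with made-up sample data. It does not reproduce Superset's own upload code (type inference, upload form options, MySQL dialect).

```python
import csv
import io
import re
import sqlite3


def _safe_identifier(name):
    """Turn a header or table name into a plain identifier (letters, digits, underscores)."""
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_")
    if not cleaned:
        raise ValueError(f"cannot derive an identifier from {name!r}")
    return cleaned


def csv_to_table(conn, table_name, csv_text):
    """Hypothetical helper: create a new table from CSV text (first row = header).

    Returns the number of data rows loaded.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = [_safe_identifier(h) for h in header]
    table = _safe_identifier(table_name)

    # Every column is stored as TEXT to keep the sketch simple (no type inference).
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')

    # Skip blank lines, which csv.reader returns as empty lists.
    rows = [row for row in reader if row]
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Made-up sample data; sqlite3 in memory stands in for the MySQL staging db.
    sample = "wiki,pageviews\nenwiki,100\ndewiki,42\n"
    conn = sqlite3.connect(":memory:")
    print(csv_to_table(conn, "my_upload", sample))  # 2
    print(conn.execute('SELECT wiki, pageviews FROM "my_upload" ORDER BY rowid').fetchall())
    # [('enwiki', '100'), ('dewiki', '42')]
```

Each upload creates a brand-new table, so uploading the same name twice fails at `CREATE TABLE` rather than silently overwriting data.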

Event Timeline

Restricted Application added a subscriber: Aklapper. Feb 19 2020, 10:03 PM

Change 573393 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Set Superset UPLOAD_FOLDER to /tmp/superset_uploads/

https://gerrit.wikimedia.org/r/573393

We just need https://gerrit.wikimedia.org/r/c/operations/puppet/+/573393. With that change, plus enabling CSV uploads for the mysql_staging database in Superset (which I have already done), uploads work!
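
For reference, a sketch of the setting the patch changes, written as it might look in a `superset_config.py`. `UPLOAD_FOLDER` and the path come from the patch title; the `os.makedirs` call is an assumption added so the snippet works on its own and is not necessarily what the puppet change does. Enabling uploads for mysql_staging is a per-database setting made in the Superset UI and is not shown here.

```python
# superset_config.py (sketch)
import os

# Directory where Superset stages uploaded CSV files before loading them into a database.
UPLOAD_FOLDER = "/tmp/superset_uploads/"

# Assumption: create the directory at import time if it does not exist yet.
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
```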

Is there a reason not to do this?

Nuria added a comment. Feb 20 2020, 2:02 AM

Sounds good, +1

I filed a similar code change a while ago: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/479408/

Moritz suggested modifying Superset's systemd unit to add PrivateTmp=true. I am +1 as long as the staging db is the one on dbstore (not db1108/eventlogging, etc.).

Change 573393 merged by Ottomata:
[operations/puppet@production] Set Superset UPLOAD_FOLDER to /tmp/superset_uploads/

https://gerrit.wikimedia.org/r/573393

Ottomata moved this task from Next Up to Done on the Analytics-Kanban board. Feb 20 2020, 2:37 PM

Tested that this works well, and it is pretty easy to create tables this way. cc @cchen so she knows this is a possible option; tables have to be created in the mysql_staging database. cc @kzimmerman because uploading CSVs can aid with use cases like the ones we have seen for the dashboards that Dana's group was making from multiple data sources.

Nuria closed this task as Resolved. Feb 28 2020, 12:40 AM
Nuria set Final Story Points to 2.
cchen added a comment. Feb 28 2020, 6:12 PM

Thank you @Ottomata and @Nuria! I will try it out and share it with other teams.

Nuria added a subscriber: EYener. Feb 28 2020, 6:39 PM

Also pinging @EYener in FR so she knows this is an easy way to prototype dashboards from ad hoc data sources; just a CSV file is needed.
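
To try this with a mock data set, the CSV itself can be produced with Python's standard `csv` module. This is a hedged sketch: the helper name `rows_to_csv` and the column names and values (`campaign`, `day`, `donations`) are hypothetical placeholders, not real data.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Hypothetical helper: serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Fields containing commas or quotes are quoted automatically (QUOTE_MINIMAL).
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical mock rows; replace with your own ad hoc data.
    mock_rows = [
        {"campaign": "test_a", "day": "2020-02-01", "donations": 10},
        {"campaign": "test_b", "day": "2020-02-01", "donations": 7},
    ]
    print(rows_to_csv(mock_rows, ["campaign", "day", "donations"]), end="")
```

Saving the printed text to a `.csv` file gives something that can be uploaded through Superset's CSV upload form.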

Hi @Nuria and all, we're ready to try a 'mock' data set as well. Can someone point me toward instructions on accessing and utilizing the staging environment so that I can get started with the upload? Thank you!

@EYener CSV uploads are enabled on http://superset.wikimedia.org, so no special access is needed.

Thank you, @Nuria ! It works seamlessly.