
Productionize Superset
Closed, Resolved · Public · 13 Estimated Story Points

Description

Provide top domain and data to truly test superset

It should be accessible without an SSH tunnel. We will keep pivot and superset running side by side.


Event Timeline

Nuria created this task. May 31 2017, 3:18 PM
Restricted Application added a subscriber: Aklapper. May 31 2017, 3:18 PM
Nuria moved this task from Incoming to Operational Excellence Future on the Analytics board.
mforns added a subscriber: mforns.

Discovered that the legend bug was actually not fixed (see T166320), so...
Do we still want to do this?
The only advantage is the autosource schema feature that displays the zero field in pageview_hourly.

Nuria added a comment. May 31 2017, 6:26 PM

I would say no then. We can scrap this work and focus on setting up superset for PMs to take a look.

Nuria renamed this task from Update pivot with swiv clone to Provide top domain and data to truly test superset. Jun 1 2017, 3:41 PM
Nuria edited projects, added Analytics-Kanban; removed Analytics.
Nuria updated the task description.
Nuria set the point value for this task to 8.
Nuria edited projects, added Analytics; removed Analytics-Kanban. Jun 5 2017, 4:14 PM
Nuria added a comment. Jun 8 2017, 3:55 PM

Superset is written in Python and is thus harder to deploy than a Node counterpart.

Can we deploy this virtual env? (The whole environment is uploaded to Gerrit.)

Superset can consume from druid as it is.

Nuria triaged this task as Medium priority. Jun 8 2017, 3:58 PM
Nuria moved this task from Operational Excellence Future to Dashiki on the Analytics board.

See https://phabricator.wikimedia.org/T166414. Maybe we can also use navigationTiming data to test the UI?

Ping @Ottomata and @JAllemandou: please update the ticket if superset is running in a screen session accessible by tunnel.

Now running in a screen on stat1004 under my user:

ssh -N stat1004.eqiad.wmnet -L 8088:stat1004.eqiad.wmnet:8088

http://localhost:8088

Login is admin / admin

I updated the datasources to contain interesting metrics and created a new user who can use and create visualisations, but not mess with the config:

tester / tester

Change 392933 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/superset/deploy@master] Deploy repo for superset (python)

https://gerrit.wikimedia.org/r/392933

Change 392978 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] [WIP] Puppetization for superset

https://gerrit.wikimedia.org/r/392978

elukey added a subscriber: elukey. Nov 30 2017, 5:15 PM
Ottomata renamed this task from Provide top domain and data to truly test superset to Productionize Superset. Nov 30 2017, 5:15 PM
Ottomata edited projects, added Analytics-Kanban; removed Analytics.

Change 395804 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/superset/deploy@master] Adding artifacts for jessie superset==0.20.6

https://gerrit.wikimedia.org/r/395804

Change 392933 merged by Ottomata:
[analytics/superset/deploy@master] Deploy repo for superset (python)

https://gerrit.wikimedia.org/r/392933

Change 395804 merged by Ottomata:
[analytics/superset/deploy@master] Adding artifacts for jessie superset==0.20.6

https://gerrit.wikimedia.org/r/395804

Change 395887 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/superset/deploy@master] Improvements to deployment scripts

https://gerrit.wikimedia.org/r/395887

Change 392978 merged by Ottomata:
[operations/puppet@production] Puppetization for superset

https://gerrit.wikimedia.org/r/392978

Change 395887 merged by Ottomata:
[analytics/superset/deploy@master] Improvements to deployment scripts

https://gerrit.wikimedia.org/r/395887

Change 396103 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Configure LDAP proxy and authentication for Superset

https://gerrit.wikimedia.org/r/396103

Change 396103 merged by Ottomata:
[operations/puppet@production] Configure LDAP proxy and authentication for Superset

https://gerrit.wikimedia.org/r/396103

Change 396124 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/dns@master] Add superset.wikimedia.org DYNA/CNAME

https://gerrit.wikimedia.org/r/396124

Change 396124 merged by Ottomata:
[operations/dns@master] Add superset.wikimedia.org DYNA/CNAME

https://gerrit.wikimedia.org/r/396124

Change 396127 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add misc cache route for superset.wikimedia.org -> thorium

https://gerrit.wikimedia.org/r/396127

Change 396127 merged by Ottomata:
[operations/puppet@production] Add misc cache route for superset.wikimedia.org -> thorium

https://gerrit.wikimedia.org/r/396127

Change 396141 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Allow authenticated (in certain ldap groups) to auto-register for superset account

https://gerrit.wikimedia.org/r/396141

Change 396141 merged by Ottomata:
[operations/puppet@production] Allow authenticated (in certain ldap groups) to auto-register for superset account

https://gerrit.wikimedia.org/r/396141

Change 396143 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Set superset auth_settings => undef if not using ldap_proxy

https://gerrit.wikimedia.org/r/396143

Ottomata moved this task from Next Up to In Progress on the Analytics-Kanban board. Dec 8 2017, 3:19 PM

Alright! https://superset.wikimedia.org is up and running.

I had a bit of trouble with authentication. Here's the skinny:

superset uses Flask-AppBuilder as its web framework. Flask-AppBuilder supports several authentication methods, including AUTH_LDAP and AUTH_REMOTE_USER.

I first tried AUTH_LDAP. However, it does not support restricting the authentication to users in certain LDAP groups. I think fixing Flask-AppBuilder to support this would be pretty easy. We'd just need to alter this code to support more complex filters than just uid=value.
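For reference, here is a minimal sketch of what a filter richer than uid=value could look like: it ANDs the uid match with an OR over group memberships. This is not Flask-AppBuilder code. The helper names, the use of a memberOf attribute, and the group DNs are all assumptions made for illustration.

```python
def escape_ldap_value(value):
    """Escape the characters RFC 4515 treats specially in filter values."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def build_group_filter(uid, group_dns, uid_field="uid"):
    """Build a filter matching uid AND membership in any of group_dns.

    Hypothetical helper: assumes the directory exposes group membership
    through a memberOf attribute on the user entry.
    """
    user_part = "({}={})".format(uid_field, escape_ldap_value(uid))
    if not group_dns:
        # No group restriction requested: fall back to the plain uid filter.
        return user_part
    group_part = "".join(
        "(memberOf={})".format(escape_ldap_value(dn)) for dn in group_dns
    )
    return "(&{}(|{}))".format(user_part, group_part)


# The group DNs below are made up for illustration.
example_filter = build_group_filter(
    "elukey",
    ["cn=wmf,ou=groups,dc=example,dc=org", "cn=nda,ou=groups,dc=example,dc=org"],
)
print(example_filter)
```

Whether memberOf is available depends on the LDAP server; a directory without it would need a separate group search instead.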

But, since we already use an HTTP LDAP auth proxy for other services (pivot, yarn, etc.), I next tried AUTH_REMOTE_USER. This works! But has the downside of not supporting the AUTH_USER_REGISTRATION feature. That feature should allow superset to auto-create accounts for users that are authenticated, but don't yet have superset accounts in the superset database. AUTH_USER_REGISTRATION works for AUTH_LDAP, but not AUTH_REMOTE_USER.

I tested in labs, and was able to modify Flask-AppBuilder to support AUTH_USER_REGISTRATION with AUTH_REMOTE_USER. I submitted a pull request upstream: https://github.com/dpgaspar/Flask-AppBuilder/pull/663. We'll see how that goes.

AUTH_REMOTE_USER + AUTH_USER_REGISTRATION has the downside that if someone can reach the superset app HTTP port directly, they can just set the X-Remote-User header and superset will consider them authenticated. For us, this should practically be ok, as the superset app HTTP port 9080 is only reachable by localhost (on thorium in production). The only other way to get to the app is via the HTTP proxy on port 80, which will force you to authenticate with LDAP, and only allow you to do so if you are in the wmf or nda LDAP groups.
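To make the header-trust concern concrete, here is a rough sketch of a WSGI middleware that only honours X-Remote-User when the request comes from a loopback peer, i.e. from the local HTTP proxy. It is a hypothetical defence-in-depth illustration, not something superset or Flask-AppBuilder ships; the class name and demo app are invented.

```python
import ipaddress


class TrustedProxyRemoteUser(object):
    """WSGI middleware: only trust X-Remote-User from loopback peers.

    Hypothetical sketch, not part of superset or Flask-AppBuilder.
    """

    HEADER_KEY = "HTTP_X_REMOTE_USER"  # WSGI environ name for X-Remote-User

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        peer = environ.get("REMOTE_ADDR", "")
        try:
            from_proxy = ipaddress.ip_address(peer).is_loopback
        except ValueError:
            # Missing or malformed address: treat as untrusted.
            from_proxy = False
        if not from_proxy:
            # Drop the header so a direct client cannot impersonate a user.
            environ.pop(self.HEADER_KEY, None)
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Tiny app that echoes the user it believes is authenticated."""
    user = environ.get("HTTP_X_REMOTE_USER", "anonymous")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [user.encode("utf-8")]


wrapped = TrustedProxyRemoteUser(demo_app)
```

With the app port bound to localhost only, as described above, this check is redundant; it would matter only if the port were ever exposed more widely.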

For now I'd like to keep AUTH_REMOTE_USER and manually create user accounts in superset when we have to. If my patch gets merged, we can update the Flask-AppBuilder dependency in the analytics/superset/deploy repo.

Alternatively, I could submit a patch to support more complex filters, e.g. group membership, for AUTH_LDAP.

Change 396413 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Open druid coordinator to more networks; superset needs to query it

https://gerrit.wikimedia.org/r/396413

Change 396413 merged by Ottomata:
[operations/puppet@production] Open druid coordinator to more networks; superset needs to query it

https://gerrit.wikimedia.org/r/396413

Ottomata claimed this task. Dec 8 2017, 4:01 PM

Oof, something is not happy with MySQL + superset and druid metadata refresh.

Dec  8 20:10:01 thorium superset[5227]: 2017-12-08 20:10:01,479:ERROR:root:(_mysql_exceptions.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction') [SQL: u'INSERT INTO datasources (created_on, changed_on, description, default_endpoint, is_featured, filter_select_enabled, offset, cache_timeout, params, perm, datasource_name, is_hidden, fetch_values_from, cluster_name, user_id, changed_by_fk, created_by_fk) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'] [parameters: (datetime.datetime(2017, 12, 8, 20, 9, 10, 537145), datetime.datetime(2017, 12, 8, 20, 9, 10, 537180), None, None, 0, 0, 0, None, None, None, 'mediawiki_history_reduced', 0, None, 'public-eqiad', None, 2L, 2L)]

It seems this [[ https://github.com/apache/incubator-superset/blob/0.21.0/superset/connectors/druid/models.py#L164 | session.flush() call ]] is hanging due to an unclosed transaction (?) and causing the worker process to time out.

Hm, might not be a MySQL related problem after all. I switched the database to a local sqlite db, and I get a very similar problem:

Dec  8 21:21:05 thorium superset[27367]: OperationalError: (sqlite3.OperationalError) database is locked [SQL: u'INSERT INTO datasources (created_on, changed_on, description, default_endpoint, is_featured, filter_select_enabled, "offset", cache_timeout, params, perm, datasource_name, is_hidden, fetch_values_from, cluster_name, user_id, changed_by_fk, created_by_fk) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: ('2017-12-08 21:21:00.153764', '2017-12-08 21:21:00.153786', None, None, 0, 0, 0, None, None, None, u'mediawiki_history_reduced', 0, None, u'public-eqiad', None, 2, 2)]
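As an illustration of the failure mode only (not a diagnosis of superset itself), the sqlite error above can be reproduced with two connections: one holds an uncommitted write transaction, and the other then fails to insert. The file name and one-column table layout are invented for the demo.

```python
import os
import sqlite3
import tempfile


def reproduce_locked_insert():
    """Return the error message seen by a second writer, or None."""
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None gives explicit transaction control;
    # timeout=0 makes the second writer fail at once instead of waiting.
    holder = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    writer = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        holder.execute("CREATE TABLE datasources (datasource_name TEXT)")
        # Open a write transaction and leave it uncommitted.
        holder.execute("BEGIN IMMEDIATE")
        holder.execute("INSERT INTO datasources VALUES ('first')")
        try:
            writer.execute("INSERT INTO datasources VALUES ('second')")
        except sqlite3.OperationalError as exc:
            return str(exc)
        return None
    finally:
        # Closing a connection rolls back any transaction still open on it.
        writer.close()
        holder.close()


print(reproduce_locked_insert())
```

The MySQL "Lock wait timeout exceeded" error is the analogous symptom: a second writer waits on a lock held by a transaction that never commits.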

Change 396488 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use sync worker for superset

https://gerrit.wikimedia.org/r/396488

Change 396488 merged by Ottomata:
[operations/puppet@production] Use sync worker for superset

https://gerrit.wikimedia.org/r/396488

Hm, looks to be a threading/async/gevent issue. I switched the gunicorn worker class back to sync, and it works now. I bumped up to 8 workers. If 8 people run long queries at once, the app will hang. Might have to fix this and/or deploy celery workers to help with long async queries. Let's save that for another task...
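For readers unfamiliar with gunicorn, the change amounts to settings along these lines. This is a sketch in gunicorn's Python config-file style, not the puppet-managed configuration; the bind address is an assumption inferred from the port mentioned earlier in this task.

```python
# Sketch of gunicorn settings in config-file form (assumed layout, not the
# actual puppet-rendered configuration).
bind = "127.0.0.1:9080"  # assumption: app port 9080, reachable from localhost only
worker_class = "sync"    # the async (gevent) worker triggered the locking problem
workers = 8              # with sync workers, at most 8 requests run at once
```

Each sync worker handles one request at a time, which is why 8 simultaneous long queries would occupy every worker.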

Weird, when I try to access the website I get:

This page isn’t working

superset.wikimedia.org redirected you too many times.
Try clearing your cookies.
ERR_TOO_MANY_REDIRECTS

Is there a known workaround for this?

This is what I am seeing with tcpdump on thorium:

GET / HTTP/1.1
Host: localhost:9080
[..]
X-Remote-User: Elukey
X-Forwarded-Host: superset.wikimedia.org
X-Forwarded-Server: superset.wikimedia.org
Connection: Keep-Alive

HTTP/1.1 302 FOUND
Server: gunicorn/19.7.1
Date: Sat, 09 Dec 2017 10:59:48 GMT
Connection: close
Content-Type: text/html; charset=utf-8
Content-Length: 241
Location: https://superset.wikimedia.org/superset/welcome

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to target URL: <a href="/superset/welcome">/superset/welcome</a>.  If not click the link.


GET /superset/welcome HTTP/1.1
Host: localhost:9080
[..]
X-Remote-User: Elukey
X-Forwarded-Host: superset.wikimedia.org
X-Forwarded-Server: superset.wikimedia.org
Connection: Keep-Alive

HTTP/1.1 302 FOUND
Server: gunicorn/19.7.1
Date: Sat, 09 Dec 2017 10:59:48 GMT
Connection: close
Content-Type: text/html; charset=utf-8
Content-Length: 221
Location: https://superset.wikimedia.org/login/
Set-Cookie: session=[REDACTED]; HttpOnly; Path=/

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to target URL: <a href="/login/">/login/</a>.  If not click the link.

GET /login/ HTTP/1.1
Host: localhost:9080
[..]
authorization: Basic [REDACTED]
upgrade-insecure-requests: 1
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9,it;q=0.8
dnt: 1
cookie:session=[REDACTED]
[..]
X-Remote-User: Elukey
X-Forwarded-Host: superset.wikimedia.org
X-Forwarded-Server: superset.wikimedia.org
Connection: Keep-Alive

HTTP/1.1 302 FOUND
Server: gunicorn/19.7.1
Date: Sat, 09 Dec 2017 10:59:49 GMT
Connection: close
Content-Type: text/html; charset=utf-8
Content-Length: 209
Location: https://superset.wikimedia.org/
Set-Cookie: session=[REDACTED]

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to target URL: <a href="/">/</a>.  If not click the link.

Then it restarts with a GET / HTTP/1.1 and the whole cycle repeats from the beginning.

According to the Analytics IRC channel, "it's because superset is not auto-creating your superset account after auth". So if anybody else hits this redirect loop, that is the cause :)
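The redirect chain from the tcpdump can also be checked mechanically. Here is a small sketch that follows path-to-Location mappings and reports the cycle; the function name and data structure are invented for illustration, and the paths are the ones captured above.

```python
def find_redirect_loop(redirects, start, max_hops=20):
    """Follow path -> Location mappings and return the loop, if any.

    `redirects` maps a request path to the path in its Location header.
    Returns the list of paths forming the cycle, or None if the chain ends.
    """
    seen = []
    path = start
    for _ in range(max_hops):
        if path in seen:
            # Revisited a path: report the cycle from its first occurrence.
            return seen[seen.index(path):] + [path]
        seen.append(path)
        if path not in redirects:
            return None
        path = redirects[path]
    return None


# Redirects as captured in the tcpdump output.
observed = {
    "/": "/superset/welcome",
    "/superset/welcome": "/login/",
    "/login/": "/",
}
loop = find_redirect_loop(observed, "/")
print(" -> ".join(loop))
```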

Change 397616 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add superset password mappings for analytics mysql dbs

https://gerrit.wikimedia.org/r/397616

Change 397616 merged by Ottomata:
[operations/puppet@production] Add superset password mappings for analytics mysql dbs

https://gerrit.wikimedia.org/r/397616

Change 397629 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix superset lookup_password to work with sqla uris

https://gerrit.wikimedia.org/r/397629

Change 397629 merged by Ottomata:
[operations/puppet@production] Fix superset lookup_password to work with sqla uris

https://gerrit.wikimedia.org/r/397629

Ok, things are pretty decent over at superset.wikimedia.org. Still to do:

  • Update superset/flask-appbuilder once we get new releases from them. This will allow auto-account creation for LDAP users in the wmf and nda groups.
  • Figure out async vs sync worker database locking problem; possibly update to Python 3 to solve this.
  • Set up celery workers for async database queries. This way the web UI doesn't have to block to wait for queries to finish. Sync queries have a gui timeout anyway, I think of around 60 seconds. Perhaps this is good? Perhaps superset shouldn't be used for queries that take longer than 60 seconds?
  • Verify that Hive queries work. I think I might need to change the LDAP REMOTE_USER to shell login, rather than common name, in order for Hive doAs to work. Not sure about this yet.

Oh, btw, I added DB connections for analytics-slave / log db, and analytics-store, with passwords stored in puppet, not in superset meta db. You can now query those MySQL databases from superset. :)
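A rough idea of how passwords can live outside the superset meta db, assuming the SQLALCHEMY_CUSTOM_PASSWORD_STORE hook in superset's config is what is used here: superset calls the configured function with the connection URL and uses the returned password. The mapping contents, host name, and user below are invented, and the function body is a sketch, not the merged puppet code.

```python
# Hypothetical mapping from SQLAlchemy URI (without password) to password.
# In a real deployment this would be rendered from a secret store rather
# than written inline.
PASSWORD_MAPPING = {
    "mysql://research@db.example.org/log": "not-a-real-password",
}


def lookup_password(url):
    """Return the password for a SQLAlchemy URL, or None if unknown.

    Accepts either a plain string or a SQLAlchemy URL object; str() is
    applied so that both forms can be used as the lookup key.
    """
    return PASSWORD_MAPPING.get(str(url))


# Superset reads this hook from its config module.
SQLALCHEMY_CUSTOM_PASSWORD_STORE = lookup_password
```

Keying on the full URI means two connections to the same host with different users or databases can carry different passwords.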

@Ottomata : Super cool ! Many thanks :)

Made a new task for some of the above points: T182688

Ottomata moved this task from In Progress to Done on the Analytics-Kanban board. Dec 12 2017, 2:43 PM
Ottomata changed the point value for this task from 8 to 13.

Change 396143 merged by Ottomata:
[operations/puppet@production] Set superset auth_settings => undef if not using ldap_proxy

https://gerrit.wikimedia.org/r/396143

Nuria closed this task as Resolved. Dec 19 2017, 11:10 PM