Page MenuHomePhabricator

Users having issues with presto sqllab on superset
Closed, ResolvedPublic

Description

Users not part of analytics engineering are having issues using the sqllab with Presto in superset.

Event Timeline

Nuria created this task.Apr 10 2020, 3:14 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 10 2020, 3:14 PM
Nuria assigned this task to elukey.Apr 10 2020, 3:15 PM
Nuria added a project: Analytics-Kanban.
Nuria added a subscriber: dr0ptp4kt.
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.Apr 10 2020, 4:49 PM
Nuria updated the task description. (Show Details)Apr 10 2020, 9:11 PM

There are multiple layers of security authentication in this scheme:

  • Users log in to superset via LDAP auth (HTTP basic auth) and the username is picked up from there
  • Superset itself (as daemon) is authenticated via Kerberos, and can act as proxy for other users.
  • Superset then calls Presto via https, proxying the username.
  • Presto receives the HTTPS call, and gets delegation tokens for the Hive Metastore and HDFS.

The above works fine for analyitcs-admins users, that are deployed more or less on all Analytics nodes, but it doesn't for the other ones.

Presto coordinator logs don't tell much, I still need to figure out a good combination of logging that makes sense, I can't really see a clear indicator if a user failed a query for some kerberos trouble. https://github.com/prestodb/presto/wiki/Security-Troubleshooting-Guide is useful, will try to apply it (so far no luck).

One thing that I noticed is that the hdfs-audit.log on an-master1001 shows analytics-admins users logs (like me and joseph) trying to access HDFS (corresponding to the tables visited from the sqllab's UI), but I don't see other users. This means that the error should lie between Presto and the Hive metastore, or similar.

I then checked hive metastore logs, and found stuff like the following for /var/log/hive/hive-metastore.log.2020-04-10:

javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token expired or does not exist: HIVE_DELEGATION_TOKEN owner=dr0ptp4kt, renewer=dr0ptp4kt, realUser=hive/an-coord1001.eqiad.wmnet@WIKIMEDIA, issueDate=1585853752530, maxDate=1586458552530, sequenceNumber=871, masterKeyId=1]

Seems a good lead but I don't see the same error for all users, most notably I don't see lexnasser mentioned in there (we did some testing after standup) so I am not sure if it is a red herring or not.

The next test that I'd like to do is asking people to visit https://superset.wikimedia.org/superset/dashboard/73/ and see if it leads to the same error. The dashboard in fact uses Presto behind the scenes, if it works (and we can see hdfs-audit logs for example) then it means that the error might lie in the SQL Lab. The downside of this is that I need somebody with a user that triggers the problem to repro :(

elukey added a subscriber: lexnasser.EditedApr 11 2020, 9:47 AM

@lexnasser when you have a moment could you please visit https://superset.wikimedia.org/superset/dashboard/73/ and report back if it leads to and error or if you can visualize the dashboard correctly? (graphs takes some seconds to appear).

Just to triple check, please also check that the sqllab stuff yields the same error, today I rolled restart some daemons and I'd like to make sure that no change happened :)

https://superset.wikimedia.org/superset/sqllab -> Datasource "analytics_presto_hive" -> schema "event" -> table any one that you want

elukey added a comment.EditedApr 11 2020, 6:39 PM

Thanks a lot! I can see from the hdfs-audit.log that you can indeed read data from HDFS, so I am now leaning towards an issue in Superset with the SQLlab.

The "Unknown error" doesn't show up in the logs, I am not wondering if it is something related to JS? Another possibility is that the analytics team's members are admins in Superset, and some perm/privilege allows them to use SQLlab with Presto.

@lexnasser - I just made you Admin in Superset, when you have time and patience, would you mind to check if anything changes? Also, Can you check if you see anything weird in the JS Console of the dev tools?

@elukey I don't see the "Unknown error" message anymore. Nothing in the JS console either.

@elukey I don't see the "Unknown error" message anymore. Nothing in the JS console either.

Thanks a ton! So this is indeed an issue with Superset itself, the perms assigned to the Admins (i.e. Analytics team) have something that allows them to query data from Presto, and Alpha (where all users are) doesn't. Will investigate :)

Ok I think I got to the bottom of this. I created a test user in superset, and created a ssh tunnel to bypass httpd (hence LDAP auth) manually setting the right header to appear as logged. I was able to reproduce the issue ("Unknown error").

It seems that users in Alpha/Gamma roles (we use Alpha by default) cannot see the SQLLab panel/icon in the main page, and don't have permits to use it. There is a specific extra role to add, "sqllab", that allows a user to use it.

See https://superset.apache.org/security.html#sql-lab

@dr0ptp4kt I added the extra sqllab role to your Superset user, can you check if the "Unknown error" is fixed whenever you have time?

https://superset.wikimedia.org/superset/sqllab -> datasource "presto_analytics_hive" -> schema "event" -> table "the-el-table-that-you-want"

Thanks!

@elukey that worked, yes - thanks.

Milimetric triaged this task as High priority.Apr 13 2020, 3:57 PM
LGoto moved this task from Triage to Tracking on the Product-Analytics board.Apr 13 2020, 4:34 PM

The next step is to figure out how to handle the extra permissions needed for all users. Options are:

  • add sqllab's perms to the Alpha role in Superset, so everybody would have all the permissions right away (past + new users)
  • add to existing users the sqllab role (should be easy with a db alter table) and then figure out how to add the role upon user's first login (the Alpha role is added by default)
elukey renamed this task from Users having issues with presto dashboards on superset to Users having issues with presto sqllab on superset .Apr 14 2020, 12:42 PM
elukey updated the task description. (Show Details)
elukey added subscribers: elukey, Milimetric.
Nuria added a comment.Apr 14 2020, 5:14 PM

add to existing users the sqllab role (should be easy with a db alter table) and then figure out how to add the role upon user's first login (the Alpha role is added by default)

per our post standup conversation let's go with this one

List of permissions for the sqllab role in Superset:

can search queries on Superset,
can sql json on Superset,
can csv on Superset, 
menu access on SQL Lab, 
can sqllab on Superset, 
can sqllab viz on Superset, 
menu access on SQL Editor, 
menu access on Query Search, 
menu access on Saved Queries, 
can list on UserDBModelView, 
can list on UserRemoteUserModelView

The ones that are not in Alpha are:

can sql json on Superset
menu access on SQL Lab
can list on UserDBModelView
can list on UserRemoteUserModelView

In this case the problem seems to be reported by the JS console: Failed to load resource: the server responded with a status of 401 (UNAUTHORIZED)

The missing permission is can sql json on Superset, and also menu access on SQL Lab. I would just add these two for the moment, and let people to test them.

Mentioned in SAL (#wikimedia-analytics) [2020-04-15T14:03:37Z] <elukey> update Superset Alpha role perms with what stated in T249923#6058862

@dr0ptp4kt I modified your permissions again, in theory now everybody should be able to use Presto without issues. Can you re-test and let me know if all is ok?

@Aklapper same thing for you :)

Yep, that works.

elukey moved this task from In Progress to Done on the Analytics-Kanban board.
elukey set Final Story Points to 5.
elukey added a subscriber: srishakatux.
Nuria closed this task as Resolved.May 4 2020, 6:55 PM