Users not part of analytics engineering are having issues using the sqllab with Presto in superset.
There are multiple layers of security authentication in this scheme:
- Users log in to superset via LDAP auth (HTTP basic auth) and the username is picked up from there
- Superset itself (as daemon) is authenticated via Kerberos, and can act as proxy for other users.
- Superset then calls Presto via https, proxying the username.
- Presto receives the HTTPS call, and gets delegation tokens for the Hive Metastore and HDFS.
The above works fine for analyitcs-admins users, that are deployed more or less on all Analytics nodes, but it doesn't for the other ones.
Presto coordinator logs don't tell much, I still need to figure out a good combination of logging that makes sense, I can't really see a clear indicator if a user failed a query for some kerberos trouble. https://github.com/prestodb/presto/wiki/Security-Troubleshooting-Guide is useful, will try to apply it (so far no luck).
One thing that I noticed is that the hdfs-audit.log on an-master1001 shows analytics-admins users logs (like me and joseph) trying to access HDFS (corresponding to the tables visited from the sqllab's UI), but I don't see other users. This means that the error should lie between Presto and the Hive metastore, or similar.
I then checked hive metastore logs, and found stuff like the following for /var/log/hive/hive-metastore.log.2020-04-10:
javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token expired or does not exist: HIVE_DELEGATION_TOKEN owner=dr0ptp4kt, renewer=dr0ptp4kt, realUser=hive/an-coord1001.eqiad.wmnet@WIKIMEDIA, issueDate=1585853752530, maxDate=1586458552530, sequenceNumber=871, masterKeyId=1]
Seems a good lead but I don't see the same error for all users, most notably I don't see lexnasser mentioned in there (we did some testing after standup) so I am not sure if it is a red herring or not.
The next test that I'd like to do is asking people to visit https://superset.wikimedia.org/superset/dashboard/73/ and see if it leads to the same error. The dashboard in fact uses Presto behind the scenes, if it works (and we can see hdfs-audit logs for example) then it means that the error might lie in the SQL Lab. The downside of this is that I need somebody with a user that triggers the problem to repro :(
@lexnasser when you have a moment could you please visit https://superset.wikimedia.org/superset/dashboard/73/ and report back if it leads to and error or if you can visualize the dashboard correctly? (graphs takes some seconds to appear).
Just to triple check, please also check that the sqllab stuff yields the same error, today I rolled restart some daemons and I'd like to make sure that no change happened :)
https://superset.wikimedia.org/superset/sqllab -> Datasource "analytics_presto_hive" -> schema "event" -> table any one that you want
Thanks a lot! I can see from the hdfs-audit.log that you can indeed read data from HDFS, so I am now leaning towards an issue in Superset with the SQLlab.
The "Unknown error" doesn't show up in the logs, I am not wondering if it is something related to JS? Another possibility is that the analytics team's members are admins in Superset, and some perm/privilege allows them to use SQLlab with Presto.
@lexnasser - I just made you Admin in Superset, when you have time and patience, would you mind to check if anything changes? Also, Can you check if you see anything weird in the JS Console of the dev tools?
Thanks a ton! So this is indeed an issue with Superset itself, the perms assigned to the Admins (i.e. Analytics team) have something that allows them to query data from Presto, and Alpha (where all users are) doesn't. Will investigate :)
Ok I think I got to the bottom of this. I created a test user in superset, and created a ssh tunnel to bypass httpd (hence LDAP auth) manually setting the right header to appear as logged. I was able to reproduce the issue ("Unknown error").
It seems that users in Alpha/Gamma roles (we use Alpha by default) cannot see the SQLLab panel/icon in the main page, and don't have permits to use it. There is a specific extra role to add, "sqllab", that allows a user to use it.
The next step is to figure out how to handle the extra permissions needed for all users. Options are:
- add sqllab's perms to the Alpha role in Superset, so everybody would have all the permissions right away (past + new users)
- add to existing users the sqllab role (should be easy with a db alter table) and then figure out how to add the role upon user's first login (the Alpha role is added by default)
add to existing users the sqllab role (should be easy with a db alter table) and then figure out how to add the role upon user's first login (the Alpha role is added by default)
per our post standup conversation let's go with this one
List of permissions for the sqllab role in Superset:
can search queries on Superset, can sql json on Superset, can csv on Superset, menu access on SQL Lab, can sqllab on Superset, can sqllab viz on Superset, menu access on SQL Editor, menu access on Query Search, menu access on Saved Queries, can list on UserDBModelView, can list on UserRemoteUserModelView
The ones that are not in Alpha are:
can sql json on Superset menu access on SQL Lab can list on UserDBModelView can list on UserRemoteUserModelView
In this case the problem seems to be reported by the JS console: Failed to load resource: the server responded with a status of 401 (UNAUTHORIZED)
The missing permission is can sql json on Superset, and also menu access on SQL Lab. I would just add these two for the moment, and let people to test them.