
Superset presto error: Failed to list directory: hdfs://analytics-hadoop/wmf/data/event/...
Closed, Resolved | Public | BUG REPORT

Description

Steps to Reproduce:
In SQL Lab: explore a table created by another user (see example below):

Screen Shot 2021-01-22 at 11.24.15 AM.png (422×1 px, 61 KB)

In Dashboards: load a dashboard built using Presto queries.
For example, I loaded https://superset.wikimedia.org/superset/dashboard/androidsubmissionwarnings/ and received the following error:

Screen Shot 2021-01-21 at 4.16.16 PM.png (224×1 px, 41 KB)

Yesterday, I loaded https://superset.wikimedia.org/superset/dashboard/AndroidFontThemeChange/ and received a similar error. I asked Shay (who built the dashboard) to load it, and it worked for her. Immediately after Shay loaded the dashboard, I refreshed it and was able to successfully view it:

Screen Shot 2021-01-21 at 4.16.37 PM.png (565×1 px, 76 KB)

I suspect there's a permissions issue.

Actual Results:
Error message

Expected Results:
View of query results

Event Timeline

Hi @kzimmerman!

In the case of session_length:

It was indeed a permissions problem; thanks for spotting it. It's fixed now. Please check that you can now access both mforns.session_length and mforns.session_length_2. Once you confirm, I'll close this task.

Regarding other data sets:

All data sets that we want to make visible to other analytics-related employees (i.e. via Superset) should belong to the group analytics-privatedata-users.
The file permission mode should also be 750, so that no one outside the owner and analytics-privatedata-users can read the data.

hdfs dfs -chgrp -R analytics-privatedata-users /path/to/dataset
hdfs dfs -chmod -R 750 /path/to/dataset
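
To confirm that the two commands above took effect, the listing for the data set can be checked for the expected group and mode. The following is only a minimal sketch: check_listing is a hypothetical helper (not part of any existing tooling), and it assumes the usual `hdfs dfs -ls` column order of permissions, replication, owner, group, size, date, time, path.

```shell
# Hypothetical helper: checks one line of `hdfs dfs -ls` output against the
# group and mode recommended above. Prints "ok" or a short reason.
check_listing() {
    line=$1
    expected_group=${2:-analytics-privatedata-users}
    # Split the line into fields (assumed order):
    # permissions, replication, owner, group, size, date, time, path
    set -- $line
    perms=$1
    group=$4
    # Drop the leading type character (d or -) and keep the nine mode bits.
    mode=$(printf '%s' "$perms" | cut -c2-10)
    if [ "$group" != "$expected_group" ]; then
        echo "wrong group: $group"
    elif [ "$mode" != "rwxr-x---" ]; then
        # rwxr-x--- is the symbolic form of mode 750
        echo "wrong mode: $mode"
    else
        echo "ok"
    fi
}

# Usage on a host with the hdfs CLI (an assumption; the path is a placeholder):
#   check_listing "$(hdfs dfs -ls -d /path/to/dataset)"
```

The helper only inspects the top-level entry; files below it would need the same check applied to each line of a recursive listing.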

Could you please pass these requirements on to data set creators?

Thanks a lot!

@mforns I'm still getting the same error when I try to load mforns.session_length and mforns.session_length_2, unfortunately :(

I'll make sure my team is aware of those requirements for data set creators! Do you know if this is documented for reference?

@kzimmerman I've checked and I could not find your username (kzeta, right?) in the analytics-privatedata-users group.
That's probably why you cannot access the session length data.
We should add you there. Created a task: T272982

@kzimmerman https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups

We are trying to simplify the procedure, but essentially for Superset a user needs to:

  • Log in via a Wikidev account (user/pass), with the username in the wmf or nda LDAP group. This grants access to all the dashboards based on Druid, since they are not authenticated.
  • Be a member of the analytics-privatedata-users group (even without ssh access; this is new) to use Presto, since the user in this case is proxied and authenticated all the way to Hive/HDFS.
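
The two requirements above can be summarised as a small decision rule. This is an illustrative sketch only: superset_access is a hypothetical function, it takes the user's group names as arguments, and it does not query LDAP or any real system.

```shell
# Hypothetical helper: given a user's groups as arguments, prints which
# kinds of Superset dashboards the rules above would allow.
superset_access() {
    ldap_ok=no
    presto_ok=no
    for g in "$@"; do
        case $g in
            wmf|nda) ldap_ok=yes ;;                       # needed to log in at all
            analytics-privatedata-users) presto_ok=yes ;; # needed for Presto
        esac
    done
    if [ "$ldap_ok" != yes ]; then
        echo "none"
    elif [ "$presto_ok" = yes ]; then
        echo "druid presto"
    else
        echo "druid"
    fi
}

superset_access wmf                              # prints "druid"
superset_access wmf analytics-privatedata-users  # prints "druid presto"
```

A user in wmf but not in analytics-privatedata-users matches the situation in this task: Druid dashboards load, while Presto-backed ones fail.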

Comments are welcome, let me know if there are parts of the docs that can be improved!

Edit: I misread the comments; I now see that you were referring to the earlier comment about data ownership etc. Apologies, but any feedback on the above doc is welcome as well :)