Presto error: Failed to list directory
Closed, Invalid | Public

Description

This task is about an error I am receiving when attempting to run a query using Superset's sqllab.

Note: I tagged #product-infrastructure-data and Product-Analytics because I thought they would be good teams to make aware of this issue. If other people should be notified, please edit the tags accordingly.

Behavior

  1. Visit: https://superset.wikimedia.org/superset/sqllab
  2. Set the Database field to: presto_analytics_hive
  3. Set the Schema field to event
  4. Paste in the following query
SELECT event.save_failure_type, event.save_failure_message, count(*) as count
from editattemptstep
where event.integration='discussiontools' and event.action='saveFailure'
group by event.save_failure_type, event.save_failure_message
order by count desc
  5. Click RUN

Actual

  1. ❗️Notice the following error in the Results tab: presto error: Failed to list directory: hdfs://analytics-hadoop/wmf/data/event/EditAttemptStep/year=2021/month=1/day=9/hour=8

Expected

  1. ✅Query executes; data is returned in the Results tab

Meta

  • Superset User Name: ppel
  • Role: [Alpha]
  • Email associated with account: ppelberg@wikimedia.org

Event Timeline

LGoto triaged this task as Medium priority.
LGoto moved this task from Triage to Needs Investigation on the Product-Analytics board.
LGoto subscribed.

@ppelberg yes, tagging Product Analytics was a good call, we can help direct this!

I think the issue is that you're trying to directly query a table that is only accessible to people with analytics-privatedata-users access.

Looping in Analytics for review and confirmation that this is the issue.

@elukey, do Superset+Presto users need analytics-privatedata-users access? If so, we should give different advice than we did in https://phabricator.wikimedia.org/T271602#6735021! :)

@ppelberg, additionally, please always use a partition filter when running queries! Otherwise you'll end up querying the entire table's data all at once:

https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Queries#Use_partitions
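For illustration, here is a rough sketch of the query from the description with a partition filter added. It assumes the partition columns are named year, month, day, and hour, as the HDFS path in the error message suggests, and that they are numeric. The date values are examples only.

```sql
-- Same aggregation as in the task description, restricted to a single day.
-- Assumption: partition columns year/month/day are inferred from the HDFS
-- path in the error message (year=2021/month=1/day=9/hour=8) and are numeric.
-- The specific date below is illustrative only.
SELECT event.save_failure_type, event.save_failure_message, count(*) AS count
FROM editattemptstep
WHERE year = 2021 AND month = 1 AND day = 9
  AND event.integration = 'discussiontools'
  AND event.action = 'saveFailure'
GROUP BY event.save_failure_type, event.save_failure_message
ORDER BY count DESC
```

With predicates on the partition columns, the engine should only need to read the matching partitions instead of every directory under the table.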

I think Presto is allowing you to do this when it shouldn't. I'll file a ticket for that.

FYI: T273004: Presto should warn or prevent users from querying without Hive partition predicates

Ah, yes ok.

@ppelberg, your access ticket is not yet finished! See T271602: Hue access for Peter Pelberg. I'll comment over there.

This will be fixed once @ppelberg's access request is finished. Closing.