Description
Currently, permissions are not set automatically for the discovery.wikibase_rdf table when new snapshots are generated. For instance, I just ran into the following error when trying to query this table:
PrestoUserError: PrestoUserError(type=USER_ERROR, name=PERMISSION_DENIED, message="Permission denied: user=andrewtavis-wmde, access=EXECUTE, inode="/wmf/data/discovery/wikidata/rdf/date=20230717"
Querying with a WHERE date=20230710 clause did work, though. I can now query the full table, but only after being given permissions explicitly.
Possible Solutions
@JAllemandou suggested the following on Slack:
- Add --conf spark.hadoop.fs.permissions.umask-mode=022 to the Spark job generating the data (seems simpler)
- Explicitly add an Airflow step to update permissions after data generation
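
To illustrate why the first option should address the access=EXECUTE error above, here is a minimal sketch of the umask arithmetic, assuming the usual base modes of 777 for new directories and 666 for new files. The helper name apply_umask is hypothetical, and 027 is only an illustrative restrictive value; the umask the job currently runs with is not stated in this task.

```python
import stat


def apply_umask(base_mode: int, umask: int) -> int:
    """Return the mode left after clearing the bits set in the umask."""
    return base_mode & ~umask


# Assumed base modes before the umask is applied.
DIR_BASE = 0o777
FILE_BASE = 0o666

# Proposed setting: spark.hadoop.fs.permissions.umask-mode=022
dir_mode = apply_umask(DIR_BASE, 0o022)
file_mode = apply_umask(FILE_BASE, 0o022)
print(oct(dir_mode), stat.filemode(stat.S_IFDIR | dir_mode))    # 0o755 drwxr-xr-x
print(oct(file_mode), stat.filemode(stat.S_IFREG | file_mode))  # 0o644 -rw-r--r--

# Illustrative restrictive umask (hypothetical, not taken from the cluster).
restricted_dir_mode = apply_umask(DIR_BASE, 0o027)

# Listing or reading inside a partition directory needs the execute bit.
others_can_traverse = bool(dir_mode & stat.S_IXOTH)
others_blocked_when_restricted = not (restricted_dir_mode & stat.S_IXOTH)
print(others_can_traverse, others_blocked_when_restricted)  # True True
```

With 022, a new partition directory such as date=20230717 would be created as rwxr-xr-x, so users outside the owning user and group could traverse it without a manual permission change.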