
Enable Ceph S3 locations for Hive Metastore tables
Open, Needs Triage, Public

Description

Hive Metastore should be configured with credentials and endpoint settings for our Ceph RGW S3 service so that tables with s3a:// locations can be registered through the standard Hive catalog. Currently, CREATE EXTERNAL TABLE ... LOCATION 's3a://...' fails because the metastore's own JVM tries to validate the location via FileSystem.get() and has no AWS credentials provider, S3A endpoint, or SSL configuration available. This happens even though client-side Spark sessions are correctly configured and can read and write the bucket directly. Concretely, the fix would mean adding the relevant fs.s3a.* properties (fs.s3a.aws.credentials.provider, fs.s3a.access.key, fs.s3a.secret.key, fs.s3a.endpoint, fs.s3a.path.style.access, fs.s3a.connection.ssl.enabled) to the metastore's hive-site.xml or core-site.xml, and ensuring that hadoop-aws is on its classpath.
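As a rough illustration of the properties listed above, the following Python sketch renders them as a Hadoop-style `<configuration>` fragment that could be merged into the metastore's core-site.xml or hive-site.xml. Every value is a placeholder assumption: the endpoint URL, the keys, and the choice of `SimpleAWSCredentialsProvider` are not taken from this task, and the helper name `render_hadoop_config` is hypothetical. Storing keys in plain text is shown only for brevity; a Hadoop credential provider (`hadoop.security.credential.provider.path`) may be the better fit once the security model is settled.

```python
from xml.sax.saxutils import escape

# Placeholder values only (assumptions, not our real settings).
S3A_PROPERTIES = {
    "fs.s3a.aws.credentials.provider":
        "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
    "fs.s3a.access.key": "EXAMPLE_ACCESS_KEY",
    "fs.s3a.secret.key": "EXAMPLE_SECRET_KEY",
    "fs.s3a.endpoint": "https://rgw.example.org",
    "fs.s3a.path.style.access": "true",
    "fs.s3a.connection.ssl.enabled": "true",
}


def render_hadoop_config(properties):
    """Render a dict of properties as a Hadoop <configuration> XML fragment."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before merging into the config.
    print(render_hadoop_config(S3A_PROPERTIES))
```

The fragment only covers the configuration side; the metastore would still need the hadoop-aws JAR (and its AWS SDK dependency) on its classpath for the s3a:// scheme to resolve.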

The use case is enabling Iceberg and plain external tables backed by Ceph S3 to be registered in the shared Hive catalog so they're discoverable alongside HDFS-backed tables, rather than requiring a separate Iceberg Hadoop catalog per user or project. An example notebook illustrates the working client-side Spark setup and the resulting metastore error.

Event Timeline

Gehel subscribed.

We need to have a security model before moving too far along. This is very much part of our general strategy; we just need to make sure we don't paint ourselves into a corner.

Blocked until we have a security model.

Thanks for the update. Can we link to the Phabricator task for the security model?