
Fix default ownership and permissions for Hive managed databases in /user/hive/warehouse
Closed, Resolved (Public)

Description

It looks like managed databases and tables have inconsistent group ownership: some are owned by group hadoop, others by hdfs. We likely want the default group ownership to be analytics-privatedata-users.

Event Timeline

fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Just chatted with @JAllemandou, here's what we think we should do.

Currently, /user/hive/warehouse has mode 1777 (world-writable with the sticky bit set), as recommended by Cloudera. This is not what we want. We want the default to be that new dirs and files are readable and writable by their owner, readable by analytics-privatedata-users, and not readable by others.

To fix this, we will:

Set the group ownership of the Hive warehouse dir to analytics-privatedata-users, and chmod it to 0750. This will cause new databases to be created with the proper permissions and ownership.

sudo -u hdfs hdfs dfs -chgrp analytics-privatedata-users /user/hive/warehouse
sudo -u hdfs hdfs dfs -chmod 0750 /user/hive/warehouse

Set the same group and mode on all database dirs. This will cause new tables to be created with the proper permissions and ownership.

sudo -u hdfs hdfs dfs -chgrp analytics-privatedata-users /user/hive/warehouse/*
sudo -u hdfs hdfs dfs -chmod 0750 /user/hive/warehouse/*
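One way to confirm that the two steps above took effect is to list the directories and flag any whose mode or group differs from the target. This is only a minimal sketch: it assumes the usual `hdfs dfs -ls` column layout (permissions, replication, owner, group, size, date, time, path) and paths without spaces, and `check_warehouse_dirs` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: reads `hdfs dfs -ls` output on stdin and prints every
# directory whose mode or group differs from the expected values.
# Assumes 8 columns: perms repl owner group size date time path.
check_warehouse_dirs() {
  awk -v mode="drwxr-x---" -v grp="analytics-privatedata-users" '
    NF >= 8 && $1 ~ /^d/ && ($1 != mode || $4 != grp) {
      print $8 ": " $1 " " $3 ":" $4
    }'
}

# Usage (not run here, requires HDFS access):
#   sudo -u hdfs hdfs dfs -ls -d /user/hive/warehouse | check_warehouse_dirs
#   sudo -u hdfs hdfs dfs -ls /user/hive/warehouse | check_warehouse_dirs
```

No output means every listed directory already has mode 0750 and group analytics-privatedata-users.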

We're not sure what we should do with existing tables and data. The right thing to do would be to set the same perms for those as well. However, there are a lot of user databases and tables, and we might break something for those users.

Let's discuss this as a team next week.
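To get a sense of how many existing entries a recursive change would touch, the current owner and group combinations could be counted first. A rough sketch, again assuming the standard `hdfs dfs -ls -R` column layout; `summarize_ownership` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `hdfs dfs -ls -R` output on stdin and prints a
# count of entries per owner:group pair, largest count first.
summarize_ownership() {
  awk 'NF >= 8 { count[$3 ":" $4]++ }
       END { for (k in count) print count[k], k }' | sort -k1,1nr -k2
}

# Usage (not run here, requires HDFS access):
#   sudo -u hdfs hdfs dfs -ls -R /user/hive/warehouse | summarize_ownership
```

Pairs whose group is not analytics-privatedata-users are the ones a recursive chgrp would change.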

To do this for all files:

sudo -u hdfs hdfs dfs -chgrp -R analytics-privatedata-users /user/hive/warehouse/
sudo -u hdfs hdfs dfs -chmod -R g-w,o-rwxt /user/hive/warehouse/
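The symbolic mode g-w,o-rwxt removes bits rather than setting an absolute mode, so the result depends on what each file or dir had before. The sketch below works through the arithmetic on a few starting modes. It assumes that `t` under `o` clears the sticky bit, as the command above intends, and `apply_recursive_mask` is a hypothetical name used only for illustration.

```shell
# Apply the symbolic change g-w,o-rwxt to an octal mode using bit arithmetic.
# 0020 is the group write bit; 01007 is the sticky bit plus all "other" bits.
apply_recursive_mask() {
  printf '%04o\n' $(( 0$1 & ~0020 & ~01007 ))
}

apply_recursive_mask 1777   # prints 0750 (world-writable sticky dir)
apply_recursive_mask 775    # prints 0750 (group-writable dir)
apply_recursive_mask 644    # prints 0640 (typical data file)
```

Owner bits are never touched, so anything the owner could do before still works afterwards.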

We will discuss this with Product Analytics at the next sync.

In today's PA sync we decided to move forward with this.

@Mayakp.wiki asked that her db and tables be owned by analytics-product. Done:

sudo -u hdfs hdfs dfs -chown -R analytics-product /user/hive/warehouse/mayakpwiki.db

Ah, I reverted the change above. Maya had meant that the tables in wmf_product should be owned by analytics-product:

sudo -u hdfs hdfs dfs -chown -R analytics-product /user/hive/warehouse/wmf_product.db