The Analytics team is working on tightening the Hadoop security standards of the Analytics cluster. As part of this effort, the following tasks are open:
- Configure YARN to use proper Linux containers when executing JVMs across worker nodes. Each container will then run as the user that started it, rather than as 'yarn' as happens now.
- Restrict access to the Hadoop master nodes to admins only. We currently map the POSIX groups deployed to the Hadoop master nodes to HDFS, which gives us a simple way to add users/groups to HDFS via puppet, so those users and groups still need to exist on the master nodes.
Both tasks are currently difficult to achieve because the admin module has no notion of a user deployed without SSH access: if keys are configured for a user, they are deployed. A possible solution is to add a new parameter to admin's init.pp (referred to below as admin::groups_no_ssh) listing groups of users that should not have their SSH keys configured.
Use cases to consider:
- If a user belongs to groups listed in both admin::groups and admin::groups_no_ssh, the former takes priority (that is, the user should get SSH access).
- If a user belongs to a group listed in admin::groups_no_ssh and the previous point does not apply, that user's SSH keys must be absented from the host(s).
- If a user belongs only to a group listed in admin::groups_no_ssh, no SSH key should be deployed for that user; a user in a group listed in admin::groups gets keys deployed as usual.
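The precedence rules above can be sketched as a small decision function. This is only an illustration in Python, not the puppet implementation: the function name ssh_key_state, its arguments, and the group names in the usage example are hypothetical, and the real change would live in the admin module.

```python
def ssh_key_state(user_groups, ssh_groups, no_ssh_groups):
    """Decide what should happen to a user's SSH keys on a host.

    user_groups   -- groups the user belongs to
    ssh_groups    -- groups listed in admin::groups on this host
    no_ssh_groups -- groups listed in admin::groups_no_ssh on this host

    Returns "present" (deploy keys), "absent" (remove keys), or
    None when the user is not deployed on this host at all.
    The return values are an assumption made for this sketch.
    """
    groups = set(user_groups)

    # admin::groups wins: any membership there grants SSH access,
    # even if the user is also in a no-SSH group.
    if groups & set(ssh_groups):
        return "present"

    # Only no-SSH membership: the user exists, but keys are absented.
    if groups & set(no_ssh_groups):
        return "absent"

    # Not in any deployed group: nothing to manage for this user.
    return None


# Hypothetical group names, for illustration only.
host_ssh_groups = ["example-admins"]
host_no_ssh_groups = ["example-users"]

print(ssh_key_state(["example-admins", "example-users"], host_ssh_groups, host_no_ssh_groups))
print(ssh_key_state(["example-users"], host_ssh_groups, host_no_ssh_groups))
```

Checking admin::groups first is what encodes the "higher priority" rule; the order of the two checks is the only place the precedence is expressed.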