Part of securing Hadoop consists of enabling SSL/TLS encryption and authentication for the services that support TLS, such as:
- HTTP web interfaces (UIs)
- MapReduce/Spark shuffle services
- NameNode/JournalNode protocol for the HDFS edit log
The main idea is to deploy a Java keystore and a Java truststore on each node that needs a TLS certificate (more precisely, on each node where a daemon belonging to a service needs to use TLS).
High-level plan:
- Review all protocols that can benefit from TLS authentication/encryption and decide which ones need to be migrated to TLS. A good indicator that a protocol needs encryption is that sniffing the traffic on its port reveals PII.
- Create the TLS certificates via cergen in the Puppet private repo, together with the truststores and keystores.
- Add support in Puppet for deploying the truststores/keystores and the related ssl-client.xml and ssl-server.xml Hadoop configuration files.
- Roll out the changes to the Hadoop testing cluster first, and then to the Analytics one.
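To give a rough idea of what the ssl-server.xml and ssl-client.xml files mentioned in the plan contain, the Python sketch below renders both from a few parameters. It is only an illustration, not the Puppet code: the function names, file paths, and passwords are hypothetical placeholders, and the JKS store type is an assumption. The property names (ssl.server.keystore.location and so on) are the standard Hadoop ones.

```python
import xml.etree.ElementTree as ET


def render_hadoop_ssl_config(properties):
    """Render a dict of property names/values as a Hadoop <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.indent only exists on Python 3.9+; skip pretty-printing on older versions.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


def ssl_server_properties(keystore_path, keystore_password,
                          truststore_path, truststore_password):
    """Properties for ssl-server.xml (store type 'jks' is an assumption)."""
    return {
        "ssl.server.keystore.type": "jks",
        "ssl.server.keystore.location": keystore_path,
        "ssl.server.keystore.password": keystore_password,
        # Assumes the private key uses the same password as the keystore.
        "ssl.server.keystore.keypassword": keystore_password,
        "ssl.server.truststore.type": "jks",
        "ssl.server.truststore.location": truststore_path,
        "ssl.server.truststore.password": truststore_password,
    }


def ssl_client_properties(truststore_path, truststore_password):
    """Properties for ssl-client.xml: clients only need to trust the server certificates."""
    return {
        "ssl.client.truststore.type": "jks",
        "ssl.client.truststore.location": truststore_path,
        "ssl.client.truststore.password": truststore_password,
    }


if __name__ == "__main__":
    # Placeholder paths and passwords, for illustration only.
    server_xml = render_hadoop_ssl_config(ssl_server_properties(
        "/path/to/keystore.jks", "keystore-secret",
        "/path/to/truststore.jks", "truststore-secret"))
    client_xml = render_hadoop_ssl_config(ssl_client_properties(
        "/path/to/truststore.jks", "truststore-secret"))
    print(server_xml)
    print(client_xml)
```

In a real deployment these files contain passwords, so they should be readable only by the users that run the Hadoop daemons.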
Interesting links:
https://risdenk.github.io/2018/11/15/apache-hadoop-tls-ssl-notes.html