This task is about enabling TLS encryption and authentication for the MapReduce shufflers in the Hadoop Analytics cluster. We have been running it in the test cluster for months, with no issues reported so far.
In a nutshell, during the shuffle step of a MapReduce job the reducer tasks pull data from the mappers over HTTP, via a YARN service called the MapReduce shuffler. PII might be flowing through that traffic, so we want to secure it, see:
This is a prerequisite for enabling RPC encryption and Kerberos!
Things to do:
- Allow cergen to specify a Java truststore password in its config, rather than reusing the key's password.
- Generate TLS configs and keys in the Puppet private repo via cergen
- Deploy TLS mapreduce-ssl xml configs and TLS certs to the Hadoop worker nodes - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/547491/
- Enable TLS for the MapReduce shufflers (requires a rolling restart of the NodeManagers)
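
As a rough illustration of what the last two items end up deploying, here is a minimal Python sketch that renders the relevant Hadoop properties as `<configuration>` XML. The property names are the ones documented for Hadoop's encrypted shuffle; the helper `render_hadoop_config`, the file paths and the passwords are hypothetical placeholders and not the values used in Puppet. Note the truststore gets its own password, which is what the cergen change above is meant to allow.

```python
import xml.etree.ElementTree as ET


def render_hadoop_config(properties):
    """Render a dict of property names/values as a Hadoop <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# mapred-site.xml: switch the shuffle handler from HTTP to HTTPS.
MAPRED_SITE = {"mapreduce.shuffle.ssl.enabled": "true"}

# ssl-server.xml: where the NodeManager finds its keystore and truststore.
# All paths and passwords are placeholders (assumptions), not cluster values.
SSL_SERVER = {
    "ssl.server.keystore.location": "/path/to/hostname.keystore.jks",
    "ssl.server.keystore.password": "KEYSTORE_PASSWORD",
    "ssl.server.keystore.keypassword": "KEY_PASSWORD",
    "ssl.server.truststore.location": "/path/to/truststore.jks",
    "ssl.server.truststore.password": "TRUSTSTORE_PASSWORD",
}

if __name__ == "__main__":
    print(render_hadoop_config(MAPRED_SITE))
    print(render_hadoop_config(SSL_SERVER))
```

The NodeManagers only read these files at startup, which is why the final step needs a rolling restart.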