It is about time we created and maintained an official Analytics cluster in Beta. This includes CDH (Hadoop) and Kafka. It might also include varnishkafka on Beta varnishes.
This will need a little refactoring in puppet, but for the most part the existing puppetization will do everything it needs to do. Building this cluster won't be too much work (maybe half a week), but maintaining it might be.
This work does not have to be done by me. I will work on any puppet tweaks needed, but I think it would be good to get more analytics engineers involved here.