To facilitate building our Pageview API:
- dockerize Druid, Hive, Samza, and anything else we need to simulate our production Hadoop cluster
- pit Druid, Hive, and Samza against each other for pageview data processing
- memory
- storage
- speed
- (optional) build a dashboard on top of this
Thoughts: set up a physical network to reduce load on the WiFi?