Dockerize Hadoop Cluster, Druid, and Samza + Load Test
To facilitate building our Pageview API:

  • dockerize Druid, Hive, Samza, and anything else we need to simulate our production Hadoop cluster
  • pit Druid, Hive, and Samza against each other for pageview data processing
    • memory
    • storage
    • speed
  • (optional) build a dashboard on top of this

Thoughts: set up a physical network to reduce load on the WiFi?

@Qgil the whole analytics team is the owner. We'll all be there, we'll all work on this, and we'll be in sync:

OK, thank you. We are just aiming to have all the confirmed sessions assigned to someone, in order to make it easier for anybody to contact you if they have questions.

Oh, I'm happy to be the point of contact, but if you're reading this and can't find me, grab anyone on the list above.

We are at table 18.
Table Name: Pageviews, Big Data, Analytics

What is the status of this task, now that Wikimania 2015 is over? Did this hacking project take place, and was it successfully finished?

We did not dockerize everything we wanted to, because Docker turned out not to be as great an ecosystem as we had guessed going into the hackathon. We did accomplish the intent behind this task, which was to closely analyze storage and processing technologies for the purpose of standing up our Pageview API.