Having tested that Gobblin works with Kafka using TLS encryption (Yay!), we decided to move forward with this tool.
This task is the parent task for the various smaller bits we'll have to do.
Things to do:
- Make Gobblin work in MapReduce or YARN mode (PR upstream?)
- Make Gobblin read JSON and write topic+hourly partitions
- Package Gobblin for WMF (SRE - HELP please :)
- Migrate Oozie jobs to Gobblin - https://etherpad.wikimedia.org/p/gobblin
- Migrate refine jobs to Gobblin - https://etherpad.wikimedia.org/p/gobblin
- Clean up
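
To make the "topic+hourly partitions" item concrete, here is a minimal sketch of how an output path could be derived from a JSON record. It is not Gobblin code: the function name `hourly_partition_path`, the `dt` timestamp field, and the `year=/month=/day=/hour=` layout are all assumptions for illustration, and the real layout would be decided as part of that subtask.

```python
import json
from datetime import datetime, timezone


def hourly_partition_path(topic: str, raw_record: str, ts_field: str = "dt") -> str:
    """Return a topic + hourly partition path for one JSON record.

    Hypothetical helper: `ts_field` and the path layout are assumptions,
    not something Gobblin or the existing jobs define.
    """
    record = json.loads(raw_record)
    # Accept ISO 8601 timestamps with a trailing "Z" or an explicit UTC offset.
    ts = datetime.fromisoformat(record[ts_field].replace("Z", "+00:00"))
    # Normalize to UTC so records land in the same hour regardless of offset.
    ts = ts.astimezone(timezone.utc)
    return f"{topic}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"


path = hourly_partition_path("webrequest_text", '{"dt": "2021-03-04T05:06:07Z"}')
```

Normalizing to UTC before bucketing keeps the hour boundary stable when producers emit timestamps with different offsets.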
We have ideas for improvements that Gobblin could help us with (webrequest stats, for instance), but for now we're going to limit the work to replacing Camus.
We'll create new tasks for those improvements as our knowledge of Gobblin levels up.