Create "so you want to push new data into Hive" tutorial
Closed, Resolved · Public

Description

Create a tutorial page on wikitech outlining the process of getting data from MediaWiki into Hive via Kafka, Camus, and Oozie, plus whatever else is needed.

Event Timeline

bd808 claimed this task.
bd808 raised the priority of this task to Medium.
bd808 updated the task description.
bd808 added a subscriber: bd808.
  1. Get access to stat1002 (T115548) so you can try things out
  2. Read about Hive on wikitech (SQL-like abstraction over Hadoop map-reduce jobs)
  3. Read about Oozie on wikitech (job scheduler)
  4. ...

I added a few things to https://wikitech.wikimedia.org/wiki/Analytics/Cluster/MediaWiki_Avro_Logging as I worked on T108618 and validated the existing data.