MediaWiki will soon start writing to a new Kafka topic, mediawiki_CirrusSearchRequestSet. Messages on this topic will be encoded with Apache Avro and need to flow through Camus into Hadoop.
- create a new topic in Kafka (trivial)
- set up Camus imports of Avro data (should be easy, but has never been done before)
- tell Camus which field holds the timestamp
- figure out how Camus will get the schema
The schema will only be registered in the MediaWiki repo.
The Search team will take care of creating the Hive tables.
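
To illustrate why the timestamp field matters, here is a rough Python sketch of how an importer could use it to choose an hourly bucket for each decoded record. Everything specific in it is an assumption made for illustration: the schema fragment, the field name `ts`, the epoch-seconds unit, and the `topic/hourly/YYYY/MM/DD/HH` directory layout. The real schema lives in the MediaWiki repo, and this is not Camus's actual code.

```python
from datetime import datetime, timezone

# Hypothetical schema fragment; the real one lives in the MediaWiki repo.
SCHEMA = {
    "type": "record",
    "name": "CirrusSearchRequestSet",
    "fields": [
        {"name": "ts", "type": "long"},      # assumed: Unix epoch seconds
        {"name": "wikiId", "type": "string"},
    ],
}

TIMESTAMP_FIELD = "ts"  # assumed field name


def timestamp_field_type(schema, field=TIMESTAMP_FIELD):
    """Return the declared Avro type of the timestamp field, or raise KeyError."""
    for entry in schema["fields"]:
        if entry["name"] == field:
            return entry["type"]
    raise KeyError(f"schema has no field named {field!r}")


def hourly_partition(topic, record, field=TIMESTAMP_FIELD):
    """Build the hourly bucket path (UTC) for one decoded record."""
    when = datetime.fromtimestamp(record[field], tz=timezone.utc)
    return f"{topic}/hourly/{when:%Y/%m/%d/%H}"


if __name__ == "__main__":
    record = {"ts": 1446384300, "wikiId": "enwiki"}  # 2015-11-01 13:25 UTC
    print(timestamp_field_type(SCHEMA))
    print(hourly_partition("mediawiki_CirrusSearchRequestSet", record))
```

If the importer picks the wrong field, or misreads the unit (seconds versus milliseconds), records end up in the wrong hourly bucket, which is why the field has to be agreed on before the import is set up.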