Create a tutorial page on wikitech outlining the process of getting data from MediaWiki to Hive via Kafka+Camus+Oozie and whatever else is needed.
Description
Related Objects
Event Timeline
See T113521 ("Setup pipeline for search logs to travel through kafka and camus into hadoop {hawk} [55 pts]") for some breadcrumbs about the work done for Cirrus request logging.
The always awesome @EBernhardson has just done most of the heavy lifting on this! https://wikitech.wikimedia.org/wiki/Analytics/Cluster/MediaWiki_Avro_Logging
I added a few things to https://wikitech.wikimedia.org/wiki/Analytics/Cluster/MediaWiki_Avro_Logging as I worked on T108618 and validated the existing data.