Once events are in Kafka, we need a standardized way to import them into downstream systems. This task will describe and track that work. This task is not about a Stream Processing system, which would consume events from Kafka, transform them, and then produce them back to Kafka. This is about consuming events out of Kafka into external systems.
Currently, downstream systems consume events from Kafka using custom consumers and glue code. EventLogging Kafka + jrm, Camus, Refinery Spark 'Refine' code, statsv, kafkatee, etc. are all examples of custom 'downstream connectors' in use at WMF. Each of these reimplements roughly the same poll/deserialize/load loop, as in the sketch below.
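For illustration, here is a minimal sketch of the kind of bespoke glue code each of these systems duplicates, using the confluent-kafka Python client. The broker, topic, consumer group, and output path are hypothetical placeholders, not actual WMF names.

```python
# Sketch of a one-off downstream consumer: poll Kafka, deserialize
# JSON, write the events somewhere. Every custom connector repeats
# some variant of this loop. All names below are placeholders.
import json

from confluent_kafka import Consumer

consumer = Consumer({
    'bootstrap.servers': 'kafka1001:9092',   # placeholder broker
    'group.id': 'my-downstream-importer',    # placeholder group
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['eventlogging_SomeSchema'])  # placeholder topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(msg.error())
        event = json.loads(msg.value())
        # Ad hoc transform + load step, different in every system:
        # append to a file, POST to statsd, write to HDFS, etc.
        with open('/srv/output/events.jsonl', 'a') as f:
            f.write(json.dumps(event) + '\n')
finally:
    consumer.close()
```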
A first use case of this would be to replace Camus, the job that imports JSON data from Kafka into HDFS. By using Kafka Connect with our JSONSchemas here, we could avoid schema bugs like T214384 and write Parquet files directly, which would improve Refine job performance in Hadoop.
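As a rough sketch of what the Camus replacement might look like, the snippet below registers an HDFS sink via the Kafka Connect REST API, assuming Confluent's kafka-connect-hdfs plugin as a starting point. The Connect worker host, topic, HDFS URL, and output directory are hypothetical placeholders, and the JSONSchema integration noted in the comments is exactly the part this task would need to build.

```python
# Sketch: register an HDFS sink connector via the Kafka Connect REST
# API. Assumes Confluent's kafka-connect-hdfs plugin is installed;
# host names, topic, and paths are hypothetical placeholders.
import requests

CONNECT_URL = 'http://kafka-connect1001:8083'  # placeholder Connect worker

config = {
    'connector.class': 'io.confluent.connect.hdfs.HdfsSinkConnector',
    'tasks.max': '4',
    'topics': 'eventlogging_SomeSchema',       # placeholder topic
    'hdfs.url': 'hdfs://analytics-hadoop',     # placeholder namenode
    'topics.dir': '/wmf/data/raw/event',       # placeholder output dir
    'flush.size': '10000',
    # Write Parquet directly instead of raw JSON text files.
    # NOTE: ParquetFormat expects a schema-aware converter; wiring it
    # up to our JSONSchemas (rather than Avro + Schema Registry) is
    # part of the work this task tracks.
    'format.class': 'io.confluent.connect.hdfs.parquet.ParquetFormat',
}

# PUT /connectors/{name}/config creates or updates the named connector.
resp = requests.put(
    f'{CONNECT_URL}/connectors/camus-replacement-sink/config',
    json=config,
)
resp.raise_for_status()
print(resp.json())
```

Writing Parquet at ingest time would let the downstream Refine job skip the JSON text parsing it does today, which is where the performance win would come from.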