
Modern Event Platform: Stream Connectors
Open, Medium, Public · 0 Estimated Story Points

Description

Once events are in Kafka, we need a standardized way to import them into downstream systems. This task will describe and track that work. This task is not about a Stream Processing system, which would consume events from Kafka, transform them, and then produce them back to Kafka; it is about consuming events out of Kafka into downstream systems.

Currently, downstream systems consume events from Kafka using custom consumers and glue code. EventLogging's Kafka consumers + jrm, Camus, Refinery Spark 'Refine' code, statsv, kafkatee, etc. are all examples of custom 'downstream connectors' we use at WMF.

We'd like to standardize the way this is done, most likely using Kafka Connect. There are many open source connector plugins we can make use of.

@Ottomata has a WIP prototype that converts JSONSchemas to Connect Schemas, which allows us to use valid JSON events produced by the Stream Intake service.
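
To illustrate the idea (this is a minimal sketch, not the actual prototype), a converter like this would walk a JSONSchema and build the equivalent Connect Schema with Kafka's SchemaBuilder API. It assumes Jackson and kafka-connect-api on the classpath and handles only a few primitive types, objects, and arrays:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;

import java.util.Iterator;
import java.util.Map;

public class JsonSchemaConverter {

    /** Recursively map a JSONSchema node to a Connect Schema. */
    public static Schema convert(JsonNode jsonSchema) {
        String type = jsonSchema.path("type").asText();
        switch (type) {
            case "string":  return Schema.OPTIONAL_STRING_SCHEMA;
            case "integer": return Schema.OPTIONAL_INT64_SCHEMA;
            case "number":  return Schema.OPTIONAL_FLOAT64_SCHEMA;
            case "boolean": return Schema.OPTIONAL_BOOLEAN_SCHEMA;
            case "array":
                // JSONSchema arrays declare their element type under "items".
                return SchemaBuilder.array(convert(jsonSchema.path("items")))
                        .optional().build();
            case "object":
                // JSONSchema objects become Connect structs, one field per property.
                SchemaBuilder struct = SchemaBuilder.struct().optional();
                Iterator<Map.Entry<String, JsonNode>> fields =
                        jsonSchema.path("properties").fields();
                while (fields.hasNext()) {
                    Map.Entry<String, JsonNode> field = fields.next();
                    struct.field(field.getKey(), convert(field.getValue()));
                }
                return struct.build();
            default:
                throw new IllegalArgumentException("Unsupported JSONSchema type: " + type);
        }
    }

    public static void main(String[] args) throws Exception {
        String jsonSchema =
                "{\"type\":\"object\",\"properties\":{"
              + "\"dt\":{\"type\":\"string\"},"
              + "\"total\":{\"type\":\"integer\"}}}";
        Schema schema = convert(new ObjectMapper().readTree(jsonSchema));
        System.out.println(schema.fields());
    }
}
```

With Connect Schemas available, sink connectors can deserialize our JSON events into schema-ed Connect records, which is what schema-aware output formats like Parquet require.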

A first use case would be to replace Camus, the job that imports JSON data from Kafka into HDFS. By using Kafka Connect with our JSONSchemas here, we avoid schema bugs like T214384 and can write Parquet files directly, which would improve Refine job performance in Hadoop.
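
For example (hostnames, topic names, and HDFS paths below are hypothetical), Confluent's open source HDFS sink connector can write Parquet directly; a connector instance is registered by POSTing its config to the Kafka Connect REST API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterHdfsSink {
    public static void main(String[] args) throws Exception {
        // Hypothetical connector config; connector.class and config keys are
        // from Confluent's HDFS sink connector, everything else is made up.
        String connectorJson = """
            {
              "name": "hdfs-sink-eventlogging",
              "config": {
                "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
                "tasks.max": "4",
                "topics": "eventlogging_NavigationTiming",
                "hdfs.url": "hdfs://analytics-hadoop/wmf/data/raw/event",
                "format.class": "io.confluent.connect.hdfs.parquet.ParquetFormat",
                "flush.size": "10000"
              }
            }
            """;

        // POST the config to a Connect worker's REST API to start the connector.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://kafka-connect.example.org:8083/connectors"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Note that ParquetFormat requires records with Connect Schemas, which is exactly what the JSONSchema conversion above would provide.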

Event Timeline

Ottomata triaged this task as Medium priority. Jan 22 2019, 7:08 PM
Ottomata created this task.