
Event Platform: Stream Connectors
Open, Medium | Public | 0 Estimated Story Points


Once events are in Kafka, we need a standardized way to import them into downstream systems. This task will describe and track that work. This task is not about a Stream Processing system, which would consume events from Kafka, transform them, and then produce them back to Kafka. This is about consuming events out of Kafka and saving them into a storage system.

Currently, downstream systems consume events from Kafka using custom consumers and glue code. EventLogging kafka + jrm, Camus, Refinery Spark 'Refine' code, statsv, kafkatee etc. are all examples of custom 'downstream connectors' we use at WMF.

We'd like to standardize the way this is done.

Using Kafka Connect could be nice, but in 2018, most of the useful Confluent connector implementations were switched to a non-FLOSS license.

Flink has built-in connectors. These might also be useful to standardize on outside of a streaming context.

Event Timeline

Ottomata triaged this task as Medium priority. Jan 22 2019, 7:08 PM
Ottomata created this task.

No, we never did it. This would have been Kafka Connect. Maybe now it would be based on Flink connectors?

We could perhaps decline this and reopen or recreate if we ever actually do it.

It feels like a major part of the Event Platform, and certainly present in new diagrams other teams are drawing up. I think it should stick around and we should collaborate on it.

I am going to decline this task as is. Let's define new stories as part of the new Event Platform work.

Ottomata renamed this task from Modern Event Platform: Stream Connectors to Event Platform: Stream Connectors. Aug 19 2022, 2:22 PM
Ottomata reopened this task as Open.
Ottomata updated the task description.
Ottomata added a subscriber: tchin.

T306627: Integrate Image Suggestions Feedback with Cassandra is being used to drive this task, specifically by using Flink to connect streams to Cassandra. We hope to use the work there to make this kind of integration generic.

In T317045: [Epic] Re-architect the Search Update Pipeline, the search team is using the Flink ElasticSearch connector to update ElasticSearch indexes. If Event Platform had abstracted connector support for this, similar to the prototype developed by @tchin here for Cassandra, they would not need to reimplement the event-to-ElasticSearch mapping in their code.
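To make the idea of "abstracted connector support" a bit more concrete, here is a rough sketch of how the event-to-storage mapping could be declared once as configuration instead of being reimplemented per team. This is not the prototype mentioned above and it does not use Flink or Kafka. Every name in it (`Sink`, `InMemorySink`, `field_mapper`, `run_connector`, the example stream and field names) is hypothetical, and the in-memory sink only stands in for a real store such as Cassandra or ElasticSearch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List, Protocol

Event = Dict[str, Any]
Record = Dict[str, Any]


class Sink(Protocol):
    """Hypothetical interface a storage backend would implement."""

    def write(self, record: Record) -> None: ...


@dataclass
class InMemorySink:
    """Stand-in for a real store; it only collects records in a list."""

    rows: List[Record] = field(default_factory=list)

    def write(self, record: Record) -> None:
        self.rows.append(record)


def field_mapper(mapping: Dict[str, str]) -> Callable[[Event], Record]:
    """Build a mapper from {sink_column: "dotted.event.path"}."""

    def get_path(event: Event, path: str) -> Any:
        value: Any = event
        for key in path.split("."):
            value = value[key]  # raises KeyError if the event lacks the field
        return value

    def mapper(event: Event) -> Record:
        return {column: get_path(event, path) for column, path in mapping.items()}

    return mapper


def run_connector(events: Iterable[Event], mapper: Callable[[Event], Record], sink: Sink) -> int:
    """Map each event and write it to the sink; return the number written."""
    count = 0
    for event in events:
        sink.write(mapper(event))
        count += 1
    return count


# Example events; the stream and field names are made up for illustration.
events = [
    {"meta": {"stream": "example.stream", "dt": "2022-08-19T14:22:00Z"}, "page_id": 1},
    {"meta": {"stream": "example.stream", "dt": "2022-08-19T14:23:00Z"}, "page_id": 2},
]

sink = InMemorySink()
written = run_connector(
    events,
    field_mapper({"page_id": "page_id", "event_time": "meta.dt"}),
    sink,
)
```

With something along these lines, a team would supply only the column-to-field mapping and a sink implementation, and the event source (a Kafka consumer or a Flink source in practice) could be swapped in for the plain list used here.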