
Modern Event Platform: Stream Connectors
Open, Medium, Public · 0 Estimated Story Points

Description

Once events are in Kafka, we need a standardized way to import them into downstream systems. This task will describe and track that work. This task is not about a Stream Processing system, which would consume events from Kafka, transform them, and then produce them back to Kafka; it is about consuming events out of Kafka into downstream systems.

Currently, downstream systems consume events from Kafka using custom consumers and glue code. EventLogging's Kafka consumers + jrm, Camus, Refinery Spark 'Refine' code, statsv, kafkatee, etc. are all examples of custom 'downstream connectors' we use at WMF.

We'd like to standardize the way this is done, most likely using Kafka Connect. There are many open source connector plugins we can make use of.

@Ottomata has a WIP prototype that converts JSONSchemas to Connect Schemas, which allows us to use valid JSON events produced by the Stream Intake service.
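
To illustrate the idea (this is a minimal sketch, not the actual prototype), a converter like this would walk a JSONSchema and build the equivalent Connect Schema with Kafka's SchemaBuilder API. It assumes Jackson and kafka-connect-api on the classpath and handles only a few primitive types, objects, and arrays:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;

import java.util.Iterator;
import java.util.Map;

public class JsonSchemaConverter {

    /** Recursively map a JSONSchema node to a Connect Schema. */
    public static Schema convert(JsonNode jsonSchema) {
        String type = jsonSchema.path("type").asText();
        switch (type) {
            case "string":  return Schema.OPTIONAL_STRING_SCHEMA;
            case "integer": return Schema.OPTIONAL_INT64_SCHEMA;
            case "number":  return Schema.OPTIONAL_FLOAT64_SCHEMA;
            case "boolean": return Schema.OPTIONAL_BOOLEAN_SCHEMA;
            case "array":
                // JSONSchema arrays declare their element type under "items".
                return SchemaBuilder.array(convert(jsonSchema.path("items")))
                        .optional().build();
            case "object":
                // JSONSchema objects become Connect structs, one field per property.
                SchemaBuilder struct = SchemaBuilder.struct().optional();
                Iterator<Map.Entry<String, JsonNode>> fields =
                        jsonSchema.path("properties").fields();
                while (fields.hasNext()) {
                    Map.Entry<String, JsonNode> field = fields.next();
                    struct.field(field.getKey(), convert(field.getValue()));
                }
                return struct.build();
            default:
                throw new IllegalArgumentException("Unsupported JSONSchema type: " + type);
        }
    }

    public static void main(String[] args) throws Exception {
        String jsonSchema =
                "{\"type\":\"object\",\"properties\":{"
              + "\"dt\":{\"type\":\"string\"},"
              + "\"total\":{\"type\":\"integer\"}}}";
        Schema schema = convert(new ObjectMapper().readTree(jsonSchema));
        System.out.println(schema.fields());
    }
}
```

With Connect Schemas available, sink connectors can deserialize our JSON events into schema-ed Connect records, which is what schema-aware output formats like Parquet require.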

A first use case would be to replace Camus, the job that imports JSON data from Kafka into HDFS. By using Kafka Connect with our JSONSchemas here, we avoid schema bugs like T214384 and can write Parquet files directly, which would improve Refine job performance in Hadoop.
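
For example (hostnames, topic names, and HDFS paths below are hypothetical), Confluent's open source HDFS sink connector can write Parquet directly; a connector instance is registered by POSTing its config to the Kafka Connect REST API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterHdfsSink {
    public static void main(String[] args) throws Exception {
        // Hypothetical connector config; connector.class and config keys are
        // from Confluent's HDFS sink connector, everything else is made up.
        String connectorJson = """
            {
              "name": "hdfs-sink-eventlogging",
              "config": {
                "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
                "tasks.max": "4",
                "topics": "eventlogging_NavigationTiming",
                "hdfs.url": "hdfs://analytics-hadoop/wmf/data/raw/event",
                "format.class": "io.confluent.connect.hdfs.parquet.ParquetFormat",
                "flush.size": "10000"
              }
            }
            """;

        // POST the config to a Connect worker's REST API to start the connector.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://kafka-connect.example.org:8083/connectors"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Note that ParquetFormat requires records with Connect Schemas, which is exactly what the JSONSchema conversion above would provide.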

Event Timeline

Ottomata triaged this task as Medium priority. Jan 22 2019, 7:08 PM
Ottomata created this task.