[Epic] Set up multi DC Kafka stretch cluster
Closed, Declined · Public
Assigned To

None

Authored By

	Ottomata
	Jun 26 2023, 8:39 PM

Description

In T314160: Q1:rack/setup/install kafka-stretch200[12], we received and racked hardware for a multi DC Kafka stretch cluster.

T307944: Evaluate Kafka Stretch cluster potential, and if possible, request hardware ASAP captures much of the info needed to set this up. This task will track the remaining work and serve as its parent task.

Discovery-Search may want to use this for T317045: [Epic] Re-architect the Search Update Pipeline, as it will be a way to produce large events in a multi DC Kafka cluster without messing with Kafka main.

- T307944: Evaluate Kafka Stretch cluster potential, and if possible, request hardware ASAP
- T314160: Q1:rack/setup/install kafka-stretch200[12]
- New Debian package for latest Kafka version (T300102 ?)
- Node (ganeti?) in tiebreaker DC (ulsfo)
- Puppetization for new Kafka cluster (probably requires some refactoring)
- T282887: Avoid accepting Kafka messages with whacky timestamps
- Kafka MirrorMaker instance (likely MirrorMaker 1, unless we want to set up and use MirrorMaker 2 as well) to mirror from the new Kafka stretch cluster to Kafka jumbo
- Gobblin job to ingest into HDFS
- ... ?
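
For the timestamp item (T282887), the kind of check involved can be sketched roughly as below. This is only an illustration, not what that task decided: the function name and the 7-day threshold are hypothetical. Kafka brokers offer a comparable guard via `log.message.timestamp.difference.max.ms` (topic-level `message.timestamp.difference.max.ms`), which rejects a record when its CreateTime timestamp differs from broker time by more than the threshold.

```python
import time

# Hypothetical threshold for illustration: 7 days, in milliseconds.
MAX_TIMESTAMP_DIFF_MS = 7 * 24 * 60 * 60 * 1000


def is_timestamp_acceptable(record_ts_ms, now_ms=None, max_diff_ms=MAX_TIMESTAMP_DIFF_MS):
    """Return True if the record timestamp is within max_diff_ms of now_ms.

    Both timestamps are epoch milliseconds. Records too far in the past
    or in the future are treated as having a "whacky" timestamp.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return abs(now_ms - record_ts_ms) <= max_diff_ms
```

A producer-side or stream-processor-side filter like this would drop or reroute such records before they reach the cluster; the broker setting enforces the same rule at produce time.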

There will be decisions to make along the way, e.g.

- Name of the new cluster? Is "stretch" sufficient? It might be better to come up with something that doesn't refer to the replication/architecture.
- Topic prefix naming convention?
- How does this interact with [[ https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Hadoop_Event_Ingestion_Lifecycle | Hadoop Ingestion ]]?
- ...?

Related Objects

Status	Assigned	Task
Resolved	Ottomata	T185233 Modern Event Platform
Resolved	lbowmaker	T306797 [Shared Event Platform] Investigate Stream Processing Platforms
Resolved	Ottomata	T307944 Evaluate Kafka Stretch cluster potential, and if possible, request hardware ASAP
Declined	None	T340492 [Epic] Set up multi DC Kafka stretch cluster

Event Timeline

Ottomata created this task. · Jun 26 2023, 8:39 PM

Restricted Application added a subscriber: Aklapper. · Jun 26 2023, 8:39 PM

A very cool name could be "Octopus" (tentacles spreading in multiple dcs at once).

JArguello-WMF moved this task from Incoming (new tickets) to Event Platform Backlog on the Data-Engineering board. · Jun 29 2023, 10:23 PM

JArguello-WMF removed a project: Data-Platform-SRE. · Jun 29 2023, 10:56 PM

JArguello-WMF added a project: Data Engineering and Event Platform Team. · Jun 30 2023, 4:28 PM

JArguello-WMF moved this task from Data Eng Backlog to Event Platform Backlog on the Data Engineering and Event Platform Team board. · Jun 30 2023, 4:38 PM

Ottomata mentioned this in T307959: [Event Platform] Design and Implement realtime enrichment pipeline for MW page change with content. · Jun 30 2023, 4:53 PM

dcausse moved this task from needs triage to watching / waiting on the Discovery-Search board. · Jul 3 2023, 3:12 PM

BTullis added a project: Data-Platform-SRE. · Jul 15 2023, 12:02 AM

Gehel renamed this task from Set up multi DC Kafka stretch cluster to [Epic] Set up multi DC Kafka stretch cluster. · Oct 18 2023, 8:49 AM

Gehel triaged this task as Medium priority.

Gehel added a project: Epic.

Gehel moved this task from Incoming to Epics on the Data-Platform-SRE board.

lbowmaker removed a project: Data Engineering and Event Platform Team. · Nov 10 2023, 2:29 PM

dr0ptp4kt subscribed. · Nov 16 2023, 3:29 PM

After a discussion with @Gehel and @dcausse, there isn't a lot of interest in using Kafka stretch to enable active/passive double compute streaming. The goal was to have computed output be consistent across DCs, but the benefits don't outweigh the work required to make it function (including doing manual failovers of streaming apps), at least for now.

The 4 servers (2 in each DC) can be repurposed for something else. CC @BTullis @dcausse @brouberol @bking

I think that we should:

- repurpose kafka-stretch100[1-2] to add them to the analytics Hadoop cluster in eqiad (unless anyone has any better ideas for these).
- repurpose kafka-stretch200[1-2] to create a dse-k8s Kubernetes cluster starting with two worker nodes in codfw.

@Gehel - what do you think?

I think @dcausse was hoping for a new Multi DC Kafka cluster that was not kafka main. One on which he could do fancier things (like topic compaction) without having to risk a MW outage (taking down MW Job queue, for instance).
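
For readers unfamiliar with the topic compaction mentioned above, here is a minimal sketch of its semantics, not of Kafka's implementation. The function name and the record layout are assumptions made for illustration. Compaction keeps only the most recent value per key, and a record with a null value (a tombstone) marks the key for deletion.

```python
def compact(records):
    """Simulate the end state of log compaction on a list of (key, value) records.

    Records are given in offset order. The result keeps, for each key, only
    the latest record, as (offset, key, value) tuples in offset order.
    A value of None is a tombstone: the key is dropped entirely.
    """
    latest = {}
    for offset, (key, value) in enumerate(records):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    survivors = [
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    ]
    return sorted(survivors)  # offsets are unique, so this orders by offset
```

This is a simplification: a real broker compacts in the background, leaves the active segment untouched, and retains tombstones for a configurable period before removing them.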
