This task is about dealing with the fact that multiple instances of the Search Update Pipeline run in different DCs. As of now, each DC has its own kafka-main cluster and Elasticsearch cluster; however, Kafka messages are replicated between DCs by MirrorMaker. This leads to duplicate events:
- Topic codfw.page_change contains a message (codfw_page_change_0), which is processed by the codfw aggregator and results in another message (codfw_cirrus_update_0) on topic codfw.cirrus_update
- codfw_page_change_0 is replicated to eqiad as codfw_page_change_0' and processed by the eqiad aggregator, which produces eqiad_cirrus_update_0 on topic eqiad.cirrus_update
- codfw_cirrus_update_0 is replicated to eqiad as codfw_cirrus_update_0'
- eqiad_cirrus_update_0 is replicated to codfw as eqiad_cirrus_update_0'
- As a result, each indexer sees two messages for the same event (page 0 changed)
As discussed in T341625, we use the following approach (decision record):
- Primary events (anything consumed by the first step, a.k.a. the aggregator) are consumed only from local (non-replicated) topics.
- Update events (those produced by the aggregator) are consumed from both local and replicated topics.
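A regex over topic names is enough to express this decision, because MirrorMaker-replicated topics keep their DC prefix. The sketch below (class name and topic list are illustrative, not from the codebase) shows how an eqiad consumer would keep only local page_change topics but cirrus_update topics from all DCs:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TopicFilter {

    /** Keep only topics whose full name matches the configured regex. */
    public static List<String> filterTopics(List<String> topics, String regex) {
        Predicate<String> matches = Pattern.compile(regex).asMatchPredicate();
        return topics.stream().filter(matches).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> topics = List.of(
                "eqiad.page_change", "codfw.page_change",
                "eqiad.cirrus_update", "codfw.cirrus_update");

        // Primary events in eqiad: only the local, non-replicated topic.
        System.out.println(filterTopics(topics, "^eqiad\\.page_change$"));
        // → [eqiad.page_change]

        // Update events: local and replicated topics from every DC.
        System.out.println(filterTopics(topics, "^[a-z]+\\.cirrus_update$"));
        // → [eqiad.cirrus_update, codfw.cirrus_update]
    }
}
```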
See also: context diagram
Suggested implementation
In org.wikimedia.discovery.cirrus.updater.consumer.graph.ConsumerGraphFactory#createDataStreamSource:
    final List<String> topics =
        eventDataStreamFactory
            .getEventStreamFactory()
            .createEventStream(config.updateStream())
            .topics()
            .stream()
            .filter(predicate)
            .collect(Collectors.toList());
    KafkaSourceBuilder<Row> sourceBuilder =
        eventDataStreamFactory.kafkaSourceBuilder(
            config.updateStream(),
            config.kafkaSourceBootstrapServers(),
            config.kafkaSourceGroupId(),
            topics);
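The snippet leaves `predicate` undefined. One way to derive it from a configurable regex is sketched below; the config accessor `config.kafkaTopicFilter()` named in the comment is a hypothetical, not an existing option:

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class TopicPredicate {

    /**
     * Build the topic predicate from a regex string, e.g. taken from a
     * hypothetical config option such as config.kafkaTopicFilter().
     * An unset filter could default to ".*" to keep current behavior.
     */
    public static Predicate<String> fromRegex(String regex) {
        return Pattern.compile(regex).asMatchPredicate();
    }

    public static void main(String[] args) {
        Predicate<String> predicate = fromRegex("^(eqiad|codfw)\\.cirrus_update$");
        System.out.println(predicate.test("eqiad.cirrus_update")); // true
        System.out.println(predicate.test("eqiad.page_change"));   // false
    }
}
```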
AC
- Provide a configurable regex-based filter for Kafka topics
- Document the above on the project page
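As a sketch of what the documented configuration might look like, per consumer deployment (the option name `kafka-source-topic-filter` is a hypothetical placeholder, not the final flag):

```properties
# Hypothetical option: only consume Kafka topics matching this regex.
# eqiad deployment, update events from all DCs:
kafka-source-topic-filter: ^[a-z]+\.cirrus_update$
```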