Implement the ingestion job
Closed, Resolved · Public · 8 Estimated Story Points

Assigned To

Authored By

	• dcausse
	Nov 21 2022, 3:50 PM

Description

The ingestion job (cirrus-streaming-updater-consumer) should read messages from a Kafka topic and write them to an Elasticsearch index.

Messages from the Kafka topic should comply with the schema defined at https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/856507.
Writing to Elasticsearch could be assisted by the Elasticsearch connector.

The job's main function will be to create the bulk requests:

Create a scripted update request, similar to what CirrusSearch does for revision-based updates.
Create a delete request for page deletes.
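
To illustrate the two request types, here is a minimal sketch of how one update event could be mapped to the lines of an Elasticsearch bulk request. It is not the job's actual code. The event fields (`change_type`, `index`, `page_id`, `fields`) are hypothetical stand-ins for the schema linked above. The script name `super_detect_noop` and the use of `upsert` are assumptions modeled on CirrusSearch.

```python
import json


def to_bulk_lines(event, script_name="super_detect_noop"):
    """Map one update event to the NDJSON lines of a bulk request.

    The event keys used here are hypothetical; the real schema is the one
    referenced in the task description.
    """
    meta = {"_index": event["index"], "_id": str(event["page_id"])}

    if event["change_type"] == "delete":
        # A bulk delete is a single action line with no body line.
        return [json.dumps({"delete": meta})]

    # A scripted update is an action line followed by a body line that
    # carries the script. Script name and params layout are assumptions.
    body = {
        "script": {
            "source": script_name,
            "lang": script_name,
            "params": {"source": event["fields"]},
        },
        # Assumption: create the document if it does not exist yet.
        "upsert": event["fields"],
    }
    return [json.dumps({"update": meta}), json.dumps(body)]


def build_bulk_body(events):
    """Concatenate the lines for all events; the bulk body must end with a newline."""
    lines = []
    for event in events:
        lines.extend(to_bulk_lines(event))
    return "\n".join(lines) + "\n"
```

An update contributes two lines to the bulk body and a delete contributes one, so the consumer cannot assume a fixed number of lines per event.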

AC:

A new Flink job can be scheduled that consumes a topic of update documents and writes to an Elasticsearch cluster.
Updates can be filtered per wiki based on a command-line parameter (to ease testing).
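
The per-wiki filter in the second criterion could look roughly like the sketch below. The flag name `--wiki` and the event key `wiki_id` are hypothetical and not taken from the job or the schema. When no wiki is given, every update passes.

```python
import argparse


def parse_args(argv):
    """Parse the command line; --wiki is a hypothetical flag name."""
    parser = argparse.ArgumentParser(description="Update consumer (sketch)")
    parser.add_argument(
        "--wiki",
        action="append",
        default=None,
        help="only process updates for this wiki; may be repeated",
    )
    return parser.parse_args(argv)


def make_wiki_filter(wikis):
    """Return a predicate that keeps only events for the given wikis."""
    if not wikis:
        # No restriction requested: keep everything.
        return lambda event: True
    allowed = frozenset(wikis)
    # "wiki_id" is an assumed event key.
    return lambda event: event.get("wiki_id") in allowed


def filter_updates(events, argv):
    """Apply the command-line wiki filter to a sequence of events."""
    keep = make_wiki_filter(parse_args(argv).wiki)
    return [event for event in events if keep(event)]
```

For example, `filter_updates(events, ["--wiki", "testwiki"])` keeps only events whose `wiki_id` is `testwiki`, which lets a test run target a single wiki without touching the others.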

Details

Subject	Repo	Branch	Lines +/-
Consume internal updates and map them to elasticsearch requests	search/cirrus-streaming-updater	master	+1 K -55
Downgrade Caffeine to 2.9.3 for compatibility with Java 8	search/cirrus-streaming-updater	master	+16 -3
Make kafka sink configurable	search/cirrus-streaming-updater	master	+614 -447
Make sure code runs on Java 1.8	search/cirrus-streaming-updater	master	+141 -68


Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Gehel	T317045 [Epic] Re-architect the Search Update Pipeline
		Resolved		pfischer	T323506 Implement the ingestion job

Event Timeline

• dcausse created this task. Nov 21 2022, 3:50 PM

Restricted Application added a project: Discovery-Search. Nov 21 2022, 3:50 PM

Gehel triaged this task as High priority. Nov 21 2022, 4:25 PM

Gehel moved this task from needs triage to Current work on the Discovery-Search board.

Gehel edited projects, added Discovery-Search (Current work); removed Discovery-Search.

Gehel set the point value for this task to 8. Nov 21 2022, 4:53 PM

Gehel moved this task from Incoming to Ready for Dev -- SWE on the Discovery-Search (Current work) board.

pfischer moved this task from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board. Nov 23 2022, 4:05 PM

Change 860516 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[search/cirrus-streaming-updater@master] Consume internal CirrusSearch updates

https://gerrit.wikimedia.org/r/860516

gerritbot added a project: Patch-For-Review. Nov 24 2022, 10:04 AM

Gehel assigned this task to pfischer. Nov 28 2022, 4:19 PM

Change 864733 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[search/cirrus-streaming-updater@master] Make sure code runs on Java 1.8

https://gerrit.wikimedia.org/r/864733

Change 864788 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[search/cirrus-streaming-updater@master] Make kafka sink configurable

https://gerrit.wikimedia.org/r/864788

Change 864733 merged by jenkins-bot:

[search/cirrus-streaming-updater@master] Make sure code runs on Java 1.8

https://gerrit.wikimedia.org/r/864733

Change 864788 merged by jenkins-bot:

[search/cirrus-streaming-updater@master] Make kafka sink configurable

https://gerrit.wikimedia.org/r/864788

Change 871227 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[search/cirrus-streaming-updater@master] Downgrade Caffeine to 2.9.3 for compatibility with Java 8

https://gerrit.wikimedia.org/r/871227

Change 871227 merged by jenkins-bot:

[search/cirrus-streaming-updater@master] Downgrade Caffeine to 2.9.3 for compatibility with Java 8

https://gerrit.wikimedia.org/r/871227

pfischer moved this task from In Progress to Needs review on the Discovery-Search (Current work) board. Jan 5 2023, 4:04 PM

Change 860516 merged by jenkins-bot:

[search/cirrus-streaming-updater@master] Consume internal updates and map them to elasticsearch requests

https://gerrit.wikimedia.org/r/860516

Maintenance_bot removed a project: Patch-For-Review. Jan 11 2023, 11:30 AM

pfischer moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board. Jan 12 2023, 3:01 PM

Gehel closed this task as Resolved. Jan 13 2023, 9:59 AM
