
Flink job to enrich reconciliation events
Closed, ResolvedPublic

Description

On T368782: MediaWiki Reconciliation API, we will be emitting a new kind of 'reconciliation' event. Ideally, the schema of this event is exactly the same as the page change event schema.

In this task, we should create a Flink job that will:

  • Consume this new event stream
  • Create an enriched version of it that includes the content slots, ideally matching mediawiki_page_content_change_v1 exactly (could not find the schema?).
  • A separate Gobblin process should make this stream available as a Hive table under the event Hive database.
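A minimal sketch of the enrichment step described above. This is illustrative only: `fetch_content_slots`, `content_body`, and `content_format` are assumed names, not the actual mediawiki-event-enrichment API or the mediawiki_page_content_change_v1 schema, and `get_revision` stands in for the HTTP call to the MediaWiki Action API (internal routes in production).

```python
import copy


def fetch_content_slots(database, revision_id, get_revision):
    # `get_revision` is a stand-in for the MediaWiki Action API call that
    # returns a revision's content slots for the given wiki database.
    revision = get_revision(database, revision_id)
    return {
        name: {
            "content_body": slot["content"],
            "content_format": slot["contentformat"],
        }
        for name, slot in revision["slots"].items()
    }


def enrich_event(event, get_revision):
    # Return a copy of a reconciliation event with content slots attached,
    # mirroring the shape we expect from mediawiki_page_content_change_v1.
    # The input event is left untouched.
    enriched = copy.deepcopy(event)
    rev = enriched["revision"]
    rev["content_slots"] = fetch_content_slots(
        enriched["database"], rev["rev_id"], get_revision
    )
    return enriched
```

In the Flink job, `enrich_event` would be applied per record between the Kafka source and sink; the pure-function shape keeps it unit-testable with a stubbed Action API client.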

Requirements

This Flink enrichment job should target the DSE cluster. This new k8s service will:

  • not be publicly accessible via a <appname>.wikimedia.org subdomain.
  • not require users to log in.
  • consume/produce from/to kafka jumbo.
  • _may_ need to consume from kafka main (@gmodena to clarify this requirement).
  • need to reach MediaWiki Action endpoints (via internal routes).

Resources and expected load

Some initial estimates; they might need refinement as we go.

  • Flink topology: one job manager and two task managers, managed by the Flink k8s operator. Each will be allocated a dedicated k8s pod.
  • 1G of memory allocated to the job manager, and 1.5G allocated to each task manager, should be safe defaults.
  • Expected load: significantly lower than mw-page-content-change-enrich. This app will consume events from a topic produced by a daily (hopefully hourly) batch process. The worst-case scenario estimated so far is 100k requests/hour (roughly 28 requests/second).
  • Flink HA: TBC. In its first iteration, we probably won't need HA for this job. We might still want an object store to snapshot Kafka offsets. We could experiment with snapshotting to Ceph if available.

Actions

The Dumps team will

  • Provide Data SREs a namespace name.
  • Set up a new job in the mediawiki-event-enrichment repo, for integration with the Deployment pipeline.
  • Add helmfile/values to deployment-charts, based atop the flink-app Helm chart.
  • Add new input/output streams to EventStreamConfig (the Gobblin consumer will be enabled by default).

As discussed in Slack, the following steps will require Data SRE action.

  • create the namespace
  • create the read/deploy credentials

Details

Other Assignee
xcollazo
Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
content_history: import from absolute path. | repos/data-engineering/mediawiki-event-enrichment!85 | gmodena | fix-module-import | main
blubber: append app base dir to PYTHONPATH. | repos/data-engineering/mediawiki-event-enrichment!84 | gmodena | fix-docker-pythonpath | main
mediawiki-event-enrichment: remove dead code. | repos/data-engineering/mediawiki-event-enrichment!83 | gmodena | cleanup | main
Add mediawiki_content_history reconciliation app. | repos/data-engineering/mediawiki-event-enrichment!82 | gmodena | dumps-reconciliation | main

Event Timeline

@gmodena and @Ottomata the description above is just me thinking out loud. Kindly please modify as you see fit.

Presumably we should be able to reuse all the work done already for mediawiki_page_content_change_v1?

We _might_ be able to do this in the existing mw-page-content-change-enrich job, but it won't be as straightforward as making a new enrichment job. Having these in the same job would be a nice general pattern to support, though. We should at least look into it to see how difficult it would be.

@gmodena and @Ottomata the description above is just me thinking out loud. Kindly please modify as you see fit.

We started some discussion about this in https://phabricator.wikimedia.org/T358373#9883792. Let's move it here to keep things in scope.
After some thought, I am less keen on reusing the mw-page-content-change-enrich namespace than I was at first.

There are two aspects to code organization:

  1. The place where the business logic lives. Back in the day, we designed mediawiki-event-enrichment as a monorepo for several enrichment codebases. The way CI and builds are configured, we can add a new enrichment job for reconciliation that reuses shared logic, and publish a docker image that bundles this new reconciliation app. The docker image would be the same as mw-page-content-change-enrich, but with a different entry point (if needed). The jobs might end up being the same docker image, but with different parametrization (input/output streams).
  2. k8s deployments. We follow a convention where each application is assigned a dedicated namespace. Re-using the existing mw-page-content-change-enrich job would mean re-using its namespace for Dumps reconciliation. I'm not against that if extending the scope of mw-page-content-change-enrich makes sense, but I am not sure yet that it does. The two seem similar in spirit (enrichment logic), but might have different enough RACIs. The monorepo was built to support exactly this kind of scenario. From an operational point of view, I'd rather keep reconciliation and content (wikitext) enrichment as separate concerns (and separate deployments/deployer pools). This would mean reusing (or extending) the mw-page-content-change-enrich code, but standing up a dedicated instance in k8s (follow-up from https://phabricator.wikimedia.org/T368745#9939817).
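The "same docker image, different parametrization" idea in point 1 could look roughly like the sketch below. All module, job, and output stream names here are hypothetical (only 'mediawiki.page_change.reconciled.v1' is mentioned elsewhere in this task); this is not the actual monorepo layout.

```python
# Shared enrichment logic lives once in the monorepo; each job is just a
# thin entry point resolving its own input/output stream parametrization.
JOBS = {
    "page-content-change-enrich": {
        "input_stream": "mediawiki.page_change.v1",
        "output_stream": "mediawiki.page_content_change.v1",
    },
    "dump-rev-content-reconcile-enrich": {
        "input_stream": "mediawiki.page_change.reconciled.v1",
        "output_stream": "mediawiki.page_content_change.reconciled.v1",
    },
}


def build_job_config(job_name, overrides=None):
    # Resolve a job's stream parametrization, allowing per-deployment
    # overrides (e.g. kafka-test brokers in staging) from helm values.
    config = dict(JOBS[job_name])
    config.update(overrides or {})
    return config
```

Both jobs would then ship in one image, with the entry point (or a CLI flag) selecting the job name; the k8s deployments stay separate, matching the one-namespace-per-application convention.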

As a side note, if we were to refactor mw-page-content-change-enrich, I would not feel comfortable deploying changes straight to wikikube. With mw-page-content-change-enrich we did a pretty good job of meeting SLOs, and I'd like to keep it that way :)

Consume this new event stream

We are still spiking how the stream will be produced (EventGate vs. Spark).

But can we assume the stream will be produced directly into jumbo, and won't have multi dc / replication requirements?

But can we assume the stream will be produced directly into jumbo, and won't have multi dc / replication requirements?

I think we decided we want to do reconciliation of page_change and page_content_change in general, so that it can be used for Search and others. The consumed 'mediawiki.page_change.reconciled.v1' (or whatever) stream will be produced to kafka-main. mw-page-content-change-enrich runs in wikikube and produces to kafka-jumbo. I'd be inclined to do the same for mw-page-content-change-reconciled-enrich too (naming TBD, of course).

But can we assume the stream will be produced directly into jumbo, and won't have multi dc / replication requirements?

I think we decided we want to do reconciliation of page_change and page_content_change in general, so that it can be used for Search and others.

We discussed this scenario, but there is no decision yet AFAIK.

From the last sync: @Milimetric is exploring (again, no decision yet) using Spark + MariaDB replicas (instead of a MW endpoint) to produce the input stream for this enrichment job (records that had missing wikitext or inaccurate rows). That application would run on Hadoop and would (I assume) produce to jumbo. In this scenario we _could_ run the Flink job single-DC on DSE (see Slack thread).

Another point raised by @Milimetric is whether they'll be able to piggyback on the page change schema for emitting revision records that are missing from page change. Note that this _might_ break other assumptions we have about reusing mw-page-content-change-enrich logic.

If we go down the MW endpoint + EventBus + EventGate route we could produce to main. However, if the stream ends up being for Dumps only (and not a generic page change reconciliation one) jumbo might be an easier path (and in that case I wonder if wikikube is the right cluster to target).

There is a Miro board and slack thread to collect input about this for our next sync.

if the stream ends up being for Dumps only

So far, I think page_content_change is for dumps only, which is why we produced to jumbo. We'd produce to jumbo for reconciled stream too.

I suppose we should target running Flink reconciled enrichment job in dse-k8s at first anyway, even if we decide we want to run the job multi-DC in wikikube eventually. So let's go for it!

gmodena updated Other Assignee, added: gmodena.
gmodena updated Other Assignee, removed: gmodena.

I suppose we should target running Flink reconciled enrichment job in dse-k8s at first anyway, even if we decide we want to run the job multi-DC in wikikube eventually. So let's go for it!

Ack.

For documentation purposes: our current strategy regarding reconciled enrichment is focused on supporting the Dumps 2.0 use case, which relies on single-DC compute (Hadoop). A dse -> wikikube rollout path should fit nicely with the project roadmap.

I'll be picking up this task soon and will keep you posted on the progress.

Change #1070000 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] dse-k8s-eqiad: re-enable the Flink operator

https://gerrit.wikimedia.org/r/1070000

Change #1070000 merged by Brouberol:

[operations/deployment-charts@master] dse-k8s-eqiad: re-enable the Flink operator

https://gerrit.wikimedia.org/r/1070000

Change #1070597 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] ds8-k8s-service: add values for dumps2 job.

https://gerrit.wikimedia.org/r/1070597

Change #1074135 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] Create production and staging NS for mw-dump-rev-content-reconcile-enrich

https://gerrit.wikimedia.org/r/1074135

Change #1074136 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] deployment_server: create prod/staging users for mw-dump-rev-content-reconcile-enrich

https://gerrit.wikimedia.org/r/1074136

Change #1074136 merged by Brouberol:

[operations/puppet@production] deployment_server: create mw-dump-rev-content-reconcile-enrich users

https://gerrit.wikimedia.org/r/1074136

Change #1074135 merged by Brouberol:

[operations/deployment-charts@master] Create production and staging NS for mw-dump-rev-content-reconcile-enrich

https://gerrit.wikimedia.org/r/1074135

Change #1070597 merged by jenkins-bot:

[operations/deployment-charts@master] dse-k8s-service: add values for dumps2 job.

https://gerrit.wikimedia.org/r/1070597

Change #1075226 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] dse-k8s-services: fix values in dump enrichment app.

https://gerrit.wikimedia.org/r/1075226

Change #1075226 merged by jenkins-bot:

[operations/deployment-charts@master] dse-k8s-services: fix values in dump enrichment app.

https://gerrit.wikimedia.org/r/1075226

Change #1075537 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] flink-operator: specify a list of NS to watch in dse-k8s-eqiad

https://gerrit.wikimedia.org/r/1075537

Change #1075537 merged by Brouberol:

[operations/deployment-charts@master] flink-operator: specify a list of NS to watch in dse-k8s-eqiad

https://gerrit.wikimedia.org/r/1075537

Change #1075931 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] dse-k8s-service: add kafka-test brokers to flink app.

https://gerrit.wikimedia.org/r/1075931

Change #1076680 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] services: page-content-change-enrich: set deployment value.

https://gerrit.wikimedia.org/r/1076680

Change #1075931 merged by jenkins-bot:

[operations/deployment-charts@master] dse-k8s-services: dump-reconcile: add kafka-test brokers to flink app.

https://gerrit.wikimedia.org/r/1075931

Change #1077047 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] dse-k8s-services: content_history: update docker image.

https://gerrit.wikimedia.org/r/1077047

Change #1076680 merged by jenkins-bot:

[operations/deployment-charts@master] services: page-content-change-enrich: set deployment value.

https://gerrit.wikimedia.org/r/1076680

Change #1077047 merged by jenkins-bot:

[operations/deployment-charts@master] dse-k8s-services: content_history: update docker image.

https://gerrit.wikimedia.org/r/1077047

Change #1078923 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] dse-k8s-services: content_history: version bump image.

https://gerrit.wikimedia.org/r/1078923

Change #1078923 merged by jenkins-bot:

[operations/deployment-charts@master] dse-k8s-services: content_history: version bump image.

https://gerrit.wikimedia.org/r/1078923

Change #1080245 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] dse-k8s-services: content_history: version bump image.

https://gerrit.wikimedia.org/r/1080245

Change #1080245 merged by jenkins-bot:

[operations/deployment-charts@master] dse-k8s-services: content_history: version bump image.

https://gerrit.wikimedia.org/r/1080245