Flink job to enrich reconciliation events
Open, Needs Triage, Public

Description

In T368782: MediaWiki Reconciliation API, we will emit a new kind of 'reconciliation' event. Hopefully the schema of this event will be exactly the same as that of the page change event.

In this task, we should create a Flink job that will:

  • Consume this new event stream
  • Create an enriched version of it that includes the content slots, hopefully with exactly the same schema as mediawiki_page_content_change_v1 (could not find schema?).
  • A separate Gobblin process should make this stream available as a Hive table under the event Hive database.
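
To make the enrichment step concrete, here is a minimal sketch of the per-event logic in plain Python. It is not the existing job's code. It assumes the reconciliation event carries a `revision.rev_id` and a `revision.content_slots` map like the page change event, and that the enriched output adds a `content_body` field to each slot. `enrich_with_content` and `fetch_slot_body` are hypothetical names; how the content is actually fetched is left to the caller.

```python
import copy
from typing import Any, Callable, Dict

Event = Dict[str, Any]


def enrich_with_content(
    event: Event,
    fetch_slot_body: Callable[[int, str], str],
) -> Event:
    """Return a copy of `event` with a content body added to every slot.

    Assumed layout (not confirmed by the task):
    event["revision"]["content_slots"] is a map of slot role -> slot metadata.
    `fetch_slot_body(rev_id, slot_role)` is a hypothetical callable that
    returns the text of that slot for that revision.
    """
    # Copy first so the incoming event is never mutated.
    enriched = copy.deepcopy(event)
    revision = enriched.get("revision", {})
    rev_id = revision.get("rev_id")
    for role, slot in revision.get("content_slots", {}).items():
        # Assumed field name for the enriched content.
        slot["content_body"] = fetch_slot_body(rev_id, role)
    return enriched
```

In a PyFlink job, a function like this could be applied to the consumed stream with `DataStream.map`, with the Gobblin ingestion into Hive staying outside the job as described above.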

Open questions:

  • Presumably we should be able to reuse all the work done already for mediawiki_page_content_change_v1?

Event Timeline

@gmodena and @Ottomata the description above is just me thinking out loud. Kindly please modify as you see fit.

Presumably we should be able to reuse all the work done already for mediawiki_page_content_change_v1?

We _might_ be able to do this in the existing mw-page-content-change-enrich job, but it won't be as straightforward as making a new enrichment job. Having these in the same job would be a nice general pattern to support though. We should at least look into it to see how difficult it will be.
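
One way to picture the "same job" option is a routing table from each input stream to its enriched output stream, so a single job can apply the same enrichment to both. This is only a sketch under assumptions: the stream names below are illustrative (the reconciliation ones are hypothetical placeholders), and it assumes each event records its source stream under `meta.stream`.

```python
from typing import Any, Dict, Optional

# Input stream -> enriched output stream. All names are assumptions for
# illustration; the reconciliation stream names are hypothetical.
ENRICHED_STREAM_FOR: Dict[str, str] = {
    "mediawiki.page_change.v1": "mediawiki.page_content_change.v1",
    "example.reconciliation.v1": "example.reconciliation_enriched.v1",
}


def output_stream_for(event: Dict[str, Any]) -> Optional[str]:
    """Pick the output stream for an event, or None if it is not routable."""
    source = event.get("meta", {}).get("stream")
    return ENRICHED_STREAM_FOR.get(source)
```

With this shape, supporting a new input would mean adding one entry to the table, which is part of what would need checking against how the existing job wires its sources and sinks.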