Page MenuHomePhabricator

[Shared Event Platform] Design and Implement POC Flink Service to consolidate a page-change stream on top of existing EventBus streams
Closed, DeclinedPublic

Description

User Story
As a platform engineer, I want to create a stream assembling all known streams produced by MW EventBus to reflect the changes to the state of a page in a single and easily consumable stream.

As of today EventBus emits events related to the actions made to a page into different streams. The nature of our multi-DC kafka setup also makes use of different streams (kafka topics) that may route the same logical stream into different kafka cluster and thus different topics (e.g. MW DC switch, switching the event-gate dns discovery record).
Reading multiple streams to capture state changes of a MW page is far from trivial:

  • with different streams the events are likely to be processed in a unpredictable order forcing consumers to use a relatively big state or to attempt to re-order events to avoid applying incoherent changes (i.e. re-import a revision on a page whose delete event was processed just before).
  • streams that can become idle are challenging because you never know if you have to wait for possible incoming data or continue processing other streams (if attempting to do some event-time re-ordering)

If we were to design this from scratch we would probably implement a single stream holding all possible state changes related to a page. Doing this today would require EventBus to emit such events to a single stream but also relying on the upcoming stretch kafka stretch cluster (T307944).
Changing EventBus might be relatively easy but putting in place the kafka stretch cluster is something that might take time. We might have an opportunity here to use a minimal stream processor to mimic what it would look like to have such stream in place and evaluate what benefits it could offer:

  • a single stream to consume: predictable and repeatable ingestion
  • ordered: no need to hold any state on the consumer to determine whether or not an event is safe to apply or has been superseded by a previously seen event

The service must

  • Listen to:
    • mediawiki.revision-create (should include page-create)
    • mediawiki.page-delete
    • mediawiki.page-suppress (similar to page-delete but for "suppressed deletes")
    • mediawiki.page-undelete
    • others?
  • Re-order: assuming a partitioning scheme based on the wiki+page_id, re-order events based on:
    • revision_id
    • event time (for delete v.s. undelete which unfortunately do not have any other ways to sort by)
  • Format all these streams into a single consolidated stream
  • Output the resulting events to kafka

Done is

Service deployed running on POC instance of Flink in YARN, producing to the kafka test cluster in eqiad. (no SLO’s)

Expected Spikes:
  • Data modeling exercise for new consolidated stream
Why are we doing this?
  • Simplify event stream consumption. Consumers can listen to a single & well ordered stream that represents the state of a page (minus the content) rather than a page action (current design).

Event Timeline

dcausse renamed this task from [Shared Event Platform] Design and Implement POC Flink Service to consolidate a page-change streams on top of existing EventBus streams to [Shared Event Platform] Design and Implement POC Flink Service to consolidate a page-change stream on top of existing EventBus streams.Jun 2 2022, 1:34 PM
dcausse updated the task description. (Show Details)

@dcausse I remember you mentioned having trouble with watermarks when doing table -> datastream conversion? Just re-read this and it has examples on how to do it!

such events will be created from MW, having them properly ordered can be done later with simple stream processor depending on specific consumer needs.