
[SPIKE] Use Flink for batch backfilling
Closed, Declined · Public

Description

User Story
As a platform engineer, I need to experiment with developing a Flink batch job, ideally using the same or similar code as a realtime streaming job. (TBD: can this be done in PyFlink?)
Why?
  • This will help us understand whether we can use a bounded Flink job for backfilling datasets. It will also tell us whether this approach would be easy enough for others who want to analyze larger datasets the same way.
Done is:
  • Job is set to consume page change stream with start and end bounds (for some small arbitrary timeframe - last 2 days?)
  • Job returns a count of all events in that bounded timeframe
  • Job ends when all events are consumed
  • Short demo video of job running
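
To make the bounded semantics above concrete, here is a minimal plain-Python sketch of what "done" means: count only events inside the start and end bounds, and stop as soon as the end bound is reached. This is not Flink or PyFlink code. `PageChangeEvent`, `hourly_events` and `count_bounded` are hypothetical names invented for illustration, and the sketch assumes events arrive in event-time order, which a real stream may not guarantee.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, NamedTuple


class PageChangeEvent(NamedTuple):
    """Hypothetical stand-in for one event from the page change stream."""
    event_time: datetime
    page_id: int


def hourly_events(first: datetime, hours: int) -> Iterator[PageChangeEvent]:
    """Generate one synthetic event per hour, starting at `first`."""
    for i in range(hours):
        yield PageChangeEvent(first + timedelta(hours=i), page_id=i)


def count_bounded(
    events: Iterable[PageChangeEvent], start: datetime, end: datetime
) -> int:
    """Count events with start <= event_time < end.

    Assumes events are ordered by event_time, so consumption can stop
    at the first event at or past `end` (the "job ends" condition).
    """
    count = 0
    for event in events:
        if event.event_time < start:
            continue  # before the lower bound: skip
        if event.event_time >= end:
            break  # upper bound reached: stop consuming
        count += 1
    return count


if __name__ == "__main__":
    end = datetime(2022, 11, 30, tzinfo=timezone.utc)
    start = end - timedelta(days=2)  # "last 2 days"
    # Six days of synthetic events, beginning five days before `end`.
    stream = hourly_events(end - timedelta(days=5), hours=6 * 24)
    print(count_bounded(stream, start, end))
```

With one synthetic event per hour, the two-day window holds 48 events. In a real Flink job the source would enforce the bounds itself rather than the counting loop, so treat this only as a model of the expected result.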

Event Timeline

Restricted Application added a subscriber: Aklapper.
Ottomata renamed this task from [SPIKE] Use Flink to develop bounded service to [SPIKE] Use Flink for batch backfilling. · Nov 30 2022, 2:59 PM
Ottomata updated the task description.

No work was done on this. It should be possible if we ever need to do it.