User Story
As a platform engineer, I need to experiment with developing a Flink batch job, ideally using the same or similar code as a realtime streaming job. (TBD - can this be in PyFlink?)
Why?
- This will help us understand whether we can use a bounded Flink job for backfilling datasets, and whether the approach would be easy enough for others who want to analyze larger datasets the same way
Done is:
- Job is configured to consume the page change stream with start and end bounds (some small, arbitrary timeframe - the last 2 days?); see the sketch after this list
- Job returns a count of all events in that bounded timeframe
- Job ends when all events are consumed
- Short demo video of job running
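
A minimal sketch of what the bounded job could look like in PyFlink, assuming the page change stream is read from a Kafka topic; the topic name, broker address, jar path, consumer group, and timeframe below are placeholders, not confirmed values. The pipeline is the same one a streaming job would use; the only addition is `set_bounded(...)`, which makes the source stop at the end offset so the job finishes once all events in the window are consumed.

```python
# Minimal PyFlink sketch (assumes Flink >= 1.16 with the Kafka connector jar available).
# Topic, brokers, jar path, and group id are placeholders.
from datetime import datetime, timedelta, timezone

from pyflink.common import Types, WatermarkStrategy
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment, RuntimeExecutionMode
from pyflink.datastream.connectors.kafka import KafkaSource, KafkaOffsetsInitializer


def ms(dt: datetime) -> int:
    # Kafka offsets-by-timestamp expect epoch milliseconds.
    return int(dt.timestamp() * 1000)


def main():
    env = StreamExecutionEnvironment.get_execution_environment()
    # The source is bounded, so run in batch execution mode.
    env.set_runtime_mode(RuntimeExecutionMode.BATCH)
    # Kafka connector jar must be on the classpath (placeholder path).
    env.add_jars("file:///path/to/flink-sql-connector-kafka.jar")

    end = datetime.now(timezone.utc)
    start = end - timedelta(days=2)

    # Same source definition a realtime job would use; set_bounded() is what
    # turns it into a bounded read over the chosen timeframe.
    source = (
        KafkaSource.builder()
        .set_bootstrap_servers("localhost:9092")        # placeholder
        .set_topics("mediawiki.page_change")            # placeholder topic
        .set_group_id("page-change-backfill-experiment")
        .set_starting_offsets(KafkaOffsetsInitializer.timestamp(ms(start)))
        .set_bounded(KafkaOffsetsInitializer.timestamp(ms(end)))
        .set_value_only_deserializer(SimpleStringSchema())
        .build()
    )

    events = env.from_source(source, WatermarkStrategy.no_watermarks(), "page_change")

    # Count all events in the bounded window; in batch mode the keyed reduce
    # emits only the final total rather than incremental updates.
    (
        events
        .map(lambda _: ("events", 1),
             output_type=Types.TUPLE([Types.STRING(), Types.LONG()]))
        .key_by(lambda t: t[0])
        .reduce(lambda a, b: (a[0], a[1] + b[1]))
        .print()
    )

    env.execute("bounded-page-change-count")


if __name__ == "__main__":
    main()
```

Because the job runs in RuntimeExecutionMode.BATCH over a bounded source, it prints a single final count and then terminates, which matches the "returns a count" and "ends when all events are consumed" criteria above.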