
[SPIKE] Build simple stateless service using PyFlink
Closed, Resolved · Public · Spike

Description

User Story
As a platform engineer, I need to try building a simple stateless service with PyFlink that takes an input stream, transforms and enriches it, and produces an output.

The service should:

  • Listen to mediawiki.revision-create or another existing Kafka topic
  • Make a call to the MediaWiki Action API
  • Produce some output that combines the data
Why?
  • We need to assess whether this is a good abstraction that lets event-driven data producers create similar services easily
Done is:
  • Ticket contains a write-up of the process (with links to repos)
  • Ticket contains the pros and cons of using PyFlink so that the team can decide how to proceed

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Spike". · Sep 28 2022, 7:48 PM
Restricted Application added a subscriber: Aklapper.

Here's the repo with an example DataStream job and its Table API equivalent. It reads from mediawiki.page-create and then uses each event's page_id to fetch the list of images on the page from the Action API. I'm also working on a summary write-up of what I've experienced.
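
For readers without access to the repo, the enrichment step described above might look roughly like the sketch below. This is not the code from the repo: the function names, the English Wikipedia endpoint, and the shape of the output record are assumptions made for illustration, and the Kafka source and PyFlink wiring are left out. The only external detail relied on is the Action API's `action=query&prop=images&pageids=...` request and its default JSON response layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a fixed English Wikipedia endpoint. A real service would
# more likely derive the wiki's domain from the incoming event.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_images_query(page_id):
    """Return the Action API URL that lists the images used on a page."""
    params = {
        "action": "query",
        "prop": "images",
        "pageids": page_id,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode the JSON body (blocking HTTP call)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def enrich(event, fetch=fetch_json):
    """Combine one page-create event with the image titles for its page.

    `fetch` is injectable so the function can be exercised without
    network access. The function keeps no state between events.
    """
    page_id = event["page_id"]
    body = fetch(build_images_query(page_id))
    # In the default JSON format, pages are keyed by page id as a string.
    page = body.get("query", {}).get("pages", {}).get(str(page_id), {})
    images = [image["title"] for image in page.get("images", [])]
    # Hypothetical output shape: the original event plus an "images" list.
    return {**event, "images": images}
```

Because `enrich` is a plain per-record function, it could in principle be handed to a DataStream `map` call once events have been parsed into dictionaries. The blocking HTTP request inside it is a simplification worth weighing when assessing PyFlink for this use case.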