
[Event Platform] Implement PoC Event-Driven Data Pipeline for Revert Risk Model Scores using Event Platform Capabilities
Open, Needs Triage, Public

Description

User Story

As an ML Platform Engineer, I want to use the platform capabilities developed by the Event Platform team and test and evaluate their functionality thoroughly. This will let me implement an event-driven data pipeline that produces an event stream containing revert risk model scores.
Why?

By implementing a proof-of-concept job with the Event Platform team's capabilities, we can learn the advantages and disadvantages of Flink compared with other available solutions such as ChangeProp and Benthos.

Additionally, the output stream from this implementation is of interest to several product teams. Currently, these teams make individual calls to the LiftWing API to get the data they need. An event stream would let consumers subscribe to the scores instead, reducing both the number of API calls made to LiftWing and the teams' direct dependency on it. In the long run, we could also connect the stream to a store such as Cassandra to serve the data more efficiently.

Expected Sub-tasks (not exhaustive - please add as needed)
  • Deploy Flink operator to dse-k8s
  • Build Python Flink job that listens to mw.page_change, makes an API call to LiftWing for the revert risk model score and outputs the results in a new stream
  • Design schema for output topic
  • Deploy new output stream
  • Deploy Flink job to dse-k8s
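The core of the Python job in the second sub-task is a per-event enrichment step: filter page changes, call LiftWing, and emit an enriched event. The sketch below shows only that step, with the Flink source/sink wiring omitted and the scoring call injected as a function so it can be exercised without network access. It is a minimal sketch under stated assumptions, not the planned implementation: the `mw.page_change` field names (`page_change_kind`, `meta.domain`, `revision.rev_id`), the LiftWing URL, and the `{"rev_id", "lang"}` request body are assumptions, and the output stream name and `revert_risk` field are hypothetical placeholders for whatever the output schema defines.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

Event = Dict[str, Any]
ScoreFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Assumed LiftWing endpoint for the language-agnostic revert risk model.
LIFTWING_URL = (
    "https://api.wikimedia.org/service/lw/inference/v1/models/"
    "revertrisk-language-agnostic:predict"
)
# Hypothetical output stream name; the real one comes from the schema design sub-task.
OUTPUT_STREAM = "revert_risk_poc.page_change"
# Assumption: only changes that create a new revision are worth scoring.
SCORABLE_KINDS = {"create", "edit"}


def http_score(url: str, body: Dict[str, Any], timeout: float = 5.0) -> Dict[str, Any]:
    """POST the request body as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def build_request(event: Event) -> Optional[Dict[str, Any]]:
    """Return the LiftWing request body for a page change, or None to skip it."""
    if event.get("page_change_kind") not in SCORABLE_KINDS:
        return None
    rev_id = event.get("revision", {}).get("rev_id")
    domain = event.get("meta", {}).get("domain", "")
    # Assumption: the language code is the first label of a *.wikipedia.org domain.
    if rev_id is None or not domain.endswith(".wikipedia.org"):
        return None
    return {"rev_id": rev_id, "lang": domain.split(".")[0]}


def enrich(event: Event, score: ScoreFn = http_score) -> Optional[Event]:
    """Return a copy of the event with the prediction attached, or None if skipped."""
    body = build_request(event)
    if body is None:
        return None
    enriched = dict(event)  # shallow copy; the input event is not mutated
    enriched["meta"] = {**event.get("meta", {}), "stream": OUTPUT_STREAM}
    enriched["revert_risk"] = score(LIFTWING_URL, body)
    return enriched
```

In the Flink job, `enrich` would sit inside a map or process function between the `mw.page_change` source and the new output sink, with `None` results filtered out. Injecting `score` keeps the HTTP client swappable, for example for retries, batching, or a stub in tests.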
Success Criteria
  • Flink job is running as a PoC on dse-k8s and is able to enrich relevant page change events with a revert risk model score. Following the proof of concept, we will review the process and work together to identify Event Platform improvements, the steps needed to move from a PoC to a more formal implementation, etc.
Useful Links (please add as needed)

Event Timeline

Restricted Application added a subscriber: Aklapper.

@AikoChou - we have been talking with the Enterprise engineering team recently about implementing this (also explaining the various platform components we built).

I will include you on our upcoming meetings and we can discuss how we can move forward.

Hi @lbowmaker, thanks for the heads up. Unfortunately I won't be able to attend the meeting today as it conflicts with my other meetings. However, I would like to follow up on the discussion that you and the Enterprise team will have. Could you please share any documents or notes from the meeting afterwards? I would greatly appreciate it! :)

achou removed achou as the assignee of this task. Sep 25 2023, 11:15 AM
achou updated the task description.

After discussions with the teams involved, we concluded that the revert-risk PoC stream is better suited to deployment on dse-k8s. We will work with the Enterprise team and the Event Platform team on an SLO for the application to iron out the remaining details and define the lines of maintainership.

Ahoelzl renamed this task from Implement PoC Event-Driven Data Pipeline for Revert Risk Model Scores using Event Platform Capabilities to [Event Platform] Implement PoC Event-Driven Data Pipeline for Revert Risk Model Scores using Event Platform Capabilities. Oct 20 2023, 4:59 PM