This is a parent task to capture efforts related to the development and operation of Python-based stateless streaming services.
Background
This Epic was informed by the following SPIKEs:
- Stream Processing Framework Evaluations
- Build a simple stateless service using PyFlink.
- Evaluate a PyFlink version of MediaWiki Stream Enrichment.
Goals
This Epic spans the following tasks:
- Flink wrappers and helper libraries should be moved into a dedicated git repo with packaging and CI. https://phabricator.wikimedia.org/T324746
- Flink wrappers and helper libraries should integrate with the Table API. We should allow injection of UDFs (ideally cross-language). https://phabricator.wikimedia.org/T324953
- We should provide scaffolding to bootstrap Python-based services.
- We should provide utilities for local experimentation and unit testing. For instance, I would like to be able to inject mocked Sources/Sinks and operate on local JSON files before rolling out to YARN. https://phabricator.wikimedia.org/T324951
- We should streamline packaging of PyFlink applications, and ideally integrate with the shared Flink Docker images.
- Side output error reporting should be made composable and more robust.
- Metrics and monitoring should be standardized.
- Deployment should be standardized using WMF's Deployment Pipeline.
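To make the local experimentation and error reporting goals more concrete, here is a minimal sketch in plain Python of what a mocked source and sink could look like. It deliberately does not use any PyFlink API. The names `enrich`, `run_local`, `sample_lines`, and the `page_title` field are hypothetical, chosen only for illustration. The second returned list stands in for a side output of failed events.

```python
import json
from typing import Callable, Iterable, List, Tuple


def enrich(event: dict) -> dict:
    """Hypothetical stateless transform: derive one field from one event."""
    # Stateless: the result depends only on the input event.
    return {**event, "title_length": len(event["page_title"])}


def run_local(
    lines: Iterable[str],
    transform: Callable[[dict], dict],
) -> Tuple[List[dict], List[dict]]:
    """Apply transform to JSON lines and return (main output, error output)."""
    main_sink: List[dict] = []   # stands in for the real sink
    error_sink: List[dict] = []  # stands in for an error side output
    for line in lines:
        try:
            main_sink.append(transform(json.loads(line)))
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON raises json.JSONDecodeError (a ValueError);
            # a missing field raises KeyError. Both are reported, not fatal.
            error_sink.append(
                {"input": line, "error": type(exc).__name__, "message": str(exc)}
            )
    return main_sink, error_sink


# Mocked source: in practice these lines could be read from a local JSON file.
sample_lines = [
    '{"page_title": "Main_Page"}',
    '{"unexpected": true}',
    "not json",
]
enriched, errors = run_local(sample_lines, enrich)
```

Because the transform is an ordinary function, it can be unit tested without a cluster. How it would be wired into PyFlink sources, sinks, and side outputs is left open here.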
Done is:
- Implement a 'production' version of the MediaWiki Enrichment service in PyFlink, running on YARN, using the utilities and capabilities implemented as part of this Epic.
- The Java/Scala implementation of the enrichment service is archived/switched off.