User Story
As a platform engineer, I need to design, implement, and deploy a streaming job that produces event streams of MediaWiki page changes with raw content.
The service will:
- Call the MW API to get the wikitext for the article
- Format the input stream data and wikitext into the new topic format
- Output the formatted data to a new Kafka topic
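The three steps above can be sketched roughly as follows. This is a minimal illustration, not the actual job: the event field names (`api_url`, `rev_id`, `content`) and the helper names are hypothetical, and the real topic format is still subject to the data modeling spike. The query parameters are those of the standard MediaWiki Action API (`action=query`, `prop=revisions`). Producing to Kafka is left out; `to_kafka_value` only shows the serialization step.

```python
import json
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen


def build_content_url(api_url: str, rev_id: int) -> str:
    """Build a MediaWiki Action API URL that returns one revision's wikitext."""
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": rev_id,
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": 2,
    }
    return f"{api_url}?{urlencode(params)}"


def fetch_wikitext(api_url: str, rev_id: int) -> str:
    """Step 1: call the MW API (no retries here; error handling is tracked separately)."""
    with urlopen(build_content_url(api_url, rev_id), timeout=10) as resp:
        body = json.load(resp)
    return body["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]


def enrich(event: dict, fetch: Callable[[str, int], str]) -> dict:
    """Step 2: copy the input event and attach the wikitext (hypothetical layout)."""
    enriched = dict(event)
    enriched["content"] = {
        "format": "wikitext",
        "body": fetch(event["api_url"], event["rev_id"]),
    }
    return enriched


def to_kafka_value(event: dict) -> bytes:
    """Step 3 (serialization only): encode the enriched event as a JSON message value."""
    return json.dumps(event, sort_keys=True).encode("utf-8")
```

Passing the fetch function into `enrich` keeps the formatting step testable without network access, for example `enrich(event, fetch_wikitext)` in a real run and a stub in tests.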
Expected Spikes:
- Data modeling exercise for new consolidated stream - T308017
Why are we doing this?
- Simplify event stream consumption. Consumers can listen to a single stream that represents the state of a page rather than a page action (current design)
- Add content to streams so that consumers can use them without having to enrich the events themselves
What is needed for GA internal release
- T341096: mediawiki-event-enrichment taskmanager crashes at startup - Blocker
- T340059: Flink k8s operator in staging sometimes will not sync changes to FlinkDeployments (we need to be able to deploy to staging) - Needs SRE
- T309699: [Event Platform] Understand, document, and implement error handling and retry logic when fetching data from the MW api
- T338169: mw-page-content-change-enrich should partition by and process by wiki_id,page_id
- T338233: mw-page-content-change-enrich should enable HA with k8s ConfigMaps
- T340831: Provide basic data quality metrics for page_content_change
- Alerting on SLIs (uptime, latency, and maybe quality/consistency?) T340666 (and T329070?) - Needs SRE
- Rename and release stream as mediawiki.page_content_change.v1
- Announcement
Follow up work that needs to be done
- T341277: mediawiki page_content_change should generate new meta.id field
- T331283: [Event Platform] [NEEDS GROOMING] Store Flink HA metadata in Zookeeper - Needs SRE
- T338231: [Event Platform] mw-page-content-change-enrich should (re)produce kafka keys
- Alternative to thanos-swift for storing Flink state: MOSS or DSE Ceph? T324660: Install Ceph Cluster for Data Engineering - Needs SRE
- Multi DC Kafka for page_content_change and other large event size streams: T340492: [Epic] Set up multi DC Kafka stretch cluster? - Needs SRE
- T345805: [Event Platform] Enable snappy compression for Flink Kafka producers
- T345806: [Event Platform] mediawiki.page_content_change.v1 topic should be partitioned. - Needs SRE
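Several of the tasks above (T338169, T338231, T345806) concern keying and partitioning by wiki_id and page_id. A minimal sketch of the idea, assuming a flat event with hypothetical `wiki_id` and `page_id` fields: deriving the key from both fields means all changes to one page land in the same partition, which preserves their relative order. The SHA-256 hash used here is only a stand-in for illustration; it is not Kafka's default partitioner.

```python
import hashlib


def partition_key(event: dict) -> bytes:
    """Build a message key from wiki_id and page_id (hypothetical field names)."""
    return f"{event['wiki_id']}:{event['page_id']}".encode("utf-8")


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition deterministically (illustrative hash, not Kafka's)."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Two events for the same page always produce the same key and therefore the same partition, regardless of revision.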