==== **User Story**
> As a platform engineer, I need to design, implement, and deploy a streaming job that produces an event stream of MediaWiki page changes with raw content.
==== The service will:
* Call the MW API to get the wikitext for the article
* Format the input stream data and wikitext into the new topic format
* Output the formatted data to a new Kafka topic (see the sketch below)
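A minimal sketch of that flow, written as plain Python outside the Flink runtime the real job would run on. The API endpoint, event field names, and topic name are illustrative assumptions (the actual field layout is what the {T308017} spike decides):

```lang=python
import json

import requests
from kafka import KafkaProducer  # kafka-python; the production job uses Flink instead

# Assumption: the per-wiki endpoint would be resolved from the event's domain.
MW_API = "https://en.wikipedia.org/w/api.php"


def fetch_wikitext(rev_id: int) -> str:
    """Fetch the raw wikitext of one revision via the MediaWiki Action API."""
    resp = requests.get(
        MW_API,
        params={
            "action": "query",
            "prop": "revisions",
            "revids": rev_id,
            "rvprop": "content",
            "rvslots": "main",
            "format": "json",
            "formatversion": "2",
        },
        timeout=10,
    )
    resp.raise_for_status()
    page = resp.json()["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]


def enrich(event: dict) -> dict:
    """Merge an input page-change event with its fetched wikitext."""
    revision = dict(event.get("revision", {}))
    revision["content_body"] = fetch_wikitext(revision["rev_id"])
    return {**event, "revision": revision}


producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
# For each event consumed from the source stream:
# producer.send("mediawiki.page_content_change.v1", enrich(event))
```

In the production job the same transform would run inside Flink, with async I/O for the API calls, rather than a bare consume/produce loop.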
==== Expected Spikes:
* Data modeling exercise for the new consolidated stream - {T308017}
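For illustration only, one plausible shape for a consolidated event, where the page action becomes a field and the payload carries the page state plus raw content. Every field name here is a hypothetical placeholder; the actual schema is the outcome of the spike:

```lang=python
# Hypothetical consolidated event (all field names illustrative, not the final schema).
example_event = {
    "meta": {
        "stream": "mediawiki.page_content_change.v1",
        "dt": "2023-07-01T12:00:00Z",
    },
    "changelog_kind": "update",  # e.g. insert / update / delete, instead of one topic per action
    "page": {"page_id": 42, "page_title": "Example"},
    "revision": {
        "rev_id": 123456,
        "content_slots": {
            "main": {"content_format": "text/x-wiki", "content_body": "== Raw wikitext =="}
        },
    },
}
```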
==== Why are we doing this?
- Simplify event stream consumption: consumers can listen to a single stream that represents the state of a page, rather than one stream per page action (the current design)
- Add page content to the streams so that consumers can use them without doing their own enrichment
==== What is needed for GA internal release
[] {T341096} - **Blocker**
[] {T340059} (we need to be able to deploy to staging) - **Needs SRE**
[x] {T309699}
[] {T338169}
[x] {T338233}
[] Quality / consistency metrics. Might have to compute these in Hive/Airflow ({T340831}); see the sketch after this checklist
[] Alerting on SLIs (uptime, latency, and maybe quality/consistency?) {T340666} (and {T329070}?) - **Needs SRE**
[] Rename and release stream as `mediawiki.page_content_change.v1`
[] Announcement
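A rough idea of what a Hive-based consistency metric could look like: a hypothetical PySpark job that counts revisions present in the source stream's table but missing from the enriched one. Table and field names are assumptions:

```lang=python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("page_content_change_consistency").getOrCreate()

# Assumption: both streams are ingested into Hive under event.* (names illustrative).
source = spark.table("event.mediawiki_page_change") \
    .where("year = 2023 AND month = 7 AND day = 1")
enriched = spark.table("event.mediawiki_page_content_change") \
    .where("year = 2023 AND month = 7 AND day = 1")

# Revisions seen upstream but absent downstream, i.e. events the job dropped.
missing = (
    source.selectExpr("revision.rev_id AS rev_id").distinct()
    .join(enriched.selectExpr("revision.rev_id AS rev_id").distinct(),
          "rev_id", "left_anti")
    .count()
)
print(f"missing enriched events: {missing}")
```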
==== Follow-up work
[] {T331283} - **Needs SRE**
[] {T338231}
[] Alternative to thanos-swift for storing Flink state: MOSS? {T324660}, DSE Ceph? {T324660} - **Needs SRE**
[] Multi DC Kafka for page_content_change and other large event size streams: {T340492}? - **Needs SRE**