**Objectives**
[] Airflow job is implemented that transforms data from the mediawiki.page-content-change stream to Apache Iceberg tables
[] Write the required XML files from Iceberg to dumps.wikimedia.org with at most a 2 day delay.
**Proposed High Level Architecture**
{F38950232}
**Dependencies:**
- mediawiki.page_content_change (expected go live end of March 2023)
**Expected Sub Tasks:**
[x] Document approach (why iceberg, why Airflow, etc) -> https://phabricator.wikimedia.org/T335859
[x] Build hourly Airflow job to transform data from mediawiki.page-content-change to Iceberg tables -> https://phabricator.wikimedia.org/T335860
[x] Spike on understanding how the XML transformation happens -> https://phabricator.wikimedia.org/T335861
[] Build daily Airflow job to generate XML files from Iceberg tables -> https://phabricator.wikimedia.org/T335862
[] Tests/QA checks that content is the same as existing dumps (part of https://phabricator.wikimedia.org/T335862)
**Out of scope:**
- Any jobs to reconcile missed events in the mediawiki.page-content-change stream. As part of QA/testing we will analyze the results to see if drift is an issue