
Production-level file export (aka dump) of MW Content in XML
Open, In Progress, High, Public

Description

In T335862 (Implement job to generate Dump XML files), we developed proof-of-concept (PoC) code that dumps wiki content in XML form from the data lake table wmf_content.mediawiki_content_history.

We now want to bring this code to production level through a set of subtasks aimed at hardening, testing, and integrating this mechanism.

This work supports FY2025 Q1 [[ https://app.asana.com/1/3758245663860/project/1210776716741007/overview/1210776805319899 | SDS 1.2.1 ]]:

If we migrate the XML Dumps process from the current 'Dumps 1' infrastructure to a data pipeline that leverages the MediaWiki Content Pipelines, we will be able to guarantee SLOs and turn off the 'Dumps 1'-based XML export.

Related Objects

| Status | Assigned |
| --- | --- |
| Open | None |
| In Progress | xcollazo |
| Resolved | pfischer |
| Resolved | pfischer |
| Resolved | pfischer |
| Open | None |
| Resolved | xcollazo |
| Declined | BTullis |
| Duplicate | None |
| Resolved | Antoine_Quhen |
| Resolved | xcollazo |
| Open | None |
| Open | None |
| Resolved | xcollazo |
| Open | xcollazo |
| Resolved | xcollazo |
| Resolved | xcollazo |
| Resolved | xcollazo |
| Resolved | xcollazo |
| Resolved | Antoine_Quhen |
| Resolved | xcollazo |
| Resolved | xcollazo |
| Resolved | xcollazo |

Event Timeline

xcollazo renamed this task from "Productionization of code to dump in XML" to "Production-level file export (aka dump) of MW Content in XML". (Jul 25 2025, 5:51 PM)
xcollazo changed the task status from Open to In Progress.
xcollazo claimed this task.
xcollazo triaged this task as High priority.
xcollazo edited projects, added Data-Engineering; removed Data-Engineering-Roadmap.
xcollazo updated the task description twice.
xcollazo removed a subscriber: Aklapper.
xcollazo changed the status of subtask Restricted Task from Open to In Progress. (Jul 29 2025, 1:24 PM)
Antoine_Quhen changed the status of subtask Restricted Task from Open to In Progress. (Sep 15 2025, 3:00 PM)
xcollazo closed subtask Restricted Task as Resolved. (Nov 17 2025, 3:12 PM)