Data Platform Request Form
Is this a request for a:
- Dataset
- Data Pipeline
- Data Feature
Is this a change to something existing:
- Yes - please provide details of the existing datasets/data pipelines (wiki links, Git URLs, names of jobs, etc.)
- No
Please provide the description of your request:
Our team plans to deprecate the v1 Enterprise API endpoints this Monday (3/24/25) to fully support the new v2 endpoints and remove the v1 overhead.
This will be a breaking change for the HTML dumps that are currently generated on the 2nd and 21st of every month. Users can still access this content via the free version of the Enterprise APIs with an account, or via Wikimedia Cloud Services. Historical HTML dumps will remain available.
We need written updates on the following pages/subsections to ensure folks still have access to new content:
(1) https://dumps.wikimedia.org/ subsection: Static HTML dumps
Text (new):
A copy of all pages from all Wikipedia wikis, in HTML form.
These dumps are currently not running, but Wikimedia Enterprise HTML dumps are provided for some wikis, and historical versions remain available here. New versions can still be downloaded via the Enterprise APIs.
(2) https://dumps.wikimedia.org/other/enterprise_html/ (full page changes)
Text (new):
The partial mirrors of Wikimedia Enterprise HTML dumps were an experimental service; as of 03/24/2025, the dumps are no longer replicated here.
If you need recent runs, dumps of article change updates, or the ability to query individual articles from the dumps, visit Wikimedia Enterprise to sign up for a free account. Alternatively, use your developer account to access the APIs within Wikimedia Cloud Services.
Historical Archive
The historical dumps will remain available here for public download for a specific set of namespaces and wikis. Each dump output file is a tar.gz archive which, when uncompressed and untarred, contains one file with a single line per article, in JSON format. To view the payload schema, visit the Wikimedia Enterprise API Data Dictionary.
Accompanying the tar.gz file is a file containing the md5sum and the date the dump was produced, also in JSON format. All files for a dump run are placed in a single directory named in YYYYMMDD format.
View those directories here: other/enterprise_html/runs
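As a sketch of how the archive layout described above can be consumed (the filename and article fields below are hypothetical, not taken from a real dump), a dump is a tar.gz containing one file with one JSON object per line:

```python
import io
import json
import tarfile

# Build a tiny synthetic dump in memory that mirrors the described layout:
# a tar.gz containing one file with a single JSON line per article.
# The article records here are made up for illustration only.
articles = [
    {"name": "Example_A", "identifier": 1},
    {"name": "Example_B", "identifier": 2},
]
ndjson = "\n".join(json.dumps(a) for a in articles).encode("utf-8")

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="example_wiki_dump.ndjson")  # hypothetical name
    info.size = len(ndjson)
    tar.addfile(info, io.BytesIO(ndjson))

# Reading a dump back: open the tar.gz, then parse one JSON object per line.
buf.seek(0)
parsed = []
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    for member in tar.getmembers():
        extracted = tar.extractfile(member)
        for line in extracted:
            parsed.append(json.loads(line))

print([a["name"] for a in parsed])
```

When working with a real downloaded dump, the same reading loop applies to the file on disk (e.g. `tarfile.open("20250302.tar.gz", "r:gz")`), with the actual field names given by the Wikimedia Enterprise API Data Dictionary.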
Use Case: (Please briefly explain what this feature will be used for):
We need written updates that reflect the current data environment to ensure folks still have access to the content they need.
Ideal Delivery Date:
By 3/28. The first missed dump run would occur on 4/2, so that is the absolute cut-off before the current language becomes obsolete.
Data Feature Checklist
Please link to the following if applicable.
| Document Type | Required? | Document/Link |
| --- | --- | --- |
| Related PHAB Tickets | N/A | <add link here> |
| Product One Pager | N/A | <add link here> |
| Product Requirements Document (PRD) | N/A | <add link here> |
| Product Roadmap | No | <add link here> |
| Product Planning/Business Case | No | <add link here> |
| Product Brief | No | <add link here> |
| Other Links | No | <add links here> |