Machine readable interface for dumps.wikimedia.org
Closed, Resolved | Public

Description

dumps.wikimedia.org should have a machine readable interface for listing all sites, all types of dumps for a given site, and all versions of a given dump. This "web API" could just consist of static JSON files. An index.json in each directory, with file names and descriptions, would be a good start.
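To make the static index.json idea concrete, here is a minimal sketch of a generator that could be run over each directory. The field names (`entries`, `name`, `type`, `size`, `description`) and the helper names `build_index` and `write_index` are assumptions made up for illustration; the task does not define a schema.

```python
import json
import os


def build_index(directory, descriptions=None):
    """Return a dict listing the entries of `directory`.

    `descriptions` is an optional, hypothetical mapping of entry name to
    human-readable text; entries without one get an empty description.
    """
    descriptions = descriptions or {}
    entries = []
    for name in sorted(os.listdir(directory)):
        if name == "index.json":
            continue  # do not list the index file itself
        path = os.path.join(directory, name)
        is_dir = os.path.isdir(path)
        entries.append({
            "name": name,
            "type": "directory" if is_dir else "file",
            "size": None if is_dir else os.path.getsize(path),
            "description": descriptions.get(name, ""),
        })
    return {"entries": entries}


def write_index(directory, descriptions=None):
    """Write index.json into `directory` and return the index dict."""
    index = build_index(directory, descriptions)
    target = os.path.join(directory, "index.json")
    with open(target, "w", encoding="utf-8") as handle:
        json.dump(index, handle, indent=2)
    return index
```

Because the output is a plain file, it could be regenerated whenever a dump run adds or removes files, and a client would only need an HTTP GET per directory to walk sites, dump types, and versions.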

Feature wish list: https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_Core_Team/Backlog/Improve_dumps#Dump_Organization_and_Discovery

Event Timeline

daniel raised the priority of this task to Needs Triage.
daniel updated the task description.
daniel added a subscriber: daniel.
daniel set Security to None.
Hydriz added a subscriber: Hydriz.

The Dumps project on Wikimedia Labs currently uses such an API internally to automate archiving to the Internet Archive. I am working on exposing this API to the web, so please say what you would like it to offer. Thanks!

Addshore triaged this task as Medium priority. Oct 26 2015, 11:48 AM
Addshore awarded a token.
Addshore added a subscriber: Addshore.

There's a dumpruninfo.json, dumpstatus.json and report.json produced for each wiki and date as dumps run; is this sufficient for folks' needs?
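For anyone checking whether these files cover their use case, here is a rough sketch of summarizing a parsed dumpstatus.json. The layout it expects (a top-level `jobs` object whose values carry a `status` string and a `files` object keyed by file name) is an assumption and should be verified against a real file; the sample data and the name `summarize_jobs` are invented for illustration.

```python
import json


def summarize_jobs(status):
    """Map each job name to its status and a sorted list of file names.

    Assumes `status` looks like {"jobs": {name: {"status": ..., "files": {...}}}}.
    Missing keys fall back to "unknown" or an empty file list.
    """
    summary = {}
    for name, job in status.get("jobs", {}).items():
        summary[name] = {
            "status": job.get("status", "unknown"),
            "files": sorted(job.get("files", {})),
        }
    return summary


# Invented sample input in the assumed layout; not copied from a real dump run.
SAMPLE = json.loads("""
{
  "jobs": {
    "examplejob": {
      "status": "done",
      "files": {"example-b.xml.bz2": {}, "example-a.xml.bz2": {}}
    },
    "otherjob": {"status": "waiting"}
  }
}
""")

if __name__ == "__main__":
    print(json.dumps(summarize_jobs(SAMPLE), indent=2))
```

In practice the dict would come from `json.load` on a downloaded dumpstatus.json for one wiki and date rather than from the inline sample.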

ArielGlenn claimed this task.

I'm going to close this as done, given that these JSON files have existed for some time (years!).