
Machine readable interface for dumps.wikimedia.org
Closed, Resolved · Public

Description

dumps.wikimedia.org should have a machine-readable interface for listing all sites, all types of dumps for a given site, and all versions of a given dump. This "web API" could consist simply of static JSON files. An index.json in each directory, listing file names and descriptions, would be a good start.

Feature wish list: https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_Core_Team/Backlog/Improve_dumps#Dump_Organization_and_Discovery
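
As a rough illustration of the static index.json idea in the description, here is a minimal Python sketch. It is not an existing Wikimedia tool: the function names `build_index` and `write_indexes`, the `entries` key, and the per-entry fields (`name`, `type`, `size`, `description`) are all assumptions made up for this example.

```python
import json
import os


def build_index(directory, descriptions=None):
    """Return a dict describing the immediate contents of `directory`.

    `descriptions` is an optional, hypothetical mapping of file name to
    human-readable text; the schema here is illustrative only.
    """
    descriptions = descriptions or {}
    entries = []
    for name in sorted(os.listdir(directory)):
        if name == "index.json":
            continue  # do not list the index file itself
        path = os.path.join(directory, name)
        is_dir = os.path.isdir(path)
        entries.append({
            "name": name,
            "type": "directory" if is_dir else "file",
            "size": None if is_dir else os.path.getsize(path),
            "description": descriptions.get(name, ""),
        })
    return {"entries": entries}


def write_indexes(root):
    """Write an index.json into `root` and into every directory below it."""
    for dirpath, _dirnames, _filenames in os.walk(root):
        index = build_index(dirpath)
        target = os.path.join(dirpath, "index.json")
        with open(target, "w", encoding="utf-8") as f:
            json.dump(index, f, indent=2)
```

Running `write_indexes` over a dump tree after each run would let a client walk from the top-level index to a site, then to a dump date, then to individual files, using nothing but static files.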

Event Timeline

daniel created this task. Mar 17 2015, 3:08 PM
daniel raised the priority of this task from to Needs Triage.
daniel updated the task description.
daniel added a subscriber: daniel.
Restricted Application added a subscriber: Aklapper. Mar 17 2015, 3:08 PM
Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Mar 17 2015, 3:49 PM
daniel updated the task description. Jun 30 2015, 10:13 PM
daniel set Security to None.
Hydriz added a subscriber: Hydriz.

The Dumps project on Wikimedia Labs currently uses such an API internally to automate archiving to the Internet Archive. I am working on exposing this API to the web, so please say what you would like to see in it. Thanks!

Addshore triaged this task as Normal priority. Oct 26 2015, 11:48 AM
Addshore awarded a token.
Addshore added a subscriber: Addshore.

A dumpruninfo.json, dumpstatus.json, and report.json are produced for each wiki and date as dumps run; is this sufficient for folks' needs?
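
For anyone wanting to try these files, a minimal Python sketch of reading a dumpstatus.json follows. The thread does not spell out the URL layout or the file's schema, so both are assumptions here: the sketch guesses a `<wiki>/<date>/dumpstatus.json` path under dumps.wikimedia.org and a top-level `jobs` mapping whose values carry `status` and `files` keys. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://dumps.wikimedia.org"  # host named in the task title


def status_url(wiki, date, name="dumpstatus.json"):
    # Assumed layout: <base>/<wiki>/<date>/<file>; not confirmed in this task.
    return f"{BASE_URL}/{wiki}/{date}/{name}"


def summarize_jobs(status):
    """Map each job name to (status, sorted file names).

    Assumes `status` has a "jobs" mapping whose values may contain
    "status" and "files" keys; missing keys are tolerated.
    """
    summary = {}
    for job_name, job in status.get("jobs", {}).items():
        files = sorted(job.get("files", {}))
        summary[job_name] = (job.get("status"), files)
    return summary


def fetch_status(wiki, date):
    """Download and parse the status file for one wiki and dump date."""
    with urlopen(status_url(wiki, date)) as response:
        return json.load(response)
```

A call such as `summarize_jobs(fetch_status("enwiki", "20190501"))` (wiki and date chosen only as placeholders) would then give a per-job overview, provided the assumed layout and schema hold.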

Sounds good to me!

JAllemandou added a subscriber: JAllemandou.
ArielGlenn closed this task as Resolved. May 15 2019, 4:33 AM
ArielGlenn claimed this task.

I'm going to close this as done, given that these JSON files have existed for some time (years!).

ArielGlenn moved this task from Incoming to Done on the Datasets-Archiving board. May 15 2019, 4:34 AM