
New API to give latest dump file release
Closed, Invalid, Public

Description

This is beyond what the API currently does, but it is on topic and might be worthwhile.

Currently, to check programmatically whether a new dump file is available, you need to resort to screen scraping. Doing so from JavaScript is even trickier because of the cross-domain policy.

At a minimum, the API would just return the date, as an ASCII string, of the most recent fully successful dump for the wiki. For example, the dump file I'm currently using for en.wiktionary is dated "20080613".
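
To illustrate how a client might use such a date, here is a minimal Python sketch. It assumes the date is returned as an eight-digit YYYYMMDD string like the "20080613" example; the function names are hypothetical and not part of any existing API.

```python
from datetime import date, datetime


def parse_dump_date(stamp: str) -> date:
    """Parse a dump date such as "20080613" (assumed YYYYMMDD) into a date."""
    return datetime.strptime(stamp, "%Y%m%d").date()


def is_newer_dump(local_stamp: str, latest_stamp: str) -> bool:
    """Return True if the latest reported dump is newer than the local one."""
    return parse_dump_date(latest_stamp) > parse_dump_date(local_stamp)
```

A client holding the "20080613" dump would call is_newer_dump("20080613", latest) with whatever date the API reported and download a new file only when the result is true.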


Version: unspecified
Severity: enhancement
URL: http://download.wikipedia.org/enwikipedia/latest/

Details

Reference
bz14584

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:13 PM
bzimport set Reference to bz14584.

I assume this is about http://download.wikipedia.org/backup-index.html ? In my opinion, download.wikipedia.org itself is responsible for offering a proper bot interface. There's no communication from enwiktionary to download.wikipedia.org, only the latter grabbing data from the former and tarballing it, so there's not really a way for enwiktionary to know when it was last successfully dumped (other than screen scraping behind the scenes, which would just move the problem).

Moving this bug to the Wikimedia product, as it's for Wikimedia's dump facility to address, not the MediaWiki software itself.

Use the RSS feeds for this; they're in the latest dirs alongside the symlinks.
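
As a rough sketch of the RSS approach, the Python below pulls an eight-digit dump date out of a feed's first item. The feed layout is an assumption: this thread does not show the feeds, so the sample XML, the item fields searched, and the function name are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Eight consecutive digits not embedded in a longer number, e.g. "20080613".
DUMP_DATE = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def latest_dump_date(rss_text):
    """Return the first YYYYMMDD-style date found in the feed's items, or None."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        # Assumption: the date appears in the item's link, title or description.
        for tag in ("link", "title", "description"):
            match = DUMP_DATE.search(item.findtext(tag) or "")
            if match:
                return match.group(1)
    return None


# Hypothetical sample; a real feed in a "latest" directory may differ.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Dump updates (hypothetical sample)</title>
    <item>
      <title>enwiktionary dump</title>
      <link>http://download.wikipedia.org/enwiktionary/20080613</link>
    </item>
  </channel>
</rss>"""

print(latest_dump_date(SAMPLE_FEED))  # prints 20080613
```

In practice the feed text would be fetched over HTTP first (for example with urllib.request.urlopen), which also avoids scraping the HTML index page.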

Adding the URL of the "latest" dir for enwikipedia, since these directories are not publicized or linked to from anywhere and are thus very hard to find.