
Dead links on dumps.wikimedia.org
Open, Needs Triage, Public

Description

The links on https://dumps.wikimedia.org/archive/2010/2010-03/ruwiki/20100331/ point to the wrong URLs and return 404, even though working URLs for the same files exist. For instance, the page links to
http://download.wikimedia.org/ruwiki/20100331/ruwiki-20100331-pages-articles.xml.bz2
instead of
https://dumps.wikimedia.org/archive/2010/2010-03/ruwiki/20100331/ruwiki-20100331-pages-articles.xml.bz2

I suppose other pages some levels up from that directory are affected too.

Event Timeline

Wikimedia generally doesn't retain old dumps. I think the threshold is 10 or so dumps. For example, I usually wouldn't look at anything that's not listed on https://dumps.wikimedia.org/ruwiki/ where the oldest is currently 2017-01.

Qse24h closed this task as a duplicate of T164723: New git repository: <repo name>.

@gnosygnu, it's listed on https://dumps.wikimedia.org/archive/ and there are working URLs; the problem is the page's HTML.

@Jack_who_built_the_house Yeah, you're right; I misread your post. Thanks for the correction.

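The index.html files were copied wholesale from the dumps as they were produced, so their links still point at the original download locations. We could either rewrite the links with a grep-and-replace or just remove the index.html files and let people grab the files directly from the directory listing. Preferences?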
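
For the grep-and-replace option, here is a rough sketch in Python of what the rewrite could look like. It assumes every stale link has the shape `http://download.wikimedia.org/<wiki>/<YYYYMMDD>/<file>` and that the archive path is always `/archive/<YYYY>/<YYYY>-<MM>/<wiki>/<YYYYMMDD>/`. Both assumptions are inferred from the single ruwiki example in the description and would need checking against the real archive tree. The names `OLD_LINK`, `rewrite_link` and `rewrite_index_html` are made up for illustration.

```python
import re

# Assumed shape of the stale links, based only on the example in this task:
#   http://download.wikimedia.org/<wiki>/<YYYYMMDD>/<file>
OLD_LINK = re.compile(
    r"https?://download\.wikimedia\.org/"
    r"(?P<wiki>[A-Za-z0-9_]+)/(?P<date>\d{8})/(?P<file>[^\"'\s<>]+)"
)

# Assumed archive layout: /archive/<YYYY>/<YYYY>-<MM>/<wiki>/<YYYYMMDD>/<file>
ARCHIVE_BASE = "https://dumps.wikimedia.org/archive"


def rewrite_link(match: re.Match) -> str:
    """Build the archive URL for one matched stale link."""
    wiki, date, name = match["wiki"], match["date"], match["file"]
    year, month = date[:4], date[4:6]
    return f"{ARCHIVE_BASE}/{year}/{year}-{month}/{wiki}/{date}/{name}"


def rewrite_index_html(html: str) -> str:
    """Return the HTML with every stale download link replaced."""
    return OLD_LINK.sub(rewrite_link, html)
```

Applying `rewrite_index_html` to the contents of each archived index.html would leave anything that does not match the assumed pattern untouched, so a run over the tree followed by a diff should show exactly which links changed.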