
Google (and others) is indexing data dumps
Closed, Declined · Public

Description

The absence of a robots.txt file at download.wikimedia.org is probably expected behaviour, but it results in wasted bandwidth: Google is indexing the .gz files, and Yahoo the .xml ones.

http://www.google.com/search?q=some+site%3Adownload.wikimedia.org
http://search.yahoo.com/search?p=some+site%3Adownload.wikimedia.org
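Below is a minimal sketch of the mechanism, written with Python's standard `urllib.robotparser`. It shows why a compliant crawler fetches everything when there are no rules, and what it skips once a robots.txt is present. Two assumptions: the report proposes no specific rules, so the blanket `Disallow: /` below is hypothetical, and the dump file paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the dump host: every crawler is barred from
# every path. This is an illustration, not a rule set from the report.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Illustrative URLs; these file names are made up, not real dump paths.
SAMPLE_URLS = (
    "http://download.wikimedia.org/example/dump.xml.gz",
    "http://download.wikimedia.org/example/dump.xml",
)


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    # An empty rule set stands in for the current situation with no robots.txt:
    # no rule matches, so every URL is fetchable.
    missing = make_parser("")
    blocked = make_parser(ROBOTS_TXT)
    for url in SAMPLE_URLS:
        for agent in ("Googlebot", "Slurp"):
            print(agent, url,
                  "no robots.txt:", missing.can_fetch(agent, url),
                  "with rules:", blocked.can_fetch(agent, url))
```

With the empty rule set, every `can_fetch` call returns `True`. With the blanket disallow, every call returns `False`. A narrower rule set, for example one that blocks only the dump directories, would use the same check. Note that `urllib.robotparser` does plain prefix matching and does not implement the `*`/`$` wildcard extensions that some crawlers support.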


Version: unspecified
Severity: enhancement
URL: http://download.wikimedia.org/robots.txt

Details

Reference
bz11720

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:57 PM
bzimport set Reference to bz11720.

Tomasz, this may or may not be something we want to adjust. :)