
nostalgia.wikipedia.org possibly should be robots.txt'd out of search engines
Closed, Resolved, Public

Description

Have received vague reports of nostalgia.wikipedia.org showing up unexpectedly in regular Google search results. (This holds a copy of Wikipedia's database from early 2002, displayed in the old-style 'Nostalgia' skin, and was put up for one of Wikipedia's anniversary celebrations a few years ago.)

The nostalgia site appears to be served out of the primary document root, so it gets the regular robots.txt. We should possibly give it a custom docroot with a blanket "Disallow" robots.txt, which would phase it out of general web search indexes.


Version: unspecified
Severity: normal

Details

Reference
bz15253

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 10:19 PM
bzimport set Reference to bz15253.
bzimport added a subscriber: Unknown Object (MLST).

We should just redirect robots.txt to extract2.php and have them edited via the web.

Hmmm, sounds kind of scary but would probably work fine. :)

jeluf wrote:

Will robots follow redirects for robots.txt? I guess we should proxy the request to extract2.php.
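Proxying means the crawler gets robots.txt content back directly with a 200 status on the requested host, so nothing depends on how it handles a redirect. The sketch below illustrates that idea as a tiny WSGI app that picks a per-host robots.txt body and returns it inline. It is a hypothetical illustration only: the host table, the default body and the name `robots_app` are assumptions, and in the actual setup the text would come from extract2.php, whose interface is not shown here.

```python
# Hypothetical per-host robots.txt bodies. In the real deployment the text
# would be fetched from extract2.php (a wiki page); that call is not modeled here.
ROBOTS_BY_HOST = {
    "nostalgia.wikipedia.org": "User-agent: *\nDisallow: /\n",
}
# Assumed fallback: an empty Disallow line permits crawling everything.
DEFAULT_ROBOTS = "User-agent: *\nDisallow:\n"


def robots_app(environ, start_response):
    """WSGI app that serves /robots.txt inline (200 OK) rather than redirecting."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    # Strip any ":port" suffix and normalise case before looking up the host.
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    body = ROBOTS_BY_HOST.get(host, DEFAULT_ROBOTS).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def fetch(host, path="/robots.txt"):
    """Call the app directly with a minimal environ; returns (status, text)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"HTTP_HOST": host, "PATH_INFO": path, "REQUEST_METHOD": "GET"}
    body = b"".join(robots_app(environ, start_response))
    return captured["status"], body.decode("utf-8")
```

For example, `fetch("nostalgia.wikipedia.org")` returns the blanket-disallow body with status `200 OK`, while any other host falls back to `DEFAULT_ROBOTS`.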

jeluf wrote:

Done.

User-agent: *
Disallow: /
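One way to sanity-check rules like the two lines above is Python's standard-library `urllib.robotparser`, which applies robots.txt rules to a given user agent and URL. This is a minimal sketch: the helper name `is_crawlable` and the sample URL are illustrative, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules posted in the "Done." comment for nostalgia.wikipedia.org.
NOSTALGIA_ROBOTS = "User-agent: *\nDisallow: /\n"
# For comparison: an empty Disallow value permits everything.
ALLOW_ALL_ROBOTS = "User-agent: *\nDisallow:\n"


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


url = "https://nostalgia.wikipedia.org/wiki/Main_Page"
print(is_crawlable(NOSTALGIA_ROBOTS, url))   # False
print(is_crawlable(ALLOW_ALL_ROBOTS, url))   # True
```

Because the `*` group applies to every user agent and `/` is a prefix of every path, the posted rules block all URLs on the host for any compliant crawler.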