Robots.txt broken on Toolforge.org
Closed, Duplicate (Public)

Description

The robots.txt file on Toolforge should be updated to disallow indexing per subdomain rather than per subpage. The current file disallows subpages of admin.toolforge.org instead, but doesn't each subdomain require its own robots.txt file in order to disallow pages or set a crawl delay?
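
To illustrate the concern, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the `/admin/` path are made up for illustration and are not the actual Toolforge file. It shows that `Disallow` rules are matched against the URL path only, so a path-based rule says nothing about a tool that now lives at the root of its own hostname.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical path-based rules, in the style of the old
# tools.wmflabs.org/<tool> layout. "/admin/" is an illustrative path only.
PATH_BASED_RULES = """\
User-agent: *
Disallow: /admin/
"""


def is_allowed(rules: str, url: str, agent: str = "*") -> bool:
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # can_fetch() compares only the path part of the URL against the rules;
    # the hostname plays no role in the match.
    return parser.can_fetch(agent, url)


# The path-based rule blocks /admin/... on whichever host serves this file.
print(is_allowed(PATH_BASED_RULES, "https://toolforge.org/admin/page"))
# The same rule does not cover a tool served from the root of a subdomain.
print(is_allowed(PATH_BASED_RULES, "https://admin.toolforge.org/"))
```

Since crawlers request /robots.txt separately for each hostname, every *.toolforge.org host would need to serve suitable rules itself.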

Event Timeline

I thought we already had a bug for this, but I can't find it. The shared /favicon.ico and /robots.txt functionality from tools.wmflabs.org has not been ported over to *.toolforge.org at all. It would be nice to make some kind of fix for this.

As @Nintendofan885 points out, the new robots.txt should work differently as each hostname will need its own rules. One option could be to serve a /robots.txt that blocks all crawlers by default and then let individual tools opt into being crawled by serving their own /robots.txt content. This would let us easily close T127206: provide a more strict robots.txt at Tool Labs as well.
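
A rough sketch of that opt-in idea, not an actual implementation: the function name `robots_txt_for`, the `tool_overrides` mapping, and the example hostnames are all hypothetical, and in practice the fallback would more likely live in the front proxy. The point is only the fallback logic, where a hostname gets the deny-all default unless the tool supplies its own content.

```python
from typing import Mapping

# Default served for any hostname that has not opted in: block all crawlers.
DEFAULT_ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def robots_txt_for(host: str, tool_overrides: Mapping[str, str]) -> str:
    """Return the robots.txt body to serve for `host`.

    `tool_overrides` is a hypothetical stand-in for "the tool serves its own
    /robots.txt": it maps a hostname to that tool's robots.txt content.
    """
    override = tool_overrides.get(host.lower())
    return override if override is not None else DEFAULT_ROBOTS_TXT


# Example: one tool opts into crawling, every other hostname stays blocked.
overrides = {"example-tool.toolforge.org": "User-agent: *\nAllow: /\n"}
print(robots_txt_for("example-tool.toolforge.org", overrides))
print(robots_txt_for("other-tool.toolforge.org", overrides))
```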