
Allow crawling of select content
Closed, Declined · Public


The robots.txt rules are unnecessarily restrictive. As Bugzilla is being deprecated and only a portion of its content migrated to Phabricator, it is essential that we allow third parties to do their job. All crawlers, or at least ia_archiver (the Wayback Machine), should be allowed to crawl any content which:

  1. doesn't specifically cause load issues and
  2. is not being semantically migrated to Phabricator.

Ideally we'd drop requirement (2), but let's start somewhere.

Example URLs which shouldn't be blacklisted:

  • /page.cgi?id=voting/bug.html*
  • /duplicates.cgi*
  • /report.cgi* (unless it causes load issues)
  • /weekly-bug-summary.cgi*
  • /describecomponents.cgi*

In fact, is there any reason not to allow everything, minus:

  • /show_bug.cgi
  • /showdependencytree.cgi
  • /query.cgi
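
As a sketch, an "allow everything minus those three" policy could be expressed with a minimal robots.txt like the one below (a proposal, not the deployed configuration; robots.txt Disallow rules are prefix matches, so everything not listed remains crawlable by default):

```
# Block only the expensive/dynamic endpoints; all other paths stay crawlable.
User-agent: *
Disallow: /show_bug.cgi
Disallow: /showdependencytree.cgi
Disallow: /query.cgi
```

If only the Wayback Machine were to be permitted, the same rules could instead be scoped under `User-agent: ia_archiver` while keeping a stricter default for other crawlers.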


Version: wmf-deployment
Severity: enhancement



Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 3:49 AM
bzimport set Reference to bz72507.
bzimport added a subscriber: Unknown Object (MLST).

Wikimedia has migrated from Bugzilla to Phabricator. This task does not make sense anymore in the context of Phabricator, hence closing as declined.