
Open Bugzilla to search spiders
Closed, Declined · Public

Description

Currently we have a blanket Disallow in bugzilla's robots.txt. This is a bit rude, as it makes it harder to track down MediaWiki bugs.

It might be nice to allow at least plain bug views and let them get indexed.


Version: unspecified
Severity: enhancement
URL: https://bugzilla.wikimedia.org/robots.txt
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=33406

Details

Reference
bz13881

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:14 PM
bzimport set Reference to bz13881.
bzimport added a subscriber: Unknown Object (MLST).

Bulk-assigning open BZ issues to Fred.

fvassard wrote:

Added the following line to robots.txt:

Allow: /show_bug.cgi

There is a slight chance that this might cause additional load, so I will be monitoring the webserver to make sure that there is no noticeable performance hit.

I believe the change above should do the trick, but time will tell.
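For reference, a sketch of the directives involved at this point (the User-agent header, the blanket Disallow pattern, and the placement of the new line are assumptions; the exact deployed file is not shown in this task, and later comments suggest the ordering matters):

```
User-agent: *        # assumed group header
Allow: /show_bug.cgi # the line added in this comment
Disallow: /          # the pre-existing blanket Disallow from the description
```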

This isn't fixed. When I search for MediaWiki error messages in Google or Bing, I *never* get results from bugzilla.wikimedia.org. For example, search for

notice Uncommitted DB writes transaction from DatabaseBase::query MessageBlobStore::clear

All I get are useless osdir.com and gmane rehashes of mail threads involving bug 56269. The bug itself should be the first result.

I'm pretty sure it's because in https://bugzilla.wikimedia.org/robots.txt, the line

Disallow: /*.cgi

blocks any .cgi URL, including https://bugzilla.wikimedia.org/show_bug.cgi?id=56269. Even though

Allow: /*show_bug.cgi

comes later, "the first match found is used." The fix is to move the Allow line first; compare Mozilla's https://bugzilla.mozilla.org/robots.txt.
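The first-match rule quoted above can be demonstrated with a small sketch (a toy evaluator, not any real crawler's implementation; the wildcard handling is a simplification of what production crawlers actually do):

```python
import re

def first_match_allowed(rules, path):
    """Evaluate robots.txt rules with first-match-wins semantics.

    rules is a list of (directive, pattern) tuples in file order;
    '*' in a pattern matches any run of characters.
    """
    for directive, pattern in rules:
        # Translate the robots.txt wildcard pattern into an anchored regex.
        regex = "^" + ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.match(regex, path):
            return directive == "Allow"
    return True  # no rule matched: crawling is allowed by default

# Order as described in the comment: the Disallow line shadows the Allow.
broken = [("Disallow", "/*.cgi"), ("Allow", "/*show_bug.cgi")]
# The proposed fix: move the Allow line first.
fixed = [("Allow", "/*show_bug.cgi"), ("Disallow", "/*.cgi")]

print(first_match_allowed(broken, "/show_bug.cgi?id=56269"))  # False: blocked
print(first_match_allowed(fixed, "/show_bug.cgi?id=56269"))   # True: indexable
```

Under first-match semantics only the relative order of the two lines changes the outcome; the patterns themselves are identical in both lists.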

Is this file [1] the one that needs to be changed? I.e., is it the file used to generate https://bugzilla.wikimedia.org/robots.txt, or is it just a random copy sitting in Git?

[1] https://git.wikimedia.org/blob/wikimedia%2Fbugzilla%2Fmodifications.git/master/extensions%2FSitemap%2Frobots.txt

No, that's not the file. /extensions/Sitemap will be killed soon (bug 33406).
No idea whether robots.txt is in Git. If it is, it's somewhere in operations/puppet/modules/bugzilla.

It doesn't seem to be in that repo. Per "you still have to copy upstream bugzilla itself to the bugzilla path and clone our modifications from the wikimedia/bugzilla/modifications repo" [1], I guess it is not in Git, just like the rest of the BZ server-side files.

This needs shell/ops then.

[1] http://git.wikimedia.org/blob/operations%2Fpuppet.git/18f755cfecf9abdd23a0678e82f278188e059379/modules%2Fbugzilla%2FREADME.md

(In reply to spage from comment #4)

This isn't fixed.

It worked for a while; this is a recent regression. No idea when the robots.txt block was reintroduced.

We could add this file to Puppet, but it's low priority.

Wikimedia has migrated from Bugzilla to Phabricator. Learn more about it here: https://www.mediawiki.org/wiki/Phabricator/versus_Bugzilla - This task does not make sense anymore in the context of Phabricator, hence closing as declined.