
Add Special:Whatlinkshere and Special:Recentchangeslinked to robots.txt
Closed, Declined (Public)

Description

Looking through the access logs of my local wiki, I noticed that Special:Whatlinkshere and Special:Recentchangeslinked (including all their subpages) were being downloaded by spiders. The pages already carry the "noindex" robots meta tag, so the spiders don't index the content, but the server still has to generate the pages.

I propose that these pages be added as disallowed entries in the robots.txt file, to reduce server load and the bandwidth needed.
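For illustration, the entries could look roughly like the following. The path prefixes are assumptions based on a typical MediaWiki URL layout and would have to be adjusted per wiki; since robots.txt Disallow rules match by prefix, the subpages are covered as well:

  User-agent: *
  # Special:WhatLinksHere and all of its subpages
  Disallow: /wiki/Special:WhatLinksHere
  # Special:RecentChangesLinked and all of its subpages
  Disallow: /wiki/Special:RecentChangesLinked
  # Wikis that also serve these pages through index.php would need the query form too, e.g.:
  Disallow: /w/index.php?title=Special:WhatLinksHere
  Disallow: /w/index.php?title=Special:RecentChangesLinked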


Version: unspecified
Severity: enhancement

Details

Reference
bz12841

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 10:01 PM
bzimport set Reference to bz12841.
bzimport added a subscriber: Unknown Object (MLST).

That'd be a matter for your own robots.txt.

I know that I can add these to my own robots.txt, and have in fact already done so. What prompted me to file this bug was that, after noticing the issue, I looked through de.wikipedia.org/robots.txt to see whether these pages were already excluded there. As far as I can tell, they aren't, so spiders will keep downloading these pages from Wikipedia needlessly, causing avoidable server load (they fetch the pages only to throw them away immediately afterwards).

So this bug is a request specifically for Wikipedia, not for the MediaWiki software.
Of course, I may have missed something essential, in which case I'm sorry and you can re-close the bug.

jeluf wrote:

Adding hundreds of localized robots.txt entries would make the file enormous, eating up the savings.

Additionally, most spiders have faster ways to index Wikipedia than accessing our special pages.
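To illustrate the point about localization: a Disallow rule only matches a literal URL prefix, so each language edition would need entries for its localized special-page aliases in addition to the canonical English names. A minimal sketch for de.wikipedia.org, where the German alias names are illustrative assumptions and would need to be checked against the actual configuration:

  # Canonical English names
  Disallow: /wiki/Special:WhatLinksHere
  Disallow: /wiki/Special:RecentChangesLinked
  # Localized German aliases (illustrative)
  Disallow: /wiki/Spezial:Linkliste
  Disallow: /wiki/Spezial:%C3%84nderungen_an_verlinkten_Seiten

Multiplied across hundreds of languages and their aliases, this is where the file-size concern comes from.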