
Allow configuring the current automatic "noindex, nofollow" for all special pages
Open, LowestPublic

Description

For smaller sites, Special:Allpages would be a great page to let
search engines crawl.

Allow me to make the case that one should be able to make
Special:Allpages spiderable. Currently it is _hardwired_ to
noindex,nofollow, just like the other Special pages.
$wgNamespaceRobotPolicies won't help, as the policy is hardwired in
SpecialSpecialpages.php; and even if $wgNamespaceRobotPolicies could be
used, one would want to limit the granularity to just Special:Allpages
and keep the rest of the Special: namespace at noindex,nofollow.
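What the report asks for is per-page granularity, roughly like the following LocalSettings.php fragment. This is hypothetical: at the time of the report no setting reached Special pages at all, and $wgArticleRobotPolicies (mentioned later in this thread) did not yet exist.

```php
// Hypothetical LocalSettings.php fragment: allow indexing of just
// Special:Allpages while every other Special page keeps the
// hardwired noindex,nofollow default.
$wgArticleRobotPolicies = [
    'Special:Allpages' => 'index,follow',
];
```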

Consider http://radioscanningtw.jidanni.org/
On the Main page the first link I make is to
http://radioscanningtw.jidanni.org/index.php?title=Special:Allpages
expecting users and search engines alike to use it.

Sure, other wikis might have a vibrant tree of information. However,
http://radioscanningtw.jidanni.org/ is more of a flat list, with many
categories that don't need pages just to say they represent, e.g.,
486.3785 MHz. I like my structure, and users can see all the content,
but search engines can't! Anyway,
http://radioscanningtw.jidanni.org/index.php?title=Special:Allpages
would have been the perfect way to get it indexed, were it not for
some assumption that all Special pages should be noindex,nofollow. No,
I do not wish to maintain my own private version of
SpecialAllpages.php; I'm just offering an observation.


Version: 1.11.x
Severity: enhancement

Details

Reference
bz8473

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 9:32 PM
bzimport set Reference to bz8473.
bzimport added a subscriber: Unknown Object (MLST).
Jidanni created this task. Jan 3 2007, 8:28 PM
brion added a comment. Jan 4 2007, 7:42 AM

Dynamic special pages are in general pretty crappy for spidering and will remain
generally disabled.

Consider using sitemap generation.
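MediaWiki's sitemap support lives in the maintenance/generateSitemap.php script. The essence of what such a sitemap contains can be sketched in plain PHP; the function name, base URL, and page titles below are illustrative, not part of MediaWiki.

```php
<?php
// Minimal sketch of sitemap generation: one <url> entry per page title,
// in the sitemaps.org 0.9 format. buildSitemap() is an illustrative
// name; MediaWiki's real implementation is maintenance/generateSitemap.php.
function buildSitemap( string $baseUrl, array $titles ): string {
    $xml  = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    $xml .= "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";
    foreach ( $titles as $title ) {
        // Percent-encode the title, then escape the URL for XML.
        $loc  = $baseUrl . '?title=' . rawurlencode( $title );
        $xml .= "  <url><loc>" . htmlspecialchars( $loc ) . "</loc></url>\n";
    }
    return $xml . "</urlset>\n";
}

echo buildSitemap( 'http://radioscanningtw.jidanni.org/index.php',
    [ 'Main_Page', '486.3785_MHz' ] );
```

A sitemap sidesteps the Special-page robot policy entirely, since search engines fetch it directly instead of crawling Special:Allpages.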

ayg wrote:

Why? noindex,follow for Allpages strikes me as sensible, even if not as useful
as a site map.

One could now set the new
$wgArticleRobotPolicies = array( 'Special:Allpages' => 'noindex,follow' );
but apparently Special pages are too hardwired for the weak $wgArticleRobotPolicies to overpower them!
See also Bug 9145.

(I am removing the above mentioned Template_talk:Robots_temp. It contained

==[[Special:Allpages/]]==
{{Special:Allpages/}}
==[[Special:Allpages/Project:]]==
{{Special:Allpages/Project:}}

)

http://perishablepress.com/press/2008/06/03/taking-advantage-of-the-x-robots-tag/ mentions methods perhaps useful to people seeking workarounds for this bug.
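The X-Robots-Tag approach works at the web-server layer, so it needs no MediaWiki changes. A sketch for Apache 2.4 with mod_headers follows; the URL pattern is illustrative, and note that when both the header and MediaWiki's own robots meta tag are present, search engines generally honor the more restrictive directive.

```apache
# Sketch: send an indexing hint for Special:Allpages at the server level.
# Requires mod_headers; the <If> expression syntax needs Apache 2.4+.
<If "%{QUERY_STRING} =~ /title=Special:Allpages/">
    Header set X-Robots-Tag "index,follow"
</If>
```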

jeckyhl wrote:

Quick and dirty (?) solution:

In SpecialPage.php, method setHeaders(), replace

$out->setRobotPolicy( "noindex,nofollow" );

with

global $wgNamespaceRobotPolicies;
$ns = $this->getTitle()->getNamespace();
if ( isset( $wgNamespaceRobotPolicies[$ns] ) ) {
    $policy = $wgNamespaceRobotPolicies[$ns];
} else {
    $policy = 'noindex,nofollow';
}
$out->setRobotPolicy( $policy );

This keeps the 'noindex,nofollow' setting as the default, but it can be overridden in LocalSettings.php, e.g.

$wgNamespaceRobotPolicies[NS_SPECIAL] = 'noindex,follow';
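Outside MediaWiki, the patch's lookup-with-fallback logic can be sketched as a standalone function. The function name is illustrative, not part of the MediaWiki API; only NS_SPECIAL's value of -1 matches MediaWiki.

```php
<?php
// Sketch of the patch's policy lookup: prefer a per-namespace override
// from the configuration array, otherwise fall back to the hardwired
// "noindex,nofollow" default. resolveRobotPolicy() is an illustrative name.
function resolveRobotPolicy( array $namespacePolicies, int $ns ): string {
    return $namespacePolicies[$ns] ?? 'noindex,nofollow';
}

const NS_SPECIAL = -1; // MediaWiki's constant for the Special: namespace

$wgNamespaceRobotPolicies = [ NS_SPECIAL => 'noindex,follow' ];

echo resolveRobotPolicy( $wgNamespaceRobotPolicies, NS_SPECIAL ), "\n"; // noindex,follow
echo resolveRobotPolicy( $wgNamespaceRobotPolicies, 0 ), "\n";          // noindex,nofollow
```

Any namespace without an explicit entry, such as the main namespace (0) above, silently gets the old hardwired behavior, which is what keeps the patch backward compatible.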

Likely a WONTFIX as per comment 1. Lowering priority to reflect reality...

Nemo_bis renamed this task from "$wgArticleRobotPolicies vs. SpecialPages hardwiring" to "Allow configuring the current automatic 'noindex, nofollow' for all special pages". Jan 16 2015, 10:08 PM
Nemo_bis set Security to None.