
robots are instructed to index redirect pages
Closed, Declined (Public)

Description

Author: warrenstuart

Description:
When I use the search engine (mnoGoSearch) to index my site, I get lots of
duplicate pages caused by redirects. What I propose is that redirect pages have
the meta tag for robots changed to noindex. This might also help search
engines that crawl Wikipedia build a better index of the site.

Current, on all pages:
<meta name="robots" content="index,follow" />
Proposed, for redirect pages:
<meta name="robots" content="noindex,follow" />
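
As a rough sketch of the proposal, assuming the page renderer knows whether the current view was reached through a redirect: the helper name `robots_meta` and the `is_redirect` flag are hypothetical and are not taken from MediaWiki.

```python
def robots_meta(is_redirect: bool) -> str:
    """Return the robots meta tag for a page view.

    Hypothetical helper: `is_redirect` means "this view was reached
    through a redirect page".
    """
    # Redirect views stay out of the index, but their links are still followed.
    content = "noindex,follow" if is_redirect else "index,follow"
    return f'<meta name="robots" content="{content}" />'


print(robots_meta(False))
print(robots_meta(True))
```

Only the `index`/`noindex` directive changes; `follow` is kept so a crawler can still reach the target article through the redirect view.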


Version: unspecified
Severity: enhancement

Details

Reference
bz2798

Event Timeline

bzimport raised the priority of this task to Lowest. (Nov 21 2014, 8:41 PM)
bzimport set Reference to bz2798.
bzimport added a subscriber: Unknown Object (MLST).

warrenstuart wrote:

A quick example of this is:

http://en.wikipedia.org/wiki/Firefox

Which gives

Mozilla Firefox
From Wikipedia, the free encyclopedia.
(Redirected from Firefox)

but the robots meta is
<meta name="robots" content="index,follow" />

Changing summary to reflect the problem and upping severity to normal.

avarab wrote:

Wouldn't this hurt rankings for alternate spellings of the title?

bugzilla.wikipedia.org wrote:

In order to allow alternate spellings to be indexed by search engines, the name
of redirect articles could be mentioned in a meta tag of the target article,
i.e. <meta name="keywords" content="...">.

I don't know whether meta tags are honoured by search engines anymore, though.
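
A minimal sketch of that idea, assuming the titles of the redirects pointing at an article are already available as a list: the function name `keywords_meta` is hypothetical, and whether search engines read this tag at all is the open question raised above.

```python
from html import escape
from typing import Iterable


def keywords_meta(redirect_titles: Iterable[str]) -> str:
    """Build a keywords meta tag from the titles that redirect to a page."""
    # Escape the joined titles so quotes or ampersands cannot break the attribute.
    content = escape(", ".join(redirect_titles), quote=True)
    return f'<meta name="keywords" content="{content}" />'


# For the target article "Mozilla Firefox", the redirect title "Firefox"
# would be listed as a keyword.
print(keywords_meta(["Firefox"]))
```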

en.ABCD wrote:

I think that maybe, if the User-Agent is that of a major search engine (e.g.
Google), then when requesting a redirect page, instead of the standard redirect,
the bot should receive an HTTP 301, so that it will consider it to be the same
page as the target, reducing duplicates in the search results.
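
Sketched very roughly, and only to illustrate the suggestion: the marker list, the function name `redirect_response`, and the target path are all assumptions, and a real crawler check would need to be far more robust than a substring match.

```python
from typing import Optional, Tuple

# Hypothetical list of lowercase substrings identifying search-engine crawlers.
CRAWLER_MARKERS = ("googlebot",)


def redirect_response(user_agent: str, target_url: str) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header value or None) for a redirect page."""
    ua = user_agent.lower()
    if any(marker in ua for marker in CRAWLER_MARKERS):
        # Crawlers get a permanent redirect pointing at the target article.
        return 301, target_url
    # Everyone else gets the normal page with the "(Redirected from ...)" note.
    return 200, None


# Illustrative target path; the actual URL scheme is an assumption.
print(redirect_response("Mozilla/5.0 (compatible; Googlebot/2.1)", "/wiki/Mozilla_Firefox"))
print(redirect_response("Mozilla/5.0 (X11; Linux) Firefox/1.0", "/wiki/Mozilla_Firefox"))
```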

gangleri wrote:

(In reply to comment #4)

> In order to allow alternate spellings to be indexed by search engines, the name
> of redirect articles could be mentioned in a meta tag of the target article,
> i.e. <meta name="keywords" content="...">.
>
> I don't know whether meta tags are honoured by search engines anymore, though.

This is what is requested in
bug 846: feature request: control of meta name="KEYWORDS" content="..."

rowan.collins wrote:

(In reply to comment #5)

> I think that maybe, if the User-Agent is that of a major search engine (e.g.
> Google), then when requesting a redirect page, instead of the standard redirect,
> the bot should receive an HTTP 301

I may be wrong about this, but I remember hearing that Google make spot checks
of some kind with a user-agent of something like IE, to penalise sites that send
completely different "optimised" content to the main crawler.

Restored bug from flood attack.

Closing this WONTFIX; this behavior is deliberate, so titles can be searched on.