
URLs with action=render should not be indexed by search engines
Closed, Resolved · Public

Description

URLs with action=render are indexed by external search engines like Google. For example, see https://www.google.com/search?q=site:en.wikipedia.org+inurl:action%3Drender .

I'm not sure of the best approach to fix this. The pages are not well-formed (there's no <head> element at all), so I don't know whether a <meta> robots directive (http://www.robotstxt.org/meta.html) would work. It might be necessary to use robots.txt instead.
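For reference, a robots.txt rule along these lines could exclude such URLs. This is only a sketch: the exact pattern depends on how the wiki's URLs are structured, and `*` wildcards are a widely supported extension (honored by Google and Bing) rather than part of the original robots.txt standard.

```
User-agent: *
Disallow: /*action=render
```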


Version: 1.23.0
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=46424

Event Timeline

bzimport raised the priority of this task from to Normal. Nov 22 2014, 3:21 AM
bzimport set Reference to bz63891.

<meta> will likely work, since <html>, <head>, and <body> are optional in HTML. Browsers automatically create a head and body for text/html documents, and relevant tags are hoisted into the <head> accordingly.

However, that would be undesirable for a more important reason: action=render is used to retrieve partial documents. If those responses started including non-content markup, some applications would treat the <meta> tag as part of the content and could incorrectly treat articles as non-indexable.

This sounds like a perfect case for an http header.

(In reply to Krinkle from comment #1)

This sounds like a perfect case for an http header.

Specifically,

X-Robots-Tag: noindex

This is also the common way on the web to exclude internal APIs that don't respond with HTML (e.g. JSON responses, or images) when robots.txt hacking is not desired.
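The idea can be sketched as follows. This is a minimal, hypothetical WSGI app in Python, not MediaWiki's actual PHP implementation; the point is that the noindex signal travels in an HTTP response header, leaving the returned HTML fragment untouched.

```python
# Sketch: send "X-Robots-Tag: noindex" for action=render responses,
# keeping the noindex directive out of the document body.
from urllib.parse import parse_qs

def app(environ, start_response):
    params = parse_qs(environ.get("QUERY_STRING", ""))
    headers = [("Content-Type", "text/html; charset=utf-8")]
    # For partial-document renders, mark the response non-indexable
    # via a header instead of injecting a <meta> tag into the fragment.
    if params.get("action") == ["render"]:
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<p>Rendered article fragment</p>"]
```

Crawlers that honor X-Robots-Tag (Google and Bing document support for it) will drop the page from their index, while consumers of the fragment see only the content they asked for.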

Change 134996 had a related patch set uploaded by devunt:
Add 'X-Robots-Tag: noindex' header in action=render pages

https://gerrit.wikimedia.org/r/134996

Change 134996 merged by jenkins-bot:
Add 'X-Robots-Tag: noindex' header in action=render pages

https://gerrit.wikimedia.org/r/134996

Merged by Mattflaschen.