API landing page (action=help) should not be indexed by robots
Open, Low, Public

Description

The api.php endpoint, when displaying the autogenerated help page (when called without parameters or with action=help), should output meta tags to prevent it from being indexed by search engines, since there's nothing useful to index there. The help page even contains example URLs with usage, which spiders can follow. This adds unnecessary load to the server when search spiders kick in, and may cause unwanted results to be indexed.

The problem was brought up on the Support Desk.

Anyone can see that it's being indexed with a simple query: https://www.google.com/search?q="MediaWiki+API+help"

Ciencia_Al_Poder updated the task description.
Ciencia_Al_Poder raised the priority of this task from to Needs Triage.
Ciencia_Al_Poder added a subscriber: Ciencia_Al_Poder.
Restricted Application added a subscriber: Aklapper. Feb 25 2015, 8:33 PM
Anomie added a subscriber: Anomie. Edited Feb 26 2015, 3:23 PM

Usually the entire API endpoint is added to robots.txt. If the site is using a $wgArticlePath that isn't under $wgScriptPath (like WMF sites), the entire $wgScriptPath should probably be added to robots.txt.
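As a rough illustration of the robots.txt approach, here is a minimal sketch using Python's standard urllib.robotparser. The paths are assumptions for illustration only ($wgScriptPath = "/w" and $wgArticlePath = "/wiki/$1"), and wiki.example.org is a placeholder host.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout (not taken from any real site config):
#   $wgScriptPath  = "/w"        -> /w/api.php, /w/index.php
#   $wgArticlePath = "/wiki/$1"  -> /wiki/Main_Page
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The API help page lives under the script path, so it is disallowed.
api_allowed = parser.can_fetch(
    "*", "https://wiki.example.org/w/api.php?action=help"
)

# Article URLs are outside the script path, so they stay crawlable.
article_allowed = parser.can_fetch(
    "*", "https://wiki.example.org/wiki/Main_Page"
)
```

The trailing slash in "Disallow: /w/" matters in this layout: rules are matched as path prefixes, so "Disallow: /w" would also match "/wiki/".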

But why limit it to the help? Shouldn't all API requests use the X-Robots-Tag header, and the 'fm' output also include the meta tag? Although in that case we'd need to support silliness like this.
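To make the suggestion concrete, here is a minimal sketch in Python rather than MediaWiki's own PHP. The function name robots_directives and the "noindex, nofollow" value are hypothetical choices for illustration; this is not the actual MediaWiki implementation.

```python
# Hypothetical directive value; the task only asks to prevent indexing,
# so "nofollow" is an extra assumption here.
NOINDEX = "noindex, nofollow"


def robots_directives(fmt: str) -> tuple[dict[str, str], str]:
    """Return (extra HTTP headers, extra HTML <head> markup) for a response.

    Every response gets the X-Robots-Tag header. Formats whose name ends
    in "fm" (the HTML pretty-printed variants, e.g. "jsonfm") also get a
    robots meta tag, since they are served as HTML pages.
    """
    headers = {"X-Robots-Tag": NOINDEX}
    meta = ""
    if fmt.endswith("fm"):
        meta = f'<meta name="robots" content="{NOINDEX}">'
    return headers, meta


headers_fm, meta_fm = robots_directives("jsonfm")
headers_raw, meta_raw = robots_directives("json")
```

Sending the header on every response covers non-HTML output such as JSON or XML, where a meta tag has nowhere to go.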

Aklapper triaged this task as Low priority. Feb 27 2015, 11:00 AM