
rest.wikimedia.org is crawled by search engines
Closed, Resolved (Public)

Description

I find pages such as https://rest.wikimedia.org/fr.wikipedia.org/v1/page/html/WebEx in Google search results. (Yes, only if I specify site:wikimedia.org.)

There are at least two severe bugs:

  1. https://rest.wikimedia.org/robots.txt should exclude crawling by search engines;
  2. the canonical URL of each page should be set.
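For the second point, one way to declare a canonical URL for an API response is an HTTP `Link` header with `rel="canonical"`. The sketch below is only an illustration: the mapping from a rest.wikimedia.org path to a `/wiki/` URL is an assumption based on the example URL above, and the function names `canonical_url` and `canonical_link_header` are hypothetical, not part of RESTBase.

```python
from urllib.parse import urlsplit


def canonical_url(rest_url: str) -> str:
    """Map a REST API HTML URL to the wiki page it renders.

    Assumption: paths look like /<domain>/v1/page/html/<title>,
    as in the example URL from this report.
    """
    segments = urlsplit(rest_url).path.lstrip("/").split("/")
    if len(segments) < 5 or segments[1:4] != ["v1", "page", "html"]:
        raise ValueError(f"not a page/html URL: {rest_url}")
    domain = segments[0]
    # Titles may contain slashes (subpages); keep them percent-encoded as given.
    title = "/".join(segments[4:])
    return f"https://{domain}/wiki/{title}"


def canonical_link_header(rest_url: str) -> str:
    """Build the value of a Link header that marks the canonical URL."""
    return f'<{canonical_url(rest_url)}>; rel="canonical"'


if __name__ == "__main__":
    url = "https://rest.wikimedia.org/fr.wikipedia.org/v1/page/html/WebEx"
    print("Link:", canonical_link_header(url))
```

A `<link rel="canonical">` element in the HTML head would serve the same purpose; the header variant is shown only because it does not require touching the page body.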

Event Timeline

Nemo_bis raised the priority of this task to High.
Nemo_bis updated the task description.
Nemo_bis added projects: SEO, RESTBase.
Nemo_bis added a subscriber: Nemo_bis.
GWicke claimed this task.

This is now deployed: http://rest.wikimedia.org/robots.txt

@Nemo_bis, thanks for the report!

I still don't see canonical URLs though, hence these pages fragment our PageRank and contribute to the decline of Wikimedia projects.

@Nemo_bis: The current robots.txt basically hides API content from search engines completely. There is no effect of the API on PageRank at all.
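The claim that robots.txt hides the API from crawlers can be checked locally with Python's standard `urllib.robotparser`. This is a minimal sketch: the thread does not quote the deployed file, so the rules in `ASSUMED_ROBOTS_TXT` are an assumption (a blanket disallow), and `is_crawlable` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed content: a blanket disallow for all user agents.
# The actual deployed file is not shown in this task.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

API_URL = "https://rest.wikimedia.org/fr.wikipedia.org/v1/page/html/WebEx"


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_crawlable(ASSUMED_ROBOTS_TXT, API_URL))
```

Note that robots.txt only stops compliant crawlers from fetching pages; URLs that were indexed earlier may linger in results for some time after the file is deployed.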