
Pages with __NOINDEX__ are still indexed with Internet search engines
Open, Needs Triage | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:
On the Danish Wikipedia, a deletion discussion page is still indexed by an Internet search engine (Google), even though __NOINDEX__ was added a few minutes after the page was created and noindex is among its tags. The original, now deleted page is also still indexed, even though its robots meta tag contains "noindex" (Google, Bing, Qwant).

This may be an issue if the page is about a living person.

What should have happened instead?:
The pages should not be indexed.

Other information (browser name/version, screenshots, etc.):
It is unclear whether Wikimedia can do anything about this, as it seems to be an issue on the search engine side. For Google, I have tried to get Googlebot to reread the pages, but the pages with noindex are still shown in the search results.

Event Timeline

Based on the title, this may be the same issue as T273745. Google documents how a noindex restriction interacts with robots.txt: https://developers.google.com/search/docs/crawling-indexing/block-indexing

Important: For the noindex rule to be effective, the page or resource must not be blocked by a robots.txt file, and it has to be otherwise accessible to the crawler. If the page is blocked by a robots.txt file or the crawler can't access the page, the crawler will never see the noindex rule, and the page can still appear in search results, for example if other pages link to it.

I'm pretty sure I've seen another task about this issue, which may be closed by now, and there has been general public annoyance with it on the web. Wikis are in a weird position on the matter.
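The interaction Google describes in the quoted passage can be sketched as a small offline check. This is only an illustration under stated assumptions: `RobotsMetaParser` and `diagnose` are hypothetical names, the robots.txt text and HTML are passed in as strings rather than fetched, and real crawlers also honour things this sketch ignores (for example the `X-Robots-Tag` HTTP header and crawler-specific meta names).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def diagnose(robots_txt: str, html: str, url: str, agent: str = "Googlebot") -> str:
    """Hypothetical helper: classify a page the way the quoted rule suggests."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    # If robots.txt blocks the crawler, it never downloads the page,
    # so a noindex directive inside the page cannot take effect.
    if not rules.can_fetch(agent, url):
        return "blocked by robots.txt: crawler never sees noindex"

    meta = RobotsMetaParser()
    meta.feed(html)
    if "noindex" in meta.directives:
        return "crawlable with noindex: should drop out after a recrawl"
    return "crawlable without noindex: may be indexed"
```

For a page that robots.txt disallows, the first branch is returned regardless of the meta tag, which matches the case where a page can still appear in results because other pages link to it. The example.org URLs and the robots.txt rule used to exercise the sketch are made up for illustration and are not taken from any Wikimedia configuration.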