Talk pages of 'noindex' articles should also be 'noindex'
Open, Needs Triage, Public, Feature

Description

WP Talk pages for articles that have the noindex metatag should also be set to noindex.

Currently, mainspace Wikipedia articles that are set to noindex for whatever reason (e.g., unpatrolled) are apparently accompanied by Talk pages that are not so tagged. This may result in a search engine query for the title returning a Talk page, while failing to return the mainspace page.
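One way to confirm which of the two pages carries the tag is to inspect each page's HTML for a robots meta element. The following is only a minimal sketch: it assumes the tag is emitted in the standard form `<meta name="robots" content="noindex,...">`, the names `RobotsMetaParser` and `is_noindexed` are hypothetical, and the two HTML strings are made-up samples rather than markup fetched from any real page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> element."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_noindexed(html: str) -> bool:
    """Return True if the markup contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Made-up sample markup (assumption), standing in for an article and its Talk page.
article_html = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
talk_html = "<html><head><title>Talk:Example</title></head></html>"

print(is_noindexed(article_html))
print(is_noindexed(talk_html))
```

In the situation described here, the article sample would report as noindexed while the Talk page sample would not.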

This occurred and prompted a Teahouse question, in which a Google search returned the Talk page (at result #2) but failed to return the mainspace page of the corresponding article. This is now being discussed at Village pump (technical).

To reproduce:

Benefits - we should not encourage Google to surface Talk pages before (or instead of) mainspace articles by using the noindex metatag only on the article; that only confuses web searchers who are looking for information about a topic and land at a Talk page instead.

Event Timeline

No, they should not; they are indexed on purpose. noindex makes any info in talk pages undiscoverable, which is hugely annoying. It's bad enough that we are hiding userpages. Why are we continuously encroaching on and breaking Google indexing further and further?

TheDJ, are you okay with the Talk page being indexed, when the article page is not indexed? Don't you think that hurts searchers/readers who are looking to be informed about a given topic, only to find a WP Talk page, and no article? That's the only point of this ticket; it is not to prevent Talk pages from being indexed, but only to prevent a Talk page from being the *only* page that is indexed.

TheDJ, are you okay with the Talk page being indexed, when the article page is not indexed?
only to prevent a Talk page from being the *only* page that is indexed.

I don't think that is technically possible. It would make every talk page depend on the contents of every article page and vice versa. That means both have to have change detection as a single unit.

It's happening already, which is why I created this ticket in the first place, to avoid that. But I think that a Phab ticket is not the right venue to have a debate about pros & cons of a topic, so can you contribute your thoughts at the VPT thread? Thanks.

Sorry, I quoted that wrong: I meant to comment on the "only to prevent a Talk page from being the *only* page that is indexed." bit.

But that is currently happening. See VPT thread.

I don't think this is a good idea, and why would something like this apply to "articles" only? Are you asking that something determine if the first noindexed space is a "content" space - or is this something you only want for projects that use "articles"?

Not sure how to properly link related tickets, but Izno mentioned T53736 ("Consider changing wikipage redirects to be proper HTTP redirects"), in the context of the discussion at WP:VPT (at 21:05, 7 May).

(Adding the task number will add a "mention" below the task description on both tasks and provide a link in the 'threaded discussion' in the other task. Usually that's enough. For a parent/child relation you go to "Edit Related Tasks".)

Oh, nice feature! Thanks for the tips.

I think I may have made a tactical mistake in the choice of the ticket title, which crafts a solution instead of specifying what the problem is. This jumps the gun, since it encourages responses to the proposed solution instead of to the underlying problem. So, maybe I should start over.

The problem is that using a search engine to search for an unpatrolled article in Wikipedia may turn up the Talk page of the article, but not the article page itself. (Even searching for the article page by full URL fails, indicating that the article must not be in Google's index.) Leading a naive user to a Talk page is (imho) worse than not returning a page at all; it would likely be very confusing, and is not what Wikipedia should be encouraging.

Google is going to do whatever they're going to do in response to a query, but to the extent that we can do anything to reduce the chances that a Talk page is returned by their algorithm instead of an article page, we should do it.

What I probably should have called it is something like "discourage search engines from returning Talk pages only". And so, I don't really have a response to your question about applying to articles only, because that's back to the design discussion which I may have (erroneously) triggered.

I'm just noticing search engine behavior that seems a disservice to our readers, and to the extent we can influence that, I think we should. And now I'll shut up about 'noindex' tags or any other possible solution, and just leave it in your hands and let you guys figure out whether there is some way to improve our reader experience with this or not, and what the best way to do that might be. And I apologize for leading the whole discussion astray with that title.