
If a global user page is marked with __NOINDEX__ the same should be applied to the local copies
Closed, Resolved · Public

Assigned To
Authored By
He7d3r
Feb 23 2015, 7:18 PM
Referenced Files
None

Event Timeline

He7d3r raised the priority of this task from to Needs Triage.
He7d3r updated the task description. (Show Details)
He7d3r added a project: GlobalUserPage.
He7d3r subscribed.
Legoktm triaged this task as High priority.
Legoktm set Security to None.
Legoktm subscribed.

Un-cookie licking.

Do I understand correctly that, after fulfilling the task, <includeonly>__NOINDEX__</includeonly> will behave intuitively?
With the present software, it has no effect.

> Do I understand correctly that, after fulfilling the task, <includeonly>__NOINDEX__</includeonly> will behave intuitively?
> With the present software, it has no effect.

Ideally, yes. However, given how global user pages currently work, that seems unlikely to happen.

> However, given how global user pages currently work, that seems unlikely to happen.

It would have been reasonable to serve all replicas with <meta name="robots" content="noindex"/> across the board, given that a page is not parsed separately for each site.
But the cool coders opted to poison the indices of Google and other search engines with myriads of out-of-context links.

Making all the "copies" of the user pages noindex (with no per-page control) is certainly a possibility if the user communities in question supported that decision.

We could also look into rel=canonical, but I'm not sure that would work cross-domain.
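
Purely as a hypothetical illustration of the rel=canonical idea (GlobalUserPage does not necessarily emit anything like this, and the URL construction below is an assumption made only for the example): a cross-domain canonical link on a local copy would point back at the Meta-Wiki original, and whether search engines honor such a link across domains is exactly the open question.

```
# Hypothetical sketch only: what a rel=canonical link from a local copy back
# to the Meta-Wiki original might look like. The URL construction is an
# assumption for illustration, not GlobalUserPage's actual output.
from urllib.parse import quote

def canonical_link_for(username: str) -> str:
    """Build a <link rel="canonical"> element pointing at the Meta original."""
    title = "User:" + username.replace(" ", "_")
    # Keep ":", "_", "(" and ")" unescaped, as MediaWiki page URLs usually do.
    return '<link rel="canonical" href="https://meta.wikimedia.org/wiki/{}"/>'.format(
        quote(title, safe=":_()"))

print(canonical_link_for("Steffen Proessdorf (WMDE)"))
# <link rel="canonical" href="https://meta.wikimedia.org/wiki/User:Steffen_Proessdorf_(WMDE)"/>
```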

> Making all the "copies" of the user pages noindex (with no per-page control) is certainly a possibility if the user communities in question supported that decision.
>
> We could also look into rel=canonical, but I'm not sure that would work cross-domain.

This was implemented in T177159.

I am not sure if I understand the above correctly, because I do not understand the whole system.

I have a problem with the missing NOINDEX on my included global user pages:

In the original on Meta (view-source:https://meta.wikimedia.org/wiki/User:Steffen_Proessdorf_(WMDE)), line 19 reads:
<meta name="robots" content="noindex, follow" />

The included page on de-WP (and other projects) simply lacks this line:
view-source:https://de.wikipedia.org/wiki/Benutzer:Steffen_Proessdorf_(WMDE)

This seems to me to be a bug: the NOINDEX is not carried over to the pages that merely include the global user page (a quick way to check this is sketched below).

In fact, this page is the first Google result when searching for my name, which I really do not want.

As far as I can see, this bug has been unsolved for years? So my only option is to completely delete my global user page to get it out of Google?
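
For anyone who wants to reproduce the comparison described above, here is a minimal sketch, assuming that fetching the rendered HTML and searching for a robots meta tag is enough for a quick check (the parsing is deliberately naive, and the User-Agent string is made up for the example):

```
# Quick comparison of the Meta original and the de-WP copy: does each page's
# HTML carry a robots meta tag at all, and with what policy? Sketch only.
import re
import urllib.request

PAGES = {
    "Meta original":     "https://meta.wikimedia.org/wiki/User:Steffen_Proessdorf_(WMDE)",
    "de.wikipedia copy": "https://de.wikipedia.org/wiki/Benutzer:Steffen_Proessdorf_(WMDE)",
}

ROBOTS_META = re.compile(r'<meta\s+name\s*=\s*"robots"\s+content\s*=\s*"([^"]*)"',
                         re.IGNORECASE)

for label, url in PAGES.items():
    request = urllib.request.Request(url, headers={"User-Agent": "noindex-check/0.1"})
    html = urllib.request.urlopen(request).read().decode("utf-8")
    match = ROBOTS_META.search(html)
    print(f"{label}: {match.group(1) if match else 'no robots meta tag found'}")
```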

Has this bug with "High" priority really been open for more than 5 years with still nothing changed? This is a major privacy issue: user pages included from Meta appear on Google's first page even though the user chose not to have them indexed!

@Mathis_Benguigui: Correct. If you (or anyone else) would like to see this fixed, then please feel free to provide a software patch. Thanks a lot!

> Has this bug with "High" priority really been open for more than 5 years with still nothing changed?

The biggest problem is that search engines are all a black box, and every time we deploy a change it takes a month or more to reflect in search engines, and by then we've moved on to other things/forgotten about this.

> This is a major privacy issue: user pages included from Meta appear on Google's first page even though the user chose not to have them indexed!

Please provide an example. GlobalUserPage should have no impact on the indexing status of pages from Meta.

What is the relation to T177159: Avoid indexing of local "copies" of the central user page? It has a patch merged in May which should fix this.

They're asking for different things, but it looks like the solution will be the same for both, given that we're unconditionally applying noindex to all local copies. In theory this task is resolved but I still see local copies in Google...sigh

> Please provide an example. GlobalUserPage should have no impact on the indexing status of pages from Meta.

My fr.wikisource user page, included from Meta, was on Google's first page of results before I requested deletion.

>> Please provide an example. GlobalUserPage should have no impact on the indexing status of pages from Meta.
>
> My fr.wikisource user page, included from Meta, was on Google's first page of results before I requested deletion.

Well, deleting makes it nearly impossible to debug, but in any case, I was luckily able to find your en.wp user page (on the mobile domain, though) still in Google's cache (I don't know how long that link will work for). Looking at the HTML source that Google has, it contains no robots tag, but it does have <meta name="generator" content="MediaWiki 1.35.0-wmf.31"/>, which means it was indexed about three weeks before my change to set noindex (part of wmf.34) went out; a minimal version of that check is sketched below.

I don't think there's anything left for us to do on the MediaWiki side, but unfortunately, I don't really know how to force Google to reindex all these pages.
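
A minimal sketch of that dating trick, assuming the cached HTML has been saved locally (the `cached_html` value below is just the relevant fragment quoted in this comment):

```
# Date a cached copy by its generator meta tag: the MediaWiki version that
# rendered the page tells you whether it predates the wmf.34 noindex change.
# Sketch only; `cached_html` stands in for whatever was saved from the cache viewer.
import re
from typing import Optional

def mediawiki_version(cached_html: str) -> Optional[str]:
    """Return the MediaWiki version from the generator meta tag, if present."""
    match = re.search(
        r'<meta\s+name\s*=\s*"generator"\s+content\s*=\s*"MediaWiki ([^"]+)"',
        cached_html, re.IGNORECASE)
    return match.group(1) if match else None

cached_html = '<html><head><meta name="generator" content="MediaWiki 1.35.0-wmf.31"/></head></html>'
print(mediawiki_version(cached_html))  # 1.35.0-wmf.31, i.e. rendered before wmf.34
```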

As far as I know, it is up to Google to honor the request. __NOINDEX__ merely notifies search engines not to index the page; it does not have any power over the bots. Correct me if I'm wrong.

> As far as I know, it is up to Google to honor the request. __NOINDEX__ merely notifies search engines not to index the page; it does not have any power over the bots. Correct me if I'm wrong.

Basically, that's correct. As far as I can tell, Google *does* respect the noindex flag; it just takes a while, and I at least have no idea how long or on what schedule that happens.

Legoktm claimed this task.

It's effectively resolved.