
If a global user page is marked with __NOINDEX__, the same should be applied to the local copies
Open, High, Public

Assigned To: None
Authored By: He7d3r, Feb 23 2015, 7:18 PM
Tokens
"Like" token, awarded by ToBeFree.
"The World Burns" token, awarded by Stepro.
"Burninate" token, awarded by Raymond.
"Burninate" token, awarded by Envlh.
"Meh!" token, awarded by Incnis_Mrsi.
"Manufacturing Defect?" token, awarded by YMS.
"The World Burns" token, awarded by Base.
"The World Burns" token, awarded by Thgoiter.

Event Timeline

He7d3r created this task. Feb 23 2015, 7:18 PM
He7d3r set the priority of this task to Needs Triage.
He7d3r updated the task description.
He7d3r added a project: GlobalUserPage.
He7d3r added a subscriber: He7d3r.
Restricted Application added a subscriber: Aklapper. Feb 23 2015, 7:18 PM
Legoktm claimed this task. Feb 23 2015, 7:55 PM
Legoktm triaged this task as High priority.
Legoktm set Security to None.
Base added a subscriber: Base. Feb 24 2015, 1:25 AM
Yihaa added a subscriber: Yihaa. Mar 18 2015, 6:06 PM
YMS added a subscriber: YMS. Mar 24 2015, 4:34 PM
Legoktm removed Legoktm as the assignee of this task. Jun 25 2015, 9:13 PM
Legoktm added a subscriber: Legoktm.

Un-cookie licking.

Base awarded a token. Mar 19 2016, 1:31 PM
YMS awarded a token. Apr 1 2016, 3:57 PM

Do I understand correctly that, once this task is fulfilled, <includeonly>__NOINDEX__</includeonly> will behave intuitively?
With the current software, it has no effect.


Ideally, yes. However, given how global user pages currently work, that seems unlikely to happen.


It would have been reasonable to serve all replicas with <meta name="robots" content="noindex"/> unconditionally, since a page is not parsed separately for each site.
But the cool coders opted to poison the indices of Google and other search engines with myriad out-of-context links.

Marking all the "copies" of user pages as noindex (with no per-page control) is certainly a possibility, if the user communities in question support that decision.

We could also look into rel=canonical, but I'm not sure that would work cross-domain.
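For reference, a rel=canonical hint is a single <link> element in the page head. The sketch below shows what a local copy could emit to point search engines at the Meta original. It assumes every central user page lives at https://meta.wikimedia.org/wiki/User:<name>, and canonical_link is a hypothetical helper, not GlobalUserPage code. Whether search engines honor the hint across domains is exactly the open question raised above.

```python
from html import escape
from urllib.parse import quote

# Assumption: central user pages live on Meta in the "User:" namespace.
META_BASE = "https://meta.wikimedia.org/wiki/"


def canonical_link(username: str) -> str:
    """Build a <link rel="canonical"> tag pointing a local copy at Meta.

    Hypothetical helper: MediaWiki titles use underscores instead of spaces;
    everything except ':', '/', '(', ')', '_' and ',' is percent-encoded.
    """
    title = "User:" + username.replace(" ", "_")
    href = META_BASE + quote(title, safe=":/()_,")
    # escape() protects the attribute value against quotes or ampersands.
    return f'<link rel="canonical" href="{escape(href)}"/>'


print(canonical_link("Steffen Proessdorf (WMDE)"))
```

A local copy would carry this tag in its <head>, while the Meta page itself would not need it.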


Unconditionally marking all local copies as noindex was implemented in T177159.

Envlh awarded a token. Oct 31 2018, 8:21 AM
Envlh added a subscriber: Envlh. Oct 31 2018, 8:27 AM
Raymond added a subscriber: Raymond.
Stepro added a subscriber: Stepro. Nov 19 2018, 8:49 AM

I am not sure I understand the discussion above correctly, because I do not understand the whole system.

I have a problem with the missing NOINDEX on my included global user pages:

In the page source of the original on Meta (view-source:https://meta.wikimedia.org/wiki/User:Steffen_Proessdorf_(WMDE)), line 19 reads:
<meta name = "robots" content = "noindex, follow" />

The included page on de-WP (and other projects) simply lacks this line:
view-source:https://de.wikipedia.org/wiki/Benutzer:Steffen_Proessdorf_(WMDE)

This looks to me like a bug that ignores the NOINDEX on pages that are only included.

In fact, this page is the first Google result when searching for my name, which I really do not want.

As far as I can see, this bug has been unsolved for years. Is my only option, then, to delete my global user page completely to get it out of Google?
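Checking whether a given copy carries the robots tag, as done above with view-source:, can be scripted. This is a minimal sketch using only Python's standard library; the function names and the User-Agent string are illustrative assumptions. fetch_html needs network access, so the checks are written to work on an HTML string.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class RobotsMetaParser(HTMLParser):
    """Collects directives from every <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <meta .../> tags also arrive here via the default handle_startendtag.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").strip().lower() == "robots":
            # content is comma-separated, e.g. "noindex, follow"
            for directive in values.get("content", "").split(","):
                if directive.strip():
                    self.directives.append(directive.strip().lower())


def robots_directives(html: str) -> list[str]:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def is_noindexed(html: str) -> bool:
    directives = robots_directives(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives


def fetch_html(url: str) -> str:
    # Wikimedia sites expect a descriptive User-Agent; this one is a placeholder.
    request = Request(url, headers={"User-Agent": "noindex-check/0.1 (example script)"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, is_noindexed(fetch_html("https://de.wikipedia.org/wiki/Benutzer:Steffen_Proessdorf_(WMDE)")) should report whether the de.wikipedia copy currently carries noindex. Keep in mind that this only shows what the wiki serves now, not what a search engine has already indexed.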

JFishback_WMF moved this task from Intake to Backlog on the Privacy board. Mar 23 2020, 11:57 PM

Has this "High" priority bug really been open for more than 5 years with nothing changed? This is a major privacy issue: user pages included from Meta are on Google's first page even though the user chose not to have them indexed!

@Mathis_Benguigui: Correct. If you (or anyone else) would like to see this fixed, then please feel free to provide a software patch. Thanks a lot!

What is the relation to T177159: Avoid indexing of local "copies" of the central user page? It has a patch merged in May which should fix this.

Has this "High" priority bug really been open for more than 5 years with nothing changed?

The biggest problem is that search engines are all black boxes, and every time we deploy a change it takes a month or more to be reflected in search results, and by then we've moved on to other things or forgotten about this.

This is a major privacy issue: user pages included from Meta are on Google's first page even though the user chose not to have them indexed!

Please provide an example. GlobalUserPage should have no impact on the indexing status of pages from Meta.

What is the relation to T177159: Avoid indexing of local "copies" of the central user page? It has a patch merged in May which should fix this.

They're asking for different things, but it looks like the solution will be the same for both, given that we're unconditionally applying noindex to all local copies. In theory this task is resolved, but I still see local copies in Google... sigh.
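The rule described here (noindex on every local copy, regardless of what the central page says) can be modelled as a small decision function. This is a hypothetical sketch, not the GlobalUserPage implementation: UserPageView, robots_policy and robots_meta are invented names, the "noindex,follow" value mirrors the tag quoted from Meta earlier, and emitting no tag for indexable pages is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserPageView:
    """Hypothetical description of one rendered user page."""
    is_local_copy: bool           # rendered on a local wiki from the central (Meta) page
    has_noindex_magic_word: bool  # the page's own wikitext contains __NOINDEX__


def robots_policy(view: UserPageView) -> Optional[str]:
    """Return the robots directive to emit, or None for the default (no tag)."""
    if view.is_local_copy:
        # Unconditional: local copies are never indexable, whatever the
        # central page says, so no per-page control is involved.
        return "noindex,follow"
    if view.has_noindex_magic_word:
        return "noindex,follow"
    # Assumption: indexable pages simply omit the robots tag.
    return None


def robots_meta(view: UserPageView) -> str:
    policy = robots_policy(view)
    return f'<meta name="robots" content="{policy}"/>' if policy else ""
```

The design choice is visible in the first branch: because the check on is_local_copy comes before the magic-word check, the central page's own __NOINDEX__ setting never matters for copies, which is why one change covers both this task and T177159.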


My fr.wikisource user page, included from Meta, was on Google's first page before I requested its deletion.


Well, deleting it makes it nearly impossible to debug, but luckily I was able to find your en.wp user page (on the mobile domain, though) still in Google's cache (I don't know how long that link will keep working). The HTML source that Google has contains no robots tag, but it does have <meta name="generator" content="MediaWiki 1.35.0-wmf.31"/>, which means it was indexed about 3 weeks before my change to set noindex (part of wmf.34) went out.
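The "about 3 weeks" estimate follows from the generator tag: the cached copy was rendered by branch wmf.31 and the fix shipped in wmf.34, three branches later. The sketch below automates that comparison for a cached HTML snapshot. It assumes one wmf branch per week (the weekly deployment train) and only compares branch numbers within the same 1.35.0 release line; GENERATOR_RE, NOINDEX_FIX and weeks_before_fix are hypothetical names.

```python
import re
from typing import Optional

# Matches e.g. <meta name="generator" content="MediaWiki 1.35.0-wmf.31"/>
GENERATOR_RE = re.compile(
    r'<meta\s+name="generator"\s+content="MediaWiki\s+([\d.]+)-wmf\.(\d+)"'
)

# Release line and branch that shipped the noindex change, per the comment above.
NOINDEX_FIX = ("1.35.0", 34)


def weeks_before_fix(html: str) -> Optional[int]:
    """Branches (assumed to be roughly weeks) between a cached render and the fix.

    Positive: the snapshot predates the noindex change.
    None: no generator tag, or a release line that cannot be compared.
    """
    match = GENERATOR_RE.search(html)
    if match is None:
        return None
    release, branch = match.group(1), int(match.group(2))
    fix_release, fix_branch = NOINDEX_FIX
    if release != fix_release:
        return None
    return fix_branch - branch
```

For the cached en.wp page, weeks_before_fix would return 34 - 31 = 3, matching the estimate above.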

I don't think there's anything left for us to do on the MediaWiki side, but unfortunately, I don't really know how to force Google to reindex all these pages.