My user page appears on
but not on
says indexing by robots is disallowed.
It would have been reasonable to serve all replicas with <meta name="robots" content="noindex"/> across the board, since the page is not parsed separately for each site.
Instead, the developers opted to poison the indices of Google and other search engines with myriads of out-of-context links.
Making all the "copies" of the user pages noindex (with no per-page control) is certainly a possibility, if the user communities in question support that decision.
We could also look into rel=canonical, but I'm not sure that would work cross-domain.
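For reference, a cross-domain canonical would mean each local copy pointing back to the Meta original from its <head>. Google documents that it treats cross-domain rel=canonical as a hint rather than a directive, so there is no guarantee it would be honored; the page title below is a hypothetical placeholder, not a real user page:

```html
<!-- Hypothetical sketch: placed in the <head> of the local copy on a wiki
     such as de.wikipedia.org, pointing at the Meta-Wiki original. -->
<link rel="canonical" href="https://meta.wikimedia.org/wiki/User:Example"/>
```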
I am not sure if I understand the above correctly, because I do not understand the whole system.
I have a problem with the missing NOINDEX on my transcluded global user page:
At the original on Meta: view-source:https://meta.wikimedia.org/wiki/User:Steffen_Proessdorf_(WMDE)
line 19 reads:
<meta name="robots" content="noindex, follow"/>
The transcluded page on de-WP (and other projects) simply lacks this line.
This seems to me to be a bug that overrides the NOINDEX on pages which are merely transcluded.
In fact, this page is the first Google result when searching for my name, which I really do not want.
As far as I can see, this bug has been unsolved for years? So my only option is to completely delete my global user page to get it out of Google?
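One quick way to verify which copies actually carry the directive is to fetch each page's HTML and look for a robots meta tag. A minimal Python sketch using only the standard library (the sample snippets below are illustrative, not real page sources):

```python
# Minimal sketch (not MediaWiki code): scan an HTML document for a
# <meta name="robots"> tag whose content includes "noindex". This is how
# the missing directive on a transcluded copy can be spotted.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the content attribute of every <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta .../> tags are also routed here by HTMLParser.
        if tag == "meta":
            d = dict(attrs)
            if (d.get("name") or "").lower() == "robots":
                self.directives.append(d.get("content") or "")

def has_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in c.lower() for c in parser.directives)

# Illustrative snippets: the Meta original carries the tag, the copy does not.
original = '<head><meta name="robots" content="noindex, follow"/></head>'
copy_without_tag = '<head><meta name="generator" content="MediaWiki 1.35.0-wmf.31"/></head>'
print(has_noindex(original))          # True
print(has_noindex(copy_without_tag))  # False
```

In practice you would fetch the live page source (e.g. with urllib) and pass it to has_noindex; the parsing logic is the same.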
Has this bug, with "High" priority, really been open for more than 5 years with nothing changed? This is a major privacy issue: user pages transcluded from Meta appear on Google's first page even though the user chose not to have them indexed!
The biggest problem is that search engines are all black boxes: every time we deploy a change, it takes a month or more to be reflected in search engines, and by then we've moved on to other things or forgotten about this.
This is a major privacy issue: user pages transcluded from Meta appear on Google's first page even though the user chose not to have them indexed!
Please provide an example. GlobalUserPage should have no impact on the indexing status of pages from Meta.
They're asking for different things, but it looks like the solution will be the same for both, given that we're unconditionally applying noindex to all local copies. In theory this task is resolved, but I still see local copies in Google... sigh.
Well, deleting makes it near impossible to debug, but in any case, I was luckily able to find your en.wp user page (on the mobile domain, though) still in Google's cache (I don't know how long that link will work). Looking at the HTML source that Google has cached, it contains no robots tag, but it does have <meta name="generator" content="MediaWiki 1.35.0-wmf.31"/>, which means the page was indexed about three weeks before my change to set noindex (part of wmf.34) went out.
I don't think there's anything left for us to do on the MediaWiki side, but unfortunately, I don't really know how to force Google to reindex all these pages.
As far as I know, it is up to Google to honor the request. __NOINDEX__ merely asks search engines not to index the page; it has no actual power over the bots. Correct me if I'm wrong.