
Avoid indexing of local "copies" of the central user page
Closed, Resolved, Public

Description

I noticed today that user pages from Meta-Wiki are also indexed on seemingly random wikis, possibly all of them. For example, https://sw.wikibooks.org/wiki/Mtumiaji:Krinkle appears on the second page of Google results for my username (ranked below the Meta-Wiki version, but still).

We should probably mark these as noindex and/or ensure the canonical link is set to the central version?
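For illustration, the two measures proposed above come down to two tags in the `<head>` of each local copy. The following Python sketch is not the GlobalUserPage code (that extension is written in PHP). The helper name `local_copy_head_tags` is hypothetical, the Meta-Wiki URL is only an example, and the `noindex,nofollow` value mirrors what was later observed on the live pages rather than a documented requirement.

```python
from html import escape


def local_copy_head_tags(central_url: str) -> str:
    """Hypothetical helper: <head> markup for a local copy of a central user page.

    - The robots meta tag asks search engines not to index the copy
      (nor follow its links).
    - The canonical link tells crawlers the central page is the preferred URL.
    """
    href = escape(central_url, quote=True)  # escape &, <, > and quotes for the attribute
    return (
        '<meta name="robots" content="noindex,nofollow"/>\n'
        f'<link rel="canonical" href="{href}"/>'
    )


# Example: a local copy pointing back to an (assumed) Meta-Wiki user page.
print(local_copy_head_tags("https://meta.wikimedia.org/wiki/User:Krinkle"))
```

Search engines treat `rel="canonical"` as a hint rather than a directive, so `noindex` is the more direct way to keep the copies out of results. The two tags complement each other rather than being interchangeable.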

Event Timeline

Krinkle renamed this task from Avoid indexing local "copies" of the central user page to Avoid indexing of local "copies" of the central user page. (Oct 1 2017, 2:24 AM)

There is a similar task, T90475, which proposes that __NOINDEX__ on the global user page should also apply to local copies.

Setting the canonical link should be easy enough. Do we also need to mark these as noindex in addition to that?

Change 381882 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/extensions/GlobalUserPage@master] Set canonical URL of remote pages to central page

https://gerrit.wikimedia.org/r/381882

Krinkle triaged this task as Medium priority. (Oct 3 2017, 6:10 PM)

Change 381882 merged by jenkins-bot:
[mediawiki/extensions/GlobalUserPage@master] Set canonical URL of remote pages to central page

https://gerrit.wikimedia.org/r/381882

@Krinkle: My global userpage on Wikimedia Commons still appears in Google results, even with the text "… you see on this page was copied from". Is this an intentional exception by the Commons community?

Hmmm....

[Screenshot, 2019-07-09: Google Search results for "you see on this page was copied from"]

> Is this an intentional exception by the Commons community?

Not that I'm aware of...

Same for the dewiki page; the search terms "Benutzer ToBeFree de wikipedia" and "User ToBeFree Wikimedia Commons" are very specific, but to my knowledge, a properly working "NOINDEX" should prevent, or even remove, these results.

Reopening, though perhaps this is the wrong Phabricator task; T90475 seems relevant too. Judging by the title of this task, however, the issue isn't fixed yet.

Indeed. The hundreds of local duplicates like https://en.wikipedia.beta.wmflabs.org/wiki/User:Krinkle and https://en.wikipedia.org/wiki/User:Timo_Tijhof_(WMF) still lack noindex and still appear in Google search results alongside their originals on Meta-Wiki.

Change 597906 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/extensions/GlobalUserPage@master] Set 'noindex' for global user pages to avoid indexing by search engines

https://gerrit.wikimedia.org/r/597906

Change 597906 merged by jenkins-bot:
[mediawiki/extensions/GlobalUserPage@master] Set 'noindex' for global user pages to avoid indexing by search engines

https://gerrit.wikimedia.org/r/597906

https://www.google.com/search?q=%22you+see+on+this+page+was+copied+from%22&hl=en&filter=0&biw=2560&bih=1311 still has some results, but the few I checked out all have <meta name="robots" content="noindex,nofollow"/> set.
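Spot-checking result pages like this can be scripted. This is a minimal Python sketch that uses only the standard library's `html.parser`. The class and function names are hypothetical. Fetching the page (for example with `urllib.request`) is left out so that the example runs offline on an illustrative HTML snippet rather than on a real wiki response.

```python
from html.parser import HTMLParser
from typing import Optional, Set, Tuple


class RobotsCanonicalParser(HTMLParser):
    """Collects robots meta directives and the rel=canonical URL from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.robots: Set[str] = set()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased, values may be None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex,nofollow".
            for directive in a.get("content", "").split(","):
                if directive.strip():
                    self.robots.add(directive.strip().lower())
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None
    # Self-closing tags such as <meta .../> reach handle_starttag through the
    # default handle_startendtag implementation.


def check_page(html: str) -> Tuple[bool, Optional[str]]:
    """Return (has_noindex, canonical_url) for an HTML document."""
    parser = RobotsCanonicalParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.robots, parser.canonical


sample = (
    '<html><head><meta name="robots" content="noindex,nofollow"/>'
    '<link rel="canonical" href="https://meta.wikimedia.org/wiki/User:Krinkle"/>'
    "</head><body></body></html>"
)
print(check_page(sample))  # (True, 'https://meta.wikimedia.org/wiki/User:Krinkle')
```

A `True` result only means the directive is present in the page. URLs that are already indexed drop out of search results only after the crawler revisits them.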

Good enough for me. The matches for my username have also dropped from hundreds to only four, and those are for non-canonical URLs (m-dot with an odd query parameter), which will presumably drop out once they are recrawled.