
Canonical URL should include language variant
Open, High, Public

Description

Language variants currently point to the same canonical URL. For example, on this page:

http://zh.wikipedia.org/zh-tw/%E6%B1%89%E8%AF%AD

...there is a rel=”canonical” pointing to
http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD

This rel=”canonical” link asks search engines to index the Simplified Chinese page to represent the content on both pages, instead of separately indexing the Simplified Chinese and Traditional Chinese pages. Similar rel=”canonical” links are found on all zh-TW pages. Google is reporting that we see a similar problem on other Chinese (e.g. zh-SG) and Serbian content pages.
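To check what a given variant URL actually declares, the canonical link can be pulled out of the page head. The following is only a minimal sketch using Python's standard `html.parser`; the `CanonicalFinder` class, the `find_canonical` helper, and the `sample` markup are hypothetical names made up for illustration, and the markup simply mirrors the /zh-tw/ case described above.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated, case-insensitive list of tokens.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonical(html):
    """Return all canonical hrefs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical head fragment, as reported for the /zh-tw/ page.
sample = (
    "<head>"
    '<link rel="canonical" '
    'href="http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD" />'
    "</head>"
)
print(find_canonical(sample))
```

Running this against the HTML of each variant URL would show whether they all point at the same /wiki/ address.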

(this may be caused by the fix to bug 48402)


Version: 1.23.0
Severity: normal
See Also:
T53753: Links with language variants can't automatically jump to mobile sites
T71026: Canonical links may be in fallback encoding
T108443: Google doesn't honor canonical URLs of zh.wiki

Details

Reference
bz52429

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 1:45 AM
bzimport set Reference to bz52429.

If I understand the semantic meaning of rel="canonical" correctly, what it does now is the expected behavior.

http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD is not "the Simplified Chinese page", but a page that is converted automatically for each request (based on preferences for logged-in users and on Accept-Language for anons). We want all these links to show up in Google search results instead of links specifying a particular variant.
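As a rough illustration of the Accept-Language part for anonymous requests, here is a sketch of picking a variant from that header. It is not MediaWiki's actual conversion logic: the `pick_variant` function, the list of supported variants (limited to the ones mentioned in this report), and the exact-match rule are all assumptions.

```python
def pick_variant(accept_language,
                 supported=("zh-cn", "zh-tw", "zh-sg"),
                 default="zh"):
    """Return the supported variant the client prefers most, else default.

    Assumption: only exact, case-insensitive tag matches count, so a tag
    such as "zh-Hant-TW" would not match "zh-tw" in this sketch.
    """
    choices = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # HTTP default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            # Sort by descending weight, then by order in the header.
            choices.append((-q, position, tag))
    for _, _, tag in sorted(choices):
        if tag in supported:
            return tag
    return default


print(pick_variant("zh-TW,zh;q=0.9,en;q=0.8"))
print(pick_variant("en-US,en;q=0.5"))
```

With a header that names no supported variant the sketch falls back to the unconverted `zh` default, which is the case a variant-neutral /wiki/ link is meant to cover.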

However, Google does not seem to respect it and indexes links to pages in every variant, so we have to work around it: https://zh.wikipedia.org/w/index.php?title=MediaWiki:Gadget-variant-link-fix.js

Currently Google seems to mainly index /zh/ links instead of /wiki/'s (which is unexpected). /zh-tw/ or /zh-cn/'s are not indexed as expected though.

RobLa: So should this still be high priority wrt Liangent's comment 1 here?

If still high priority:
Tim: Do you plan to work on this at some point?

(In reply to Andre Klapper from comment #3)

RobLa: So should this still be high priority wrt Liangent's comment 1 here?
If still high priority:
Tim: Do you plan to work on this at some point?

I guess Tim is just the default CC, but actually this issue does not seem to be Wikimedia-specific.

Change 154240 had a related patch set uploaded by Tim Starling:
Don't send rel=canonical to variant-neutral page

https://gerrit.wikimedia.org/r/154240

Change 154240 merged by jenkins-bot:
Don't send rel=canonical to variant-neutral page

https://gerrit.wikimedia.org/r/154240

All patches mentioned in this report were merged or abandoned - is there more work left to do here (if yes: please reset the bug report status to NEW or ASSIGNED), or can you close this ticket as RESOLVED FIXED?

(In reply to Rob Lanphier from comment #0)

Language variants currently point to the same canonical URL. For example, on
this page:
http://zh.wikipedia.org/zh-tw/%E6%B1%89%E8%AF%AD
...there is a rel=”canonical” pointing to
http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD

Now has:

<link rel="alternate" hreflang="zh" href="/zh/%E6%B1%89%E8%AF%AD" />
[...]
<link rel="alternate" hreflang="zh-TW" href="/zh-tw/%E6%B1%89%E8%AF%AD" />
<link rel="alternate" hreflang="x-default" href="/wiki/%E6%B1%89%E8%AF%AD" />
[...]
<link rel="canonical" href="http://zh.wikipedia.org/zh-tw/%E6%B1%89%E8%AF%AD" />

But I'm not sure this is properly fixed in general, because this is still an issue:

(In reply to fireattack from comment #2)

Currently Google seems to mainly index /zh/ links instead of /wiki/'s (which
is unexpected). /zh-tw/ or /zh-cn/'s are not indexed as expected though.

The two URLs for "zh" version don't agree on which is canonical:

/zh/ says

<link rel="alternate" hreflang="zh" href="/zh/%E6%B1%89%E8%AF%AD" />
[...]
<link rel="canonical" href="http://zh.wikipedia.org/zh/%E6%B1%89%E8%AF%AD" />

/wiki/ says

<link rel="alternate" hreflang="zh" href="/zh/%E6%B1%89%E8%AF%AD" />
[...]
<link rel="canonical" href="http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD" />
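The disagreement between the two "zh" URLs can be checked mechanically. The sketch below is only an illustration: `LinkCollector`, `declared_canonicals`, and `canonicals_agree` are hypothetical helpers, and the `pages` dict hard-codes the head fragments quoted above rather than fetching anything.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Record the canonical href and the hreflang alternates of a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if not href:
            return
        if "canonical" in rels and self.canonical is None:
            self.canonical = href
        if "alternate" in rels and attr.get("hreflang"):
            self.alternates[attr["hreflang"]] = href


def declared_canonicals(pages):
    """Map each page label to the canonical URL its HTML declares."""
    result = {}
    for label, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        result[label] = collector.canonical
    return result


def canonicals_agree(pages):
    """True if every page declares the same canonical URL."""
    return len(set(declared_canonicals(pages).values())) == 1


# Head fragments as quoted in this comment.
ALT = '<link rel="alternate" hreflang="zh" href="/zh/%E6%B1%89%E8%AF%AD" />'
pages = {
    "/zh/": ALT + '<link rel="canonical" '
                  'href="http://zh.wikipedia.org/zh/%E6%B1%89%E8%AF%AD" />',
    "/wiki/": ALT + '<link rel="canonical" '
                    'href="http://zh.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD" />',
}
print(declared_canonicals(pages))
print(canonicals_agree(pages))  # False: each URL names itself as canonical
```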

Created attachment 16795
Google search in Italian for [[zh:汉语]]

If I search for a Latin alphabet string from that article, I manage to get 4 variants from Google after asking it to show me duplicate pages as well. None of them is /wiki/.

Searching '"漢語,又称中文、华语" site:wikipedia.org' yielded two results including zh.wap.wikipedia.org/zh-tw/汉语 but that's another bug.


Nemo_bis reopened this task as Open. Jun 14 2015, 12:22 AM

Not a duplicate.

Krinkle renamed this task from "Language variants currently point to the same canonical URL" to "Canonical URL should include language variant". Jul 7 2015, 10:34 AM
Krinkle set Security to None.

I don't know why the title of this page reads "Canonical URL should include language variant" (because it shouldn't). But anyway, I'm here to report the exact weird behavior mentioned above:

Created attachment 16795
Google search in Italian for [[zh:汉语]]
If I search a Latin alphabet string of that article I manage to get 4 variants from Google after asking to show me duplicate pages as well. None of them is /wiki/
Searching '"漢語,又称中文、华语" site:wikipedia.org' yielded two results including zh.wap.wikipedia.org/zh-tw/汉语 but that's another bug.

If you search Google for "汉语 维基百科", the first result is https://zh.wikipedia.org/zh/汉语, and all the other results below it use /zh/ as well.

This is NOT optimal, because a /zh/ link shows the page in its original variant instead of the user's preferred one (in my case, zh-cn).

However, if you search for "汉语 维基百科 site:wikipedia.org", the first result becomes https://zh.wikipedia.org/wiki/汉语, and the other results use /wiki/ too. This is optimal, because /wiki/ links automatically jump to the language variant that the user wants.

I have no idea what causes this strange situation (it may even be Google's fault), but it needs to be fixed. It's quite annoying that users have to change the language variant manually after arriving from a Google result.

So I think this bug should be titled "Canonical URL should be /wiki/ links, but somehow Google doesn't honor it".

Restricted Application added a subscriber: Cosine02. Sep 28 2016, 3:33 AM
Amire80 moved this task from Untriaged to Script conversion on the I18n board. Feb 4 2018, 10:47 AM
Nemo_bis updated the task description. Sep 11 2018, 10:22 PM
Restricted Application added a subscriber: Petar.petkovic. Sep 11 2018, 10:22 PM