
Google doesn't honor canonical URLs of
Closed, ResolvedPublic


Currently, Google is unexpectedly indexing /zh/ language variant URLs instead of /wiki/ links for Chinese Wikipedia.
A quick example is:汉语+wikipedia
As you can see, the first link's URL is汉语 and there are lots of other links with /zh/ URLs.
If you open it and check its source, it says

<link rel="canonical" href="" />

Since the "canonical" version is the /wiki/ link, Google should follow it and index that instead. But at the moment it doesn't, for some reason.
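For anyone who wants to repeat the check on a saved page source, here is a minimal sketch using only the Python standard library. The sample markup and the example.org host are placeholders, not the real page.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<link ... />) are also routed here by default.
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonical(html):
    """Return a list of canonical URLs declared in the given HTML text."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page source; example.org stands in for the real host.
sample = (
    '<html><head>'
    '<link rel="canonical" href="https://example.org/wiki/Foo" />'
    '</head><body></body></html>'
)
print(find_canonical(sample))
```

If the list comes back empty for a variant page, that page declares no canonical URL at all, which would be a different problem from the one described here.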

The weirdest part is, if you search with "":汉语
Then suddenly almost all the links on the first page become /wiki/ URLs (correct behavior).

The problem with /zh/ links is that they ignore the user's language variant settings. I need to manually change to the /zh-cn/ or /zh-tw/ variant after clicking a link from Google (which is a very common scenario). /wiki/ links, by contrast, automatically jump to the variant matching the user's preference.

It has been like this for months if not years. I have no idea whether the problem is on Google's side or Wikipedia's. I asked several times on but no one takes responsibility or is able to fix it.

Event Timeline

fireattack raised the priority of this task from to Needs Triage.
fireattack updated the task description.
fireattack added projects: I18n, SEO.
fireattack added a subscriber: fireattack.
fireattack renamed this task from Google doesn't honer canonical URLs on to Google doesn't honor canonical URLs of 8 2015, 6:35 AM
fireattack updated the task description.
fireattack set Security to None.

@Aklapper: Although that ticket is closely related to this one (I actually mentioned this bug there years ago), I don't think they're the same problem at all. That's why I deliberately opened a brand new ticket to bring it to the developers' attention.

To me, that ticket makes NO SENSE; it should be closed and this one kept instead. Here is why:

  1. In that ticket, the author mentioned that every language variant page (the /zh-tw/ ones, for example) includes a canonical rel pointing to the /wiki/ link. That's true, but it is actually the expected behavior.
  2. The author said "This rel=”canonical” link asks search engines to index the Simplified Chinese page", which is completely wrong, at least today (I don't know whether it was different back then). As I mentioned in this ticket, /wiki/ links are language neutral, not the Simplified Chinese variant. They automatically jump to the user's preferred variant according to Wikipedia settings, or browser settings for guests.
  3. The author there argued that we should let Google index both the Simplified Chinese version and the Traditional Chinese version. I totally disagree. IMO, we should only let Google index the language-neutral version, just like we're doing now.
  4. Luckily, it seems no one agreed with the author, so there has been no progress on that ticket.

In short, that ticket made a suggestion about our current "canonical" link behavior which, in my opinion, rests on a false premise (see #2) and should not be followed.

But this ticket is about a bug, in either Wikipedia or Google: the intent of our canonical links is somehow not being honored by Google.

About T33838, I believe either the user set something wrong, or it was a different bug that existed at that time and got fixed later, because now if you visit a /zh-tw/ link it definitely shows Traditional Chinese, not Simplified Chinese.

Also I want to bring @liangent to this discussion :)

This problem is not solved in MW 1.28.0. As far as I know, this is a bug related to the sitemap, which has not been fixed for at least three years. The sitemap lists all language variants with the same priority, which causes Google to index them randomly.


Hacking MediaWiki's core code to remove all Chinese variants, and providing a sitemap with only the canonical links, significantly improved the rate of correct results for Chinese Moegirlpedia.

I reported this bug very early on, but the sitemap bug has not been fixed to this day.

According to Google Support, you must use "alternate" XHTML tags for language variants and sitemap.

Multi Language related sitemap rules:

Sitemap common rules:
"Don't include non-canonical pages in a sitemap. If using a sitemap, specify only canonical URLs in the sitemap."

My test results are:

  1. If you include all language variant URLs in the sitemap, they will be considered different pages and all of them appear in the search results.
  2. If you only include the canonical URL in the sitemap, Google will mix the results (some language variants get excluded, some get indexed).
  3. If you use "<xhtml:link rel="alternate" hreflang="">" to tag all variants in the sitemap, Google will choose the correct URL based on the user's browser language and IP.
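To illustrate the third case, here is a rough sketch of a sitemap generator that writes each variant as an xhtml:link child of the canonical entry rather than as its own entry. The `build_sitemap` helper, the example.org URLs, and the variant list are all hypothetical; this is not MediaWiki's sitemap code, and Google's documentation has further requirements that the sketch does not try to cover.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_sitemap(pages):
    """Build sitemap XML text.

    pages: list of (canonical_url, {hreflang: variant_url}) tuples.
    Each variant becomes an <xhtml:link rel="alternate"> child of the
    canonical <url> entry instead of a separate <url> entry.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for canonical, variants in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = canonical
        for hreflang, href in variants.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": hreflang, "href": href},
            )
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page with two variants; example.org is a placeholder host.
xml_text = build_sitemap([
    (
        "https://example.org/wiki/Foo",
        {
            "zh-cn": "https://example.org/zh-cn/Foo",
            "zh-tw": "https://example.org/zh-tw/Foo",
        },
    ),
])
print(xml_text)
```

Note that every href here is written out in full, including the scheme.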

Change 609513 had a related patch set uploaded (by VulpesVulpes825; owner: VulpesVulpes825):
[mediawiki/core@master] Write language varaint link as child element rather than individual entry in sitemap

VulpesVulpes825 added a subscriber: VulpesVulpes825.

As T198965#4438038 suggests, fixing Sitemap will not solve this issue unless T87140 gets implemented. Hence removing myself as the assignee of this task.

Which comment there is #4438037? The anchor doesn't work.

@fireattack: Sorry, it should be T198965#4438038. I think the main reason Google doesn't honor canonical URLs of projects that use the language converter is that the alternate links in the page source do not follow the standard.

<link rel="alternate" hreflang="lang_code" href="url_of_page" />

Alternate URLs must be fully-qualified, including the transport method (http/https), so:, not // or /foo. MediaWiki does not produce fully-qualified ones.
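A quick way to test whether a given href meets that rule, sketched with the Python standard library. The `is_fully_qualified` name and the example.org URLs are mine, for illustration only.

```python
from urllib.parse import urlsplit


def is_fully_qualified(href):
    """True only if href has an explicit http/https scheme and a host."""
    parts = urlsplit(href)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Placeholder URLs: absolute, protocol-relative, and path-only forms.
for href in (
    "https://example.org/zh-cn/Foo",
    "//example.org/zh-cn/Foo",
    "/zh-cn/Foo",
):
    print(href, is_fully_qualified(href))
```

Only the first form passes; the protocol-relative and path-only forms are the ones the rule above rejects.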

In theory, Google will display the language variant result based on your language setting. E.g. it will provide the zh-cn link if you are searching in zh-Hans-CN, and will only give the canonical link if it cannot determine what language you are using.

Even if Google honors the canonical urls, I don't think the search results will show what we want.

We shouldn't be setting the canonical URL to /wiki on Chinese variant pages. The localized pages have their own titles. Setting the canonical URL to /wiki makes them all show the same title in Google search results.

I would suggest we remove the canonical URL.

I'm happy to help make the code changes if we agree on doing this.

Cross-posted in

I don't think the search results will show what we want.

Why wouldn't it? I can only speak for myself, obviously, but I just want Google to link me to the /wiki/ page, like on any other Wikipedia.

With a /wiki/ link, my Wikipedia account preference (or browser setting, if not logged in) would take care of choosing zh-cn, zh-tw, etc. for me.

Currently (well, at least at the time I made the ticket), it would link me to zh, zh-hk or zh-tw variants all the time, and I have to manually switch to the variant I want (zh-cn).

If we set the canonical URL to /wiki, Google search will think all Chinese variant pages are the same, and there will always be only one search result. zh, zh-hk, zh-tw, and zh-cn users will all see the same result title and description.

If we remove the canonical URL and set the hreflang correctly, then based on your Google search location (and preference, browser setting, IP location, etc.) Google will give you the variant result you want. You will see the zh-cn result and a link to the zh-cn page.

Currently (well, at least at the time I made the ticket), it would link me to zh, zh-hk or zh-tw variants all the time, and I have to manually switch to the variant I want (zh-cn).

I believe this is due to some issue we have with the hreflang settings.

Ah that sounds good. Thanks.

Unfortunately I use Google in English (instead of zh-cn or zh), but that's my own problem.

Change 879579 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[mediawiki/core@master] Make meta canonical URL variant-language-aware

Change 879579 merged by jenkins-bot:

[mediawiki/core@master] OutputPage: Fix the behavior for canonical URL and alternate URLs

Jdlrobson claimed this task.