Aug 4 2015
We should either have Google staff in the loop
Jul 20 2015
I see x-default header on https://zh.wikipedia.org/wiki/%E7%A5%9E%E5%A5%87%E8%83%B8%E7%BD%A9
@Smalyshev, that's interesting. Yep, it looks like zh and sr generate hreflang alternates for different scripts, I think with this code. AFAIK that implementation is pretty good, and if they do some auto-redirecting then x-default could belong. See https://phabricator.wikimedia.org/T54429.
@Smalyshev, from a few spot checks it looks like the bad x-default line isn't being generated on article pages anymore but is still showing up in other namespaces e.g. on https://en.wikipedia.org/wiki/Wikipedia:Contact_us or https://en.wikipedia.org/wiki/Portal:Contents or https://en.wikipedia.org/wiki/Help:Contents.
@dr0ptp4kt, definitely go ahead and reach out to Google. Side note: best to avoid delaying anything we think is the right answer until we hear back from Google. They can be... slow at responding. :)
If we go with @Smalyshev's extension approach, let's be sure to also remove the code that generates the incorrect x-default tag on pages. IIRC Google just ignores hreflang tags if it gets conflicting or incorrect ones on a page, so if we leave the wrong one in it would nullify our fix.
I looked at using sitemaps. It won't be easy. The way Google sets up sitemaps for hreflang requires a complete redundant list of alternates for each page. This leads to large sitemaps. A first pass of a sitemap for a single topic (Pluto) covers something like 167 different alternates on different language Wikipedias. The sitemap has to list all 167 alternates for each one of those 167 alternates, which leads to a sitemap with 167*167 or 27,889 links. Since Google limits sitemap files to 50,000 links each, our millions of article topics would likely require hundreds of thousands of unique, regularly updated sitemap files.
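The arithmetic above can be checked with a rough sketch. The function names are made up for illustration, and the 50,000 figure is simply the per-file limit quoted in the comment, applied to links as the comment does:

```python
import math

# Per-file limit as quoted in the comment above (treated here as a link count).
SITEMAP_LINK_LIMIT = 50_000


def links_for_topic(alternates: int) -> int:
    """Each alternate page must list every alternate, itself included."""
    return alternates * alternates


def files_needed(total_links: int, limit: int = SITEMAP_LINK_LIMIT) -> int:
    """Minimum number of sitemap files needed to hold total_links."""
    return math.ceil(total_links / limit)


pluto_links = links_for_topic(167)
print(pluto_links)                    # 27889
print(files_needed(2 * pluto_links))  # 2
```

So one Pluto-sized topic fits in a single file, but two of them already do not.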
Jul 15 2015
Heard some feedback from the team at Google:
Jul 14 2015
@dr0ptp4kt I still don't have access AFAIK to any of the https site variants. haven't been able to do any analysis on a bunch of the bug fixes we've deployed and are working on. would appreciate it ASAP. cc @Wwes
Jun 28 2015
Thanks, @Wwes. I don't currently have access to any of the https variants of wikipedia.org. Since we've now switched over to https in our canonical URLs, that's where Google is sending all the traffic now so the http variants should be mostly free of data.
@Stu, could you possibly ping John Mueller to ask if targeting Googlebot in this fashion (i.e., by sending it Link: headers that we don't send to other clients) is fair game?
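For reference, a hreflang alternate sent as an HTTP header takes the form `Link: <url>; rel="alternate"; hreflang="xx"`. A minimal sketch of building such a header follows; the `link_header` helper and the example URLs are illustrative assumptions, not what the extension actually does:

```python
def link_header(alternates: dict) -> str:
    """Build one Link header listing every language alternate.

    `alternates` maps a language code to the URL of that variant.
    Entries are sorted so the header is the same for every variant.
    """
    parts = [
        f'<{url}>; rel="alternate"; hreflang="{lang}"'
        for lang, url in sorted(alternates.items())
    ]
    return "Link: " + ", ".join(parts)


# Illustrative input only.
print(link_header({
    "en": "https://en.wikipedia.org/wiki/Pluto",
    "de": "https://de.wikipedia.org/wiki/Pluto",
}))
```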
Just checked my various examples and it's looking great. Nice work!
Jun 23 2015
@Gryllida -- Because crawlers crawl. Old versions are linked to from the "View history" tab so Google and others will find them there and then go crawl them. Did I understand your question?
Jun 22 2015
See comments upthread and in initial discussion that this whole ticket began with a comment from John Mueller at Google's Webmaster Tools team about the addition of <hreflang> block to link different language articles on a topic: "That sounds like a great use of hreflang!" So we can be pretty confident this is consistent with best practice.
May 12 2015
@damons can you help unstick this blocker for Wes and me?
Bump on this. I'm on a hangout with @Wwes right now trying to show him some stuff but don't have access. :-(
May 6 2015
Apr 4 2015
Fair point. <hreflang> tags for the Barack Obama page with its 200+ language alternates work out to a few KB once gzipped. The page does have 1+ MB of HTML though, so not sure how meaningful it is.
Apr 3 2015
I was watching a television show here in the U.S. last night called CSI:Cyber. It's not a great show but they did include this screen shot in one of their scenes:
Mar 24 2015
One more question. In production, are the links in getLanguageLinks() already loaded in memory by this time, say because they're needed for the interlanguage links in the sidebar? Just want to be sure we're not doubling the load on the wikidata API (which I think is the source of these). :)
OK, this looks good. A couple more things from my comment at https://phabricator.wikimedia.org/T93213#1139490:
- we need to remove the current incorrect use of an x-default hreflang tag. The end goal is that the exact same hreflang tags are atop every language variant of an article, so there can't be a sense of "default"; we just need to list all the language variants.
- we've got duplicate code now because there was some code a little further up in includes/OutputPage.php that appeared to be an unsuccessful attempt to generate hreflang tags.
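To illustrate the end goal in the first bullet, here is a minimal sketch, not the actual OutputPage.php code. The `hreflang_tags` helper and the example URLs are assumptions for illustration. It emits the same sorted set of tags whichever variant is being rendered, and no x-default:

```python
from html import escape


def hreflang_tags(alternates: dict) -> list:
    """Return one <link> tag per language variant, sorted by language code.

    The same list goes on every variant of the article, so there is
    no "default" entry.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in sorted(alternates.items())
    ]


# Illustrative input only.
for tag in hreflang_tags({
    "en": "https://en.wikipedia.org/wiki/Pluto",
    "de": "https://de.wikipedia.org/wiki/Pluto",
}):
    print(tag)
```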
Mar 23 2015
To give a sense of scale I checked the Google Webmaster Tools reports for two different local language Wikipedias which show this count of hreflang errors (links go to WMT reports):
Mar 22 2015
Taking another look at /includes/OutputPage.php, it looks like building of variants is already there around line 3415:
Mar 19 2015
Here's the diff of my local proof of concept: