
Avoid use of the mixed ("main language code") variant, improve selector visibility
Open, Needs Triage, Public

Description

According to a report from @dchan at the 2016 All Hands, zh.wikipedia.org readers in Hong Kong commonly see articles in the main language code (zh) rather than in a specific variant. As a result, a page can be a mix of Simplified and Traditional scripts, sometimes switching from one to the other in the course of a single article. This makes the content very hard to read.

In practice, most readers did not discover the variant drop-down, and instead carried on reading, with difficulty.

This is presumably because they use a browser that does not send any valid variant in the Accept-Language header. LanguageConverter::getPreferredVariant() then falls back to $wgDefaultLanguageVariant, which is false on all WMF wikis, and then to the main language code.
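
A minimal Python sketch of that fallback chain, to show why an "en"-only browser ends up on the mixed "zh" code. This is not MediaWiki's code. The real method is PHP and also consults other sources, such as a logged-in user's preference, which are omitted here. ZH_VARIANTS, header_variant() and preferred_variant() are hypothetical names. The sketch also assumes that a bare "zh" tag is not counted as a specific variant.

```python
from typing import Optional

# Assumed set of specific zh variants (illustrative; the authoritative list
# comes from MediaWiki's converter). Bare "zh" is deliberately excluded.
ZH_VARIANTS = {"zh-hans", "zh-hant", "zh-cn", "zh-hk", "zh-mo",
               "zh-my", "zh-sg", "zh-tw"}
# Stand-in for $wgDefaultLanguageVariant, which is false on WMF wikis.
DEFAULT_LANGUAGE_VARIANT: Optional[str] = None
MAIN_CODE = "zh"


def header_variant(accept_language: str) -> Optional[str]:
    """Return the highest-weighted Accept-Language tag that is a variant."""
    best: Optional[str] = None
    best_q = 0.0
    for item in accept_language.split(","):
        tag, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Strict ">" keeps the earlier tag on ties and skips q=0 tags.
        if tag.lower() in ZH_VARIANTS and q > best_q:
            best, best_q = tag.lower(), q
    return best


def preferred_variant(accept_language: str) -> str:
    """Header variant, else the site default, else the (mixed) main code."""
    return (header_variant(accept_language)
            or DEFAULT_LANGUAGE_VARIANT
            or MAIN_CODE)
```

For example, `preferred_variant("en-US,en;q=0.9")` yields `"zh"`, which is the mixed-script case described above. `preferred_variant("zh-TW,en;q=0.8")` yields `"zh-tw"`.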

I suggest adding geolocation as a further fallback, and then finally picking a winner arbitrarily (say, Simplified). Content should not be served from /wiki/; requests there should always be redirected to a specific variant URL.
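
The proposal can be sketched as a chain that never returns the mixed code, plus a /wiki/ redirect. This is a hypothetical illustration only. The country-to-variant table, the "zh-hans" fallback, the /<variant>/Title URL layout, and the names choose_variant() and redirect_target() are all assumptions. The sketch takes the variant already parsed from Accept-Language (or None) and a country code from some geolocation lookup as inputs.

```python
from typing import Optional

# Hypothetical country -> variant table for the geolocation step.
COUNTRY_VARIANTS = {
    "CN": "zh-cn", "HK": "zh-hk", "MO": "zh-mo",
    "MY": "zh-my", "SG": "zh-sg", "TW": "zh-tw",
}
# Arbitrary final winner (Simplified), as suggested in the description.
FALLBACK_VARIANT = "zh-hans"


def choose_variant(header_variant: Optional[str],
                   country: Optional[str]) -> str:
    """Accept-Language variant, then geolocation, then a fixed fallback.

    Never returns the mixed main code "zh".
    """
    if header_variant and header_variant != "zh":
        return header_variant
    if country and country.upper() in COUNTRY_VARIANTS:
        return COUNTRY_VARIANTS[country.upper()]
    return FALLBACK_VARIANT


def redirect_target(path: str, variant: str) -> Optional[str]:
    """Map /wiki/Title to /<variant>/Title; None if no redirect is needed."""
    prefix = "/wiki/"
    if not path.startswith(prefix):
        return None
    return f"/{variant}/{path[len(prefix):]}"
```

For example, `redirect_target("/wiki/Hong_Kong", choose_variant(None, "HK"))` gives `"/zh-hk/Hong_Kong"`. Because the target depends on the individual client, a temporary (302) redirect seems the more natural choice than a permanent one.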

Also, we should improve the visibility of the variant selector, say with a GuidedTour-style callout on first view.

Event Timeline

tstarling set the priority of this task to Needs Triage.
tstarling updated the task description.
tstarling added subscribers: tstarling, liangent, dchan.
Restricted Application added subscribers: StudiesWorld, Aklapper. Jan 18 2016, 4:40 AM
liangent added a comment (edited). Jan 18 2016, 11:10 PM

If it's only Hong Kong people who report this, I suspect they're actually using English operating systems or browsers, since English is also an official language there (is there any Cantonese operating system?). Obviously they will see mixed content if they don't configure their browsers, which send Accept-Language: en by default.

Geolocation should be used after Accept-Language, because users can configure Accept-Language "easily" but not their IP addresses. Plus, many people in mainland China use proxies, which break geolocation.

I remember content served from /wiki/ can be language-converted; was it changed (again) recently? This would also benefit link sharing, so that recipients see pages in their own preferred variants instead of the sender's.

+1 to increasing the selector visibility
+1 to improving conversion intelligence for content served from /wiki/, to support link sharing

Trusting Accept-Language has certain problems for zh-HK (and presumably also zh-SG, zh-MY and zh-MO):

  • Many users/devices have only en/zh-TW set.
  • Some platforms expose no way to set zh-HK (e.g. Chrome desktop and some versions of Android)
  • Other platforms expose no way to set zh-TW as a fallback locale (e.g. Chrome on Android)
  • Some (non-Wikimedia) websites/apps incorrectly serve Simplified Chinese or even English for zh-HK (but not zh-TW)

So it's not a straightforward matter to push users to prepend zh-HK to Accept-Language.

Theoretically we could prioritise Accept-Language if it specifies a different script to the geolocation (e.g. a Simplified Chinese request from Hong Kong), but otherwise prioritise the geolocation. In pseudo-code:

$accept_script = guess script from accept-language ('Hans' or 'Hant' or null)
$geo_script = guess script from geolocation ('Hans' or 'Hant' or null)
if $accept_script != null and $accept_script != $geo_script:
    guess variant from accept-language
else:
    guess variant from geolocation
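
Below is a runnable Python rendering of the pseudo-code above. It is a sketch, not a proposed implementation. The tag-to-script and country-to-variant tables are hypothetical and incomplete. The sketch also assumes the Accept-Language tags arrive already sorted by preference. A None result means neither source gave a clue, which is the case where the callout described below would be shown.

```python
from typing import List, Optional

# Hypothetical tables; real code would need fuller BCP 47 handling
# (e.g. "zh-Hant-HK") and an actual geolocation source.
TAG_SCRIPTS = {
    "zh-hans": "Hans", "zh-cn": "Hans", "zh-sg": "Hans", "zh-my": "Hans",
    "zh-hant": "Hant", "zh-tw": "Hant", "zh-hk": "Hant", "zh-mo": "Hant",
}
COUNTRY_VARIANTS = {
    "CN": "zh-cn", "SG": "zh-sg", "MY": "zh-my",
    "TW": "zh-tw", "HK": "zh-hk", "MO": "zh-mo",
}


def accept_variant(tags: List[str]) -> Optional[str]:
    """First tag (in preference order) whose script we can identify."""
    for tag in tags:
        if tag.lower() in TAG_SCRIPTS:
            return tag.lower()
    return None


def guess_variant(tags: List[str], country: Optional[str]) -> Optional[str]:
    accept = accept_variant(tags)
    accept_script = TAG_SCRIPTS[accept] if accept else None
    geo = COUNTRY_VARIANTS.get(country.upper()) if country else None
    geo_script = TAG_SCRIPTS[geo] if geo else None
    # Trust Accept-Language only when it names a different script.
    if accept_script is not None and accept_script != geo_script:
        return accept
    # Otherwise prefer geolocation; None here means "show the callout".
    return geo
```

This covers the cases raised above. An en/zh-TW browser in Hong Kong gets `"zh-hk"`, because both sources agree on Traditional script and geolocation is more specific. A zh-CN browser in Hong Kong, or behind a foreign proxy, keeps `"zh-cn"`.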

If neither Accept-Language nor the geolocation gives any clue as to the variant, a highly visible callout nudging the user to select a variant would seem sensible. (Anecdotally, I've seen people in Hong Kong struggling through the mixed text that can appear right now; this must happen to overseas Chinese readers too.)

tstarling updated the task description. Jan 27 2016, 9:33 PM
tstarling set Security to None.
Liuxinyu970226 added a subscriber: Cosine02.
