As a Chinese reader, I want better control of which character set I see zh wiki content in
Closed, Resolved · Public

Description

Proposed design solution

  • Allow all eight Chinese variants to be selected as primary language (via Settings) and to be selected on a per-article basis (from the in-article language picker)
  • Variants supported, in order of how they should appear in all lists:
    • 简体 Chinese, Simplified (zh-hans)
    • 繁體 Chinese, Traditional (zh-hant)
    • 大陆简体 Mainland Simplified (zh-cn)
    • 香港繁體 Hong Kong Traditional (zh-hk)
    • 澳門繁體 Macau Traditional (zh-mo)
    • 大马简体 Malaysia Simplified (zh-my)
    • 新加坡简体 Singapore Simplified (zh-sg)
    • 臺灣正體 Taiwanese Traditional (zh-tw)
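
As an illustration only, the ordering above could be kept in one ordered list that every picker reads from. This is a minimal Python sketch, not the app's code; the names `ZH_VARIANTS` and `sort_variants` are hypothetical.

```python
# Hypothetical constant: the eight variants, in the display order listed above.
ZH_VARIANTS = [
    ("zh-hans", "简体", "Chinese, Simplified"),
    ("zh-hant", "繁體", "Chinese, Traditional"),
    ("zh-cn", "大陆简体", "Mainland Simplified"),
    ("zh-hk", "香港繁體", "Hong Kong Traditional"),
    ("zh-mo", "澳門繁體", "Macau Traditional"),
    ("zh-my", "大马简体", "Malaysia Simplified"),
    ("zh-sg", "新加坡简体", "Singapore Simplified"),
    ("zh-tw", "臺灣正體", "Taiwanese Traditional"),
]


def sort_variants(codes):
    """Return the given variant codes in the canonical display order.

    Codes that are not one of the eight variants are dropped.
    """
    order = {code: index for index, (code, _, _) in enumerate(ZH_VARIANTS)}
    return sorted((c for c in codes if c in order), key=order.__getitem__)
```

For example, `sort_variants(["zh-tw", "zh-cn", "zh-hans"])` yields the codes in list order, with zh-hans first.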

Language selection

Background

Readers can have one Primary language and as many secondary languages as they desire. The 'chrome' (i.e. non-content) text will be shown in their iPhone system language, even if that language does not match their primary wiki-language. The primary wiki-language is the default language for search; secondary wiki-languages are shown as tabs on the search screen and are also floated to the top of the in-article language selection list. By default, content from all selected wiki-languages is shown in the Explore feed (i.e. cards from each selected language appear in the feed) unless explicitly turned off by the reader.

Readers can select languages in 4 places:

  • During onboarding
  • From the settings screen
  • In the article view
  • In the search screen

Language selection during onboarding

During onboarding we will detect the reader's system language (i.e. what their iPhone is set to) and suggest that language as their Primary language, and suggest any other languages they have on their iPhone as secondary languages. Readers may only have one Primary language, but they are welcome to add as many secondary languages as they desire. We will support a reader's choice to add as many Chinese variants as they would like (even all 8).

Apple currently supports three Chinese variants so we will need to map these variants to our supported variants. Below is a suggested mapping:

iPhone language                     Default Wiki-language
Chinese, Simplified                 Chinese, Simplified (zh-hans)
Chinese, Traditional                Chinese, Traditional (zh-hant)
Chinese, Traditional (Hong Kong)    Hong Kong Traditional (zh-hk)
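
The suggested mapping could be expressed as a simple lookup. This is a hedged sketch: the keys are the iPhone language names as written in this task, not real iOS locale identifiers, and `IPHONE_TO_WIKI_VARIANT` and `default_wiki_variant` are hypothetical names.

```python
# Suggested defaults from the mapping above. Keys are the display names used in
# this task (an assumption for illustration), not iOS locale identifiers.
IPHONE_TO_WIKI_VARIANT = {
    "Chinese, Simplified": "zh-hans",
    "Chinese, Traditional": "zh-hant",
    "Chinese, Traditional (Hong Kong)": "zh-hk",
}


def default_wiki_variant(iphone_language):
    """Return the suggested wiki variant code, or None if there is no mapping."""
    return IPHONE_TO_WIKI_VARIANT.get(iphone_language)
```

Returning `None` for unmapped names leaves the handling of non-Chinese system languages to the caller.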

Language selection from Settings

When a reader taps on the 'gear' icon in the top of their Explore feed they are brought to the Settings page. In the Settings page they can edit their selected languages by tapping on 'My languages'. Here we will give readers the same options that they had in onboarding (eg. to set a primary language and to add/remove secondary languages). Just like during onboarding they can pick as many or as few secondary languages as they desire. In Settings, readers can also turn languages on and off in the Explore feed, or turn specific Explore feed cards on and off.

In-article language selection behavior

In the article view, readers can tap on the language button in the toolbar to change the language of the article. This is not a global setting, but will persist as long as the reader is tapping on blue links from within an article in that language (eg. if my primary language is English but I change the language of an article to French, when I tap on a blue link to the article 'chat' from the current article I am brought to the fr-wiki 'chat' article and not the en-wiki 'cat' article.)

Details of in-article language selection

  • If a reader's primary language is a ZH variant then all articles should be shown in this variant unless they switch their language by using the article toolbar
  • Following blue links should persist the ZH variant of the parent article (eg. if the reader loads an article in zh-cn and then taps on a blue link in the article that article should be loaded in zh-cn even if their primary language is zh-hk)
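
The two rules above amount to a small fallback. The following sketch assumes the app tracks the language of the article a link was tapped from; `variant_for_link` and its parameters are hypothetical names.

```python
def variant_for_link(primary_language, parent_article_language=None):
    """Pick the language or variant used to load an article opened from a blue link.

    The parent article's language wins when there is one (including a variant
    the reader chose from the article toolbar); otherwise the reader's primary
    language is used.
    """
    return parent_article_language or primary_language
```

With a primary language of zh-hk, a link tapped inside a zh-cn article resolves to zh-cn, matching the example in the second bullet.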

ZH variant as primary language

[Diagram: Chinese variant set as default language]

Non-ZH variant as primary language

[Diagram: Non-Chinese variant as primary language]

Language selection from Search

On the search screen, all of the languages that a reader has selected (primary and secondary) are shown as tabs. Tapping on a tab enables the reader to search in the selected language. If a reader has more than 3 languages (1 primary and 2 secondary), they will need to tap on the 'more' button to view all of their languages and select their desired language. Tapping on the 'more' button also opens the 'Wikipedia languages' panel; from here they can add secondary languages or tap on edit to remove or reorder their languages. Changes made in the 'more' panel in search are global changes and are reflected in Settings and throughout the app.
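
A minimal sketch of the tab rule described above, assuming the primary language is always the first tab. `MAX_TABS` and `search_tabs` are hypothetical names, not taken from the app.

```python
MAX_TABS = 3  # 1 primary + 2 secondary, per the description above


def search_tabs(primary, secondaries):
    """Split the reader's languages into visible tabs and the overflow.

    The overflow is what the reader reaches through the 'more' button.
    """
    languages = [primary] + list(secondaries)
    return languages[:MAX_TABS], languages[MAX_TABS:]
```

A reader with zh-cn as primary and zh-hk, en, and fr as secondaries would see tabs for zh-cn, zh-hk, and en, with fr behind 'more'.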


Editing impact

No changes need to be made in the iOS app. ZH wiki utilizes Language converter, which enables contributors to write in their preferred variant, while updating the article across all variants.


Relevant documents

Wikipedia article about Chinese Wikipedia's use of automatic conversion
Wikipedias in multiple writing systems
Automatic conversion between simplified and traditional Chinese
Language converter
Language converter: syntax


Questions

  • If a reader has a ZH variant selected as their primary language (let's say zh-cn) AND a ZH variant selected as a secondary language (let's say zh-hk), should we show Explore feed cards in both variants by default (as readers can turn languages off in Explore feed settings)?
    • Answer: As each variant will be a first class language we will show all languages by default in the Explore feed and allow readers to turn cards off as they desire using Explore feed settings
  • If a reader has multiple variants selected as secondary languages, should we support showing all of their selected variants in search?
    • Answer: Same as above, as each variant will be a first class language we will show all selected variants in search.

Event Timeline

Restricted Application added a subscriber: Aklapper.
JMinor raised the priority of this task from Medium to Needs Triage. (Jun 18 2018, 6:46 PM)
Vvjjkkii renamed this task from "As a chinese reader, I want better control of which character set I see zh wiki content in" to "ajcaaaaaaa". (Jul 1 2018, 1:09 AM)
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from "ajcaaaaaaa" to "As a chinese reader, I want better control of which character set I see zh wiki content in". (Jul 1 2018, 7:16 AM)
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
CommunityTechBot raised the priority of this task from High to Needs Triage. (Jul 3 2018, 2:07 AM)
cmadeo updated the task description.

Question: What do we do if the automatic translation mark-up is not available on an article?

If I've understood the meaning of the diagram above correctly, I don't think the fallback case will ever happen — because LanguageConverter will always return output for whichever zh variant is specified.

If a reader has a ZH variant selected as their primary language (let's say zh-cn) AND a ZH variant selected as a secondary language (let's say zh-hk), should we show Explore feed cards in both variants by default (as readers can turn languages off in Explore feed settings)?
If a reader has multiple variants selected as secondary languages, should we support showing all of their selected variants in search?

For readers, the zh variants output by LanguageConverter are essentially just different views on identical underlying data. In almost all cases, they'll just want to see their single preferred variant and ignore the rest. (It's slightly different for editors, who in a few cases need to think about how their output will look in different variants).

@dchan thank you! I'll remove the fallback diagram and specifications. I wasn't sure if LanguageConverter was able to reliably convert all articles to each of the variants, great news that it is able to!

I reviewed the flows @cmadeo, the language selection behavior per your design makes sense to me.

If a reader has a ZH variant selected as their primary language (let's say zh-cn) AND a ZH variant selected as a secondary language (let's say zh-hk), should we show Explore feed cards in both variants by default (as readers can turn languages off in Explore feed settings)?

This is hard to anticipate, so I think this is a good solution since it leaves the choice to the user.

If a reader has multiple variants selected as secondary languages, should we support showing all of their selected variants in search?

Yes, I think this would be expected behavior. If you’d like inspiration around showing results in other languages → T260433. Not 100% sure if this is applicable here.


I know you were in touch with @cooltey about it before, but pinging him again to have a look at the task’s description and perform a “final” review.

Hi @cmadeo
The flows make sense to me too.

If a reader has a ZH variant selected as their primary language (let's say zh-cn) AND a ZH variant selected as a secondary language (let's say zh-hk), should we show Explore feed cards in both variants by default (as readers can turn languages off in Explore feed settings)?

In the current release of the Android Wikipedia app, we show the cards in both variants by default, and I believe it is better to let users decide whether to hide them. (This also relates to the SE feed cards.)

If a reader has multiple variants selected as secondary languages, should we support showing all of their selected variants in search?

Yes, it is expected, and I agree with @dchan.

@cooltey or @dchan Thank you both for your help! I was curious if the proposed mapping from iPhone system languages to Wiki-supported variants is correct, in particular should we map Apple's Chinese, Simplified to our zh-cn and Apple's Chinese, Traditional to our zh-tw?

[Screenshot: IMG_0719.PNG]

@cmadeo
https://www.ibabbleon.com/iOS-Language-Codes-ISO-639.html
From what I found online, it looks like Chinese, Traditional should be mapped to zh-hant, and Chinese, Simplified to zh-hans.

However, if the more generic language variants (zh-hant and zh-hans) are not an option in this ticket, I would say your solution should be okay for most users.

FYI, the Android app has included zh-hant, zh-hans and the other location-based zh variants.

cmadeo renamed this task from "As a chinese reader, I want better control of which character set I see zh wiki content in" to "As a Chinese reader, I want better control of which character set I see zh wiki content in". (Nov 19 2020, 10:53 PM)
cmadeo updated the task description.

Thanks, Cooltey! I didn't see zh-hant/zh-hans listed on ZH wiki (as translated by Google 😅) so I didn't realize they were options! Will update the ticket now! Thank you for your help!!!!

@JMinor @cmadeo We have a couple of questions about data migration.

  1. For an existing user that is launching the app for the first time after updating - if they previously chose Chinese as a primary or secondary language, we'll need to choose a variant for them moving forward. Which variant do we use? Do we base it on device language preferences? We also floated the idea of having a little feature announcement when they launch, informing them of the existence of this and allowing them to choose their preferred variant from an announcement modal.
  2. We do persist the variant for saved articles, so migration should be okay there, but there are likely other object types where we do not remember which variant was requested when we originally fetched the data from the server and persisted it (at the very least, all the Explore cards). Which variant do we choose for them in this case? That is, if there's Explore data from 3 weeks ago and they update the app, under which variant in the Explore feed should this data show up? Or should we clear out this data if we can't discern which variant it is using?
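
One way to picture the first question is the sketch below. It is only an illustration of the two ideas floated there (device preferences, then an announcement prompt), not a decided approach. It assumes device preferences are already available as wiki variant codes; `migrate_chinese_selection` and `ZH_VARIANT_CODES` are hypothetical names.

```python
ZH_VARIANT_CODES = {
    "zh-hans", "zh-hant", "zh-cn", "zh-hk", "zh-mo", "zh-my", "zh-sg", "zh-tw",
}


def migrate_chinese_selection(selected_languages, device_languages):
    """Replace a generic 'zh' selection with a variant from device preferences.

    Returns (migrated_languages, needs_prompt). needs_prompt is True when no
    Chinese variant was found on the device, so the app would have to ask the
    reader (for example in an announcement modal) instead of guessing.
    """
    device_variant = next(
        (tag.lower() for tag in device_languages if tag.lower() in ZH_VARIANT_CODES),
        None,
    )
    migrated, needs_prompt = [], False
    for language in selected_languages:
        if language != "zh":
            migrated.append(language)
        elif device_variant is not None:
            migrated.append(device_variant)
        else:
            migrated.append(language)  # left unresolved until the reader chooses
            needs_prompt = True
    return migrated, needs_prompt
```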

@Jgiannelos @Mholloway We are reworking the way our app handles wiki language variants and have a question about the backends. By default, we try to send all requests with an Accept-Language header, which takes the reader's iOS device language preferences and constructs a standard value in a format like en-us, zh-hans;q=0.67, zh-hant;q=0.33, depending on their settings. Moving forward, we will allow language variant preferences to be selected and sorted in the app settings as opposed to device settings.

We have some questions about the reasons for this format. What is the purpose of the exhaustive list of preferred languages in the Accept-Language header? Is it simply for HTTP standards, or are there other reasons the backend might need the extra fallback language preferences? For example, if they chose English and Chinese-Simplified in their app settings and we make a request to zh.wikipedia.org, we send en-us, zh-hans;q=0.5, and we send the same value for a request to en.wikipedia.org, which doesn't support variants at all and feels especially odd. If we were to send no Accept-Language header to en.wikipedia.org and only zh-hans to zh.wikipedia.org, would the response react the same?

These questions are mainly for background, as I assume the goal is to keep all requests in the same RFC style format. It is also probably easier for us to implement client-side if all of the Accept-Language headers remain the same, regardless of which wiki we're requesting. We're mainly just wondering if the backend does anything with it beyond serving up the first variant it finds.
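
For reference, the header format described above can be sketched as follows. The q-value rule (1 - i/n for the i-th of n languages, rounded to two decimals) is inferred only from the two example values quoted in this comment and may not match the app's implementation; `build_accept_language` is a hypothetical name.

```python
def build_accept_language(preferred):
    """Build an Accept-Language value from an ordered list of language tags.

    The first tag carries no q-value (implicitly 1); later tags get evenly
    decreasing weights. This weighting is an assumption inferred from the
    examples in this thread.
    """
    count = len(preferred)
    parts = []
    for index, tag in enumerate(preferred):
        tag = tag.lower()
        if index == 0:
            parts.append(tag)
        else:
            # Format to two decimals, then trim trailing zeros (0.50 -> 0.5).
            q = f"{1 - index / count:.2f}".rstrip("0").rstrip(".")
            parts.append(f"{tag};q={q}")
    return ", ".join(parts)
```

Under that assumption, three preferences reproduce `en-us, zh-hans;q=0.67, zh-hant;q=0.33`, and two reproduce `en-us, zh-hans;q=0.5`.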

If we were to send no Accept-Language header to en.wikipedia.org and only zh-hans to zh.wikipedia.org, would the response react the same?

Yes. I don't believe there's any need to send an Accept-Language header unless you want to specify a language variant, and the header need not be any more complicated than, e.g., Accept-Language: zh-tw.

OK, I can imagine wanting to send something like zh-tw, zh-hant;q=0.8, zh;q=0.7 for something like Wikidata Item descriptions, where different content is actually stored per language variant rather than being converted on the fly as on Wikipedia. (I'm not sure that the content negotiation on Wikidata is actually that sophisticated, just saying I can see that being something one would want to be able to do.) But in general, you can keep it simple.
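
The simpler approach suggested in this reply might look like the sketch below. The set of wikis that support variants is an assumption for illustration (Chinese and Serbian are the two mentioned in this task), and `request_headers` is a hypothetical name.

```python
# Assumption: only these wiki languages use variants. This is not an exhaustive list.
VARIANT_WIKIS = {"zh", "sr"}


def request_headers(wiki_language, variant=None):
    """Return the extra headers for a request to the given wiki.

    Accept-Language is sent only when the wiki supports variants and the
    reader has chosen one; otherwise no header is added.
    """
    if wiki_language in VARIANT_WIKIS and variant:
        return {"Accept-Language": variant}
    return {}
```

A request to zh.wikipedia.org for a zh-tw reader would carry `Accept-Language: zh-tw`, while a request to en.wikipedia.org would carry no Accept-Language header.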

@ABorbaWMF @JMinor We can begin QA testing this ticket on TestFlight build 6.8.0 (1797). Note we have not completed migration logic yet, so all testing should occur on a fresh install of the app. Because migration is not complete, we should not release this build to public beta. Build 1797 is only meant to be used for internal testing.

I thought I would update how the editing flow is working so far. I tested making changes in both Chinese and Serbian variants.

  1. When editing both, we fetch the wikitext, which returns mixed characters. This is as expected, although it's a bit confusing for the Serbian article that I tested, because every character happened to be in Cyrillic.
  2. I changed some characters to their corresponding translation characters. After updating, article content and preview in either variant remained unchanged, indicating that the auto-conversion is working properly.
  3. The article content on desktop also looks unchanged for ZH wiki. However, the article content in Serbian on desktop changed (see here; my Latin characters are showing in the Cyrillic article version). This looks like a desktop bug to me, since it looked okay in the app. Search for "који се годинама приказивао достигавши veliku популарност."; it should be "који се годинама приказивао достигавши велику популарност."

Looks good to me so far. I mainly looked at the reading experience and didn't do a lot of editing. One slightly odd thing: the Explore feed lists everything as 'from Chinese Wikipedia' but shows the variant's characters, so the reader sees the same featured articles repeated in different variants. I don't think most readers would have all the Chinese variants installed, though. Tested on 6.8.0 (1797).

JMinor claimed this task.