
Localization Inconsistencies in Wikipedia App: Improving Regional Word Display
Open, Low, Public

Assigned To: None
Authored By: ARamadan-WMF, Nov 27 2023, 5:58 AM
Referenced Files:
F41598859: image.png (Dec 13 2023, 4:53 PM)
F41598857: image.png (Dec 13 2023, 4:53 PM)
F41598855: image.png (Dec 13 2023, 4:53 PM)
F41598853: image.png (Dec 13 2023, 4:53 PM)
F41594842: app zh-Hant-TW (Dec 11 2023, 7:30 PM)
F41594840: app zh-tw (Dec 11 2023, 7:30 PM)
F41594838: desktop zh-tw (Dec 11 2023, 7:30 PM)
F41594800: Screenshot_20231211-105302.png (Dec 11 2023, 6:58 PM)

Description

Recently, while using the Wikipedia app, I noticed that regional word forms are not localized as thoroughly as on the web version. For example, in the article Georgia (country), the Chinese name for Georgia alternates in the text between the form used in Taiwan and the form used in other regions, which makes it uncomfortable to read. Besides this article, there are many similar examples, such as Montenegro and Slovenia, whose names also appear in two different regional forms. I hope this can be fixed as soon as possible!
Best wishes
Version: WikipediaApp/7.4.3.2822 (iPadOS 17.0.3; Tablet)

Event Timeline

JTannerWMF added subscribers: ABorbaWMF, JTannerWMF.

@ABorbaWMF, can you investigate whether this is true? We are trying to see whether the variant logic is different.

Also, can you check whether this happens on both Android and iOS?

@ARamadan-WMF - Hello, could we ask for screenshots of this specific issue and clarification on what the user is seeing?

I noticed that the disambiguation text appears in different locations in the Georgia (country) and Georgia (U.S. state) articles; however, I am unsure whether this is the issue the user is describing.

Screenshots: Georgia (U.S. state) (IMG_D3DE952B12FD-1.jpeg) | Georgia (country) (IMG_82DCC367DFC6-1.jpeg)

Here is an example of the Georgia (country) article on Mobile Web vs. the iOS and Android apps. They look similar to me, but I may not be able to pick out the differences the user has reported.

Screenshots: Mobile Web (Screenshot 2023-12-11 at 10.46.27 AM.png) | iOS (IMG_6753DB47478B-1.jpeg) | Android (Screenshot_20231211-105302.png)

Something I do notice is that the mobile-html endpoint returns different characters depending on whether I send zh-tw or zh-Hant-TW in the Accept-Language header. We changed it to zh-Hant-TW a few releases ago as part of https://phabricator.wikimedia.org/T338079. When comparing the first paragraph on Desktop with https://zh.wikipedia.org/api/rest_v1/page/mobile-html/%E6%A0%BC%E9%B2%81%E5%90%89%E4%BA%9A (the Georgia article on zh.wikipedia), the zh-tw response matches Desktop, whereas the zh-Hant-TW response does not.
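To reproduce this comparison outside the app, one could request the same mobile-html URL twice, changing only the Accept-Language header. The following is a minimal Python sketch using only the standard library; the helper names and the User-Agent string are hypothetical, and it assumes the endpoint is reachable and varies its output on Accept-Language as described above.

```python
import urllib.request

# Georgia article on zh.wikipedia, as linked in the comment above.
URL = ("https://zh.wikipedia.org/api/rest_v1/page/mobile-html/"
       "%E6%A0%BC%E9%B2%81%E5%90%89%E4%BA%9A")


def build_request(accept_language: str) -> urllib.request.Request:
    """Build a GET request that differs only in its Accept-Language header."""
    return urllib.request.Request(URL, headers={
        "Accept-Language": accept_language,
        # Placeholder value (assumption): use a descriptive User-Agent with contact info.
        "User-Agent": "variant-comparison-sketch/0.1 (example)",
    })


def fetch(accept_language: str) -> str:
    """Download the mobile-html body for one language tag (needs network access)."""
    with urllib.request.urlopen(build_request(accept_language), timeout=30) as resp:
        return resp.read().decode("utf-8")


def first_difference(a: str, b: str) -> int:
    """Return the index of the first differing character, or -1 if the strings are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))


def compare_tags(tag_a: str, tag_b: str, fetcher=fetch):
    """Fetch both tags and return (offset, snippet_a, snippet_b); offset is -1 if identical."""
    body_a, body_b = fetcher(tag_a), fetcher(tag_b)
    offset = first_difference(body_a, body_b)
    if offset == -1:
        return -1, "", ""
    return offset, body_a[offset:offset + 40], body_b[offset:offset + 40]
```

Calling `compare_tags("zh-tw", "zh-Hant-TW")` would print-ready the first point where the two responses diverge. The `fetcher` parameter exists so the comparison logic can be exercised without network access.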

@Jgiannelos Is this expected? Are we sending the wrong BCP 47 code here?
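For context on the question: MediaWiki's Chinese variant codes are lowercase forms such as zh-tw, zh-hk, zh-cn, zh-hans and zh-hant, while the app now sends the BCP 47 form zh-Hant-TW. If a server layer only recognizes the MediaWiki form, a client or gateway could normalize the tag before forwarding it. This is a hypothetical Python sketch; the mapping table and function name are assumptions for illustration and are not taken from MediaWiki, RESTBase, or the app code.

```python
# Hypothetical table mapping BCP 47 tags to MediaWiki zh variant codes.
# Keys are lowercase because BCP 47 tags are case-insensitive.
BCP47_TO_MW_VARIANT = {
    "zh-hans": "zh-hans",
    "zh-hant": "zh-hant",
    "zh-hans-cn": "zh-cn",
    "zh-hans-sg": "zh-sg",
    "zh-hans-my": "zh-my",
    "zh-hant-tw": "zh-tw",
    "zh-hant-hk": "zh-hk",
    "zh-hant-mo": "zh-mo",
}


def to_mw_variant(tag: str) -> str:
    """Normalize a single language tag to a MediaWiki zh variant code.

    Tags already in MediaWiki form (e.g. "zh-tw") and unknown tags are
    returned lowercased, with underscores converted to hyphens.
    """
    key = tag.strip().replace("_", "-").lower()
    return BCP47_TO_MW_VARIANT.get(key, key)
```

With this normalization, zh-tw and zh-Hant-TW would reach the converter as the same variant code, which is the behavior the Desktop comparison suggests is expected.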

Desktop zh-tw: [screenshot: desktop zh-tw]

App zh-tw: [screenshot: app zh-tw]

App zh-Hant-TW: [screenshot: app zh-Hant-TW]

@ABorbaWMF, I emailed the user; once I receive his reply, I will update the ticket.

Here's the user reply:

From what I see in the ticket, what the user Tsevener showed is exactly what happened to me. That is, there seems to be a problem where zh-Hant-TW displays the Chinese words for Georgia and Azerbaijan incorrectly. It shows the terms used mainly in Mainland China (格魯吉亞 & 阿塞拜疆) instead of the ones used in Taiwan (喬治亞 & 亞塞拜然). Hope this helps. (Sorry for my bad English.)
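The user's examples suggest a simple regression check: when the Taiwan variant is requested, scan the response for the Mainland forms they quoted. A minimal Python sketch; the word list contains only the two pairs from the user's reply, and the function name is hypothetical.

```python
# Regional forms quoted in the user's reply, keyed by English name.
REGIONAL_FORMS = {
    "Georgia": {"tw": "喬治亞", "cn": "格魯吉亞"},
    "Azerbaijan": {"tw": "亞塞拜然", "cn": "阿塞拜疆"},
}


def wrong_region_terms(text: str, expected: str = "tw") -> dict:
    """Return {name: count} for forms from any region other than `expected` found in text."""
    found = {}
    for name, forms in REGIONAL_FORMS.items():
        # Count every occurrence of the non-expected regional forms.
        count = sum(text.count(word) for region, word in forms.items() if region != expected)
        if count:
            found[name] = count
    return found
```

Run against a mobile-html response fetched with Accept-Language: zh-Hant-TW, a non-empty result would indicate the behavior the user reported; the table can grow to cover other names they mentioned, such as Montenegro and Slovenia.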

Moving this to blocked until Content Transform can comment.

From RESTBase, after purging the specific page to make sure we don't get any stale content:

page/mobile-html output:

zh-Hant-TW: [screenshot: image.png]

zh-tw: [screenshot: image.png]

If I request the same output directly from PCS:
zh-Hant-TW: [screenshot: image.png]

zh-tw: [screenshot: image.png]

The last two screenshots (PCS) look the same to me, so I think something is wrong in the language variant handling at the RESTBase level.
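The reasoning here (compare the two tags at each layer, and suspect the first layer where they diverge) can be written as a small helper. A Python sketch, assuming the outputs have already been fetched from each layer; the function name and data layout are hypothetical.

```python
def first_divergent_layer(outputs, tag_a, tag_b):
    """Return the first layer where the two tags produce different output.

    `outputs` is an ordered list of (layer_name, {tag: body}) pairs, starting
    with the layer closest to the origin (e.g. PCS before RESTBase).
    Returns None if every layer produces identical output for both tags.
    """
    for layer, bodies in outputs:
        if bodies[tag_a] != bodies[tag_b]:
            return layer
    return None
```

If PCS returns the same body for zh-tw and zh-Hant-TW but RESTBase does not, the helper points at RESTBase, matching the conclusion above.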