
[BUG] Wikidata description for the specific Chinese language variant should be shown
Open, Needs Triage, Public

Description

Steps to reproduce
  1. Go to the article on China using Simplified Chinese and note the Wikidata description.
  2. Change to view the article in Traditional Chinese and note the Wikidata description.

Expected

When viewing the article in Traditional Chinese, the description shown should come from the "Traditional Chinese" row in the Wikidata entry for China; likewise, Simplified Chinese should use the "Simplified Chinese" row.

Actual

The description shown is pulled from the "Chinese" row in Wikidata, so characters from one variant are displayed when the language is set to the other. (In the "China" article example, the Simplified-character description "中华人民共和国" is shown on the Traditional Chinese variant of the article.)


Event Timeline

RHo created this task. Aug 22 2017, 3:31 PM
Restricted Application added a subscriber: Aklapper. Aug 22 2017, 3:31 PM
RHo updated the task description. Aug 22 2017, 3:32 PM
bearND added a subscriber: bearND.

MCS is not used for zhwiki until Parsoid and RESTBase can handle language variants.

Dbrant added a subscriber: Dbrant. Sep 15 2017, 1:54 PM

So wait... Wikidata has descriptions in Traditional Chinese, Simplified Chinese, and another one called "Chinese"? What does that third one mean? Is it another variant all to itself, or does it "default" to traditional or simplified?

RHo added a comment. Sep 15 2017, 2:37 PM

The China Wikidata entry actually has different descriptions for multiple variants, but only the "original" Chinese one is used.

This can be better illustrated by taking an example Chinese wiki article with no description yet, which looked like this:


After populating the description in Wikidata, the table is only updated for the Chinese-only article:

TL;DR: The behavior looks similar to Simple English vs. English, in that a row can be created for each variant in Wikidata, but we only pull in whatever is in the 'Chinese' row, without applying any character transforms.

Here are some screenshots that might help:

  • When reading an article in Traditional Chinese, the app does not load the description from Wikidata, even though the description has already been published on the Wikidata page.
  • ==>
    • The Android app PUSHes the description to Wikidata under the "zh-hant" language code ==> correct
    • The Android app GETs the description from Wikidata under the "zh" language code ==> not correct

  • After updating the description in the Chinese row on Wikidata, the description appears once the article page is refreshed in the Android app.

When I tried keeping only the Traditional Chinese label and description on Wikidata, the app did not show the description.
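To make the PUSH/GET mismatch above concrete, here is a minimal Python sketch of what a variant-aware read could look like. It assumes the Wikibase `wbgetentities` API with its `languages` and `languagefallback` parameters; the helper names, the item ID `Q148` for China, and the sample response shape are assumptions made for illustration and are not taken from the app's code.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def description_request_url(entity_id: str, variant: str) -> str:
    """Build a request for the description in a specific variant.

    Hypothetical helper: it asks for the variant code (e.g. "zh-hant")
    instead of the bare "zh" code, and enables language fallback.
    """
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "descriptions",
        "languages": variant,
        "languagefallback": 1,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def pick_description(response: dict, entity_id: str, variant: str):
    """Return the description text for `variant`, or None if absent.

    Assumes the response keys descriptions by the requested language code.
    """
    descriptions = response["entities"][entity_id].get("descriptions", {})
    entry = descriptions.get(variant)
    return entry["value"] if entry else None


# Hand-written sample response (assumed shape), so no network call is needed.
sample = {
    "entities": {
        "Q148": {
            "descriptions": {
                "zh-hant": {"language": "zh-hant", "value": "中華人民共和國"}
            }
        }
    }
}

url = description_request_url("Q148", "zh-hant")
text = pick_description(sample, "Q148", "zh-hant")
```

The point of the sketch is only that the read path should use the same language code as the write path; how fallback results are labeled in the real response should be checked against the API documentation.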

cooltey claimed this task. Sep 18 2017, 5:17 PM
Mholloway added a comment. Edited Oct 4 2017, 7:36 PM

To repeat (and expand) relevant discussion from T177342:

Neither API currently provides support for specifying a language variant for Wikidata descriptions. Even the mobileview API's 'variant' parameter has no effect here. It could be added, but we'll run into the same issue as with T176678 that the mobileview API is deprecated and in principle we shouldn't be spending time on it.

For MCS/RESTBase, we're waiting on T159985. (Note that this in turn depends on T43716, which is triaged at low priority.)

For the mobileview API, this is the method I think we'd need to update for variant support: https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/blob/27599dfbdaecf1ca0a12164e648a14facefd00d2/includes/MobileFrontend.body.php#L189-L209

As a side note, I get the sense no one would object to moving the mobileview API into the MobileApp extension as discussed on T176678, which would remove the need for the Reading Web team to be involved, though that would still leave an open question about how much work we should be putting into the mobileview API on an ongoing basis. (Also, such a change should probably be announced in advance on mobile-l and wikitech-l.)

cscott added a subscriber: cscott. Dec 8 2017, 8:56 PM

Part of the problem here is that historically LanguageConverter does not specifically tag the source language variant of the text, since it is assumed it can be inferred from the character set. This is more or less true for Serbian (Latin/Cyrillic) and Chinese (Simplified/Traditional) but falls down badly with (say) British/American English. And it doesn't work 100% even for Serbian and Chinese, depending on the exact input text. The original article text in Wikipedia is a mix of variants, again with the assumption that you can determine on a word-by-word basis what the original variant is and what needs conversion.
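As a rough illustration of inferring the source variant from the character set, and of where that inference breaks down, here is a small Python sketch. It is not how LanguageConverter is implemented; the character tables are tiny hand-picked samples and the function name is hypothetical.

```python
# Tiny hand-picked samples, for illustration only; real conversion
# tables are far larger.
SIMPLIFIED_ONLY = set("华国这说")
TRADITIONAL_ONLY = set("華國這說")


def guess_variant(text: str):
    """Guess the source variant of `text` from the characters it contains.

    Returns "zh-hans", "zh-hant", "mixed", or None when the text contains
    no character from either sample table (the variant cannot be inferred).
    """
    has_simplified = any(ch in SIMPLIFIED_ONLY for ch in text)
    has_traditional = any(ch in TRADITIONAL_ONLY for ch in text)
    if has_simplified and has_traditional:
        return "mixed"
    if has_simplified:
        return "zh-hans"
    if has_traditional:
        return "zh-hant"
    return None


simplified_guess = guess_variant("中华人民共和国")
traditional_guess = guess_variant("中華人民共和國")
ambiguous_guess = guess_variant("中文")
```

The last call shows the weak spot: text made only of characters shared by both scripts carries no signal, which is one reason an explicit source-variant tag is more reliable than inference.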

Anyway, Parsoid is getting the ability to do language variant conversion, but going forward we need to be careful to accurately record the source language variant -- for example, Wikidata should really be taking appropriate care and not following MediaWiki's (bad) example.

There doesn't seem to be any task tracking Chinese language variants for Wikidata, but I'm not sure. I'd be grateful if anyone could link one or create one.

This has also been a problem when importing data to Wikidata. I'm always confused by the difference between Chinese, Simplified Chinese and Traditional Chinese there.

I think we should:

  • remove Chinese in Wikidata
  • zh-cn and zh-sg fall back to zh-hans
  • zh-tw, zh-hk and zh-mo fall back to zh-hant.

In this way, we can map all language variants to Wikidata precisely without any other rules.
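The proposed mapping can be sketched in Python as a small fallback table. This is only an illustration of the proposal in this comment, not Wikibase's actual fallback logic; the table and function names are hypothetical.

```python
# Fallback chains as proposed above: regional codes fall back to the
# script-level code, and there is no plain "zh" entry at all.
FALLBACKS = {
    "zh-cn": ["zh-hans"],
    "zh-sg": ["zh-hans"],
    "zh-tw": ["zh-hant"],
    "zh-hk": ["zh-hant"],
    "zh-mo": ["zh-hant"],
}


def resolve_description(descriptions: dict, requested: str):
    """Return (language_code, text) for the first code in the chain that
    has a description, or None if nothing in the chain is available."""
    for code in [requested, *FALLBACKS.get(requested, [])]:
        if code in descriptions:
            return code, descriptions[code]
    return None


# Sample data: descriptions stored only under the script-level codes.
descriptions = {
    "zh-hans": "中华人民共和国",
    "zh-hant": "中華人民共和國",
}

taiwan = resolve_description(descriptions, "zh-tw")
singapore = resolve_description(descriptions, "zh-sg")
```

Returning the language code alongside the text keeps the source variant explicit, so a caller can tell a direct hit from a fallback.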

I agree with @fantasticfears's comments. Allowing users to mark a label as plain zh is not accurate enough. I have even thought that, to take a more aggressive approach, we should use zh-cn/tw/hk/mo/sg instead of plain zh-hans/hant when specifying the label and description for an entity. After all, aside from fallbacks (e.g. zh-tw --> zh-hant), Wikidata can automatically use, e.g., the zh-tw label when a user requests the zh-hant label but no zh-hant label is directly assigned to the entity.

I want to help with this. Wikibase doesn't cover this now. It's much more reasonable to fix this on their end. Maybe you can chime in and ask Wikidata people? @RHo

Hi @fantasticfears, I have just tagged Wikidata again to get their comments first; it seems the tag was removed after the original ticket was filed, for some reason...

Yup, it looks like this needs to be updated to also account for the user's language variant; currently it just uses the site content language.
This description is used in onOutputPageParserOutput (not sure whether the output there is cached), so it might require a cache split based on the user's content language variant (not sure whether it is already split on that).

Should be a pretty smallish patch to MobileFrontend
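A cache split of the kind mentioned above could look roughly like the following Python sketch. It is a language-neutral illustration rather than MobileFrontend code; the key format, function name, and in-memory cache are all hypothetical.

```python
def description_cache_key(page_id: int, content_lang: str, user_variant=None) -> str:
    """Build a cache key that varies by the user's language variant.

    Falls back to the site content language when no variant is selected,
    which mirrors the current (variant-unaware) behavior.
    """
    variant = user_variant or content_lang
    return f"wikidata-description:{page_id}:{variant}"


# Simple in-memory stand-in for a real cache.
cache = {}
cache[description_cache_key(42, "zh", "zh-hans")] = "中华人民共和国"
cache[description_cache_key(42, "zh", "zh-hant")] = "中華人民共和國"

hans_key = description_cache_key(42, "zh", "zh-hans")
hant_key = description_cache_key(42, "zh", "zh-hant")
default_key = description_cache_key(42, "zh")
```

Because the variant is part of the key, a description rendered for one variant is never served to a reader of the other.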

Addshore moved this task from incoming to monitoring on the Wikidata board. Aug 30 2018, 9:11 AM

@Addshore I was actually proposing a change to the storage/data model. If I had some information about how the data gets rendered from the DB, I might be able to submit patches.

Should I also add iOS-app-Bugs and Wikipedia-Android-App-Backlog? Or does this bug also happen on iOS?

@Liuxinyu970226 Mm... I don't use iOS, only Android. I think the result is the same.

@Liuxinyu970226 Sorry... I mean the result is that no conversion happens.

Isn't this task about merging the entries into a single "Chinese" entry?

Merging does not seem suitable, because some things have different names in different regions, such as the title of a movie or a drama series.

At least please leave those variants alone as before, such as "zh-CN", "zh-HK", "zh-MO", "zh-TW", "zh-SG", "zh-MY" etc.

Those variants are better off with fallback chains, as in MediaWiki (that's a nice system).