Split Min Dong (cdo) translations
Open, Needs Triage · Public

Description

Per consensus at cdowiki and translatewiki.net, Min Dong translations need to be split into Han script (cdo-hani) and Latin script (cdo-latn), and cdo should be disabled with the note "This language code should remain unused. Localise in cdo-hani or cdo-latn please.".

Event Timeline

Restricted Application added a subscriber: Zppix. · Jun 30 2016, 12:42 AM

We need some questions answered:

  1. Which code should cdo fall back to? cdo-hant or cdo-latn?
  2. Which language should cdo-latn fall back to?
  3. What should be the autonym of cdo-latn?
  1. Which code should cdo fall back to? cdo-hant or cdo-latn?
  2. Which language should cdo-latn fall back to?

I think

  • What should be the autonym of cdo-latn?
  • cdo-latn: Mìng-dĕ̤ng-ngṳ̄
  • cdo-hani: 閩東語
Yejianfei added a subscriber: Yejianfei. · Edited · Jul 1 2016, 5:14 PM

The creator of the Min Dong wiki (cdo wiki) was GnuDoyng. He was a Fuzhou Christian, and the Fuzhou version of the Bible was printed in the cdo-latn script (Min Dong Chinese, Foochow Romanized script), so he preferred to type in cdo-latn. As a result, the cdo wiki used to be written only in the cdo-latn script.

As time went by, other native Min Dong speakers came to edit the cdo wiki, and many of them preferred to use the cdo-hani script (Min Dong Chinese, Chinese characters script). As a result, on June 22nd, 2013, the first cdo-hani entry was added to the cdo wiki. Many more cdo-hani entries followed, and the two scripts began to run in parallel in our cdo wiki. Now we hope these two scripts will be separated into two parts, namely cdo-latn and cdo-hani, just like the Chinese wiki, which has four script variants, namely zh-cn, zh-tw, zh-hk and zh-sg.

The default/fallback locale does not really matter, I think. It could be either one. If it really is a technical problem, I think the default locale could be cdo-latn, as it has more entries than cdo-hani.
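A minimal sketch of how a fallback chain decides which message a reader sees, assuming cdo falls back to cdo-latn as suggested above. The chains, the message keys and the function names are hypothetical illustrations in Python, not MediaWiki's actual configuration; only the two autonyms come from this thread.

```python
# Hypothetical fallback chains; the thread has not settled on any of these.
FALLBACKS = {
    "cdo": ["cdo-latn"],
    "cdo-latn": ["en"],
    "cdo-hani": ["en"],
}

# Hypothetical message store keyed by language code, then by message key.
MESSAGES = {
    "cdo-latn": {"language-name": "Mìng-dĕ̤ng-ngṳ̄"},
    "cdo-hani": {"language-name": "閩東語"},
    "en": {"language-name": "Min Dong", "search": "Search"},
}


def fallback_chain(code):
    """Return the codes to try, in order, starting with `code` itself."""
    chain = []
    pending = [code]
    while pending:
        current = pending.pop(0)
        if current in chain:
            continue  # guard against cycles in the configuration
        chain.append(current)
        # Try the fallbacks of the current code before anything else queued.
        pending = FALLBACKS.get(current, []) + pending
    return chain


def get_message(code, key):
    """Return the first translation found along the fallback chain, or None."""
    for candidate in fallback_chain(code):
        translations = MESSAGES.get(candidate, {})
        if key in translations:
            return translations[key]
    return None


print(fallback_chain("cdo"))
print(get_message("cdo", "language-name"))
print(get_message("cdo-hani", "search"))
```

With these assumed chains, a message missing from cdo-hani is taken straight from English rather than from cdo-latn, which is one way to avoid the mixed-script pages discussed in this task.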

To sum up, the cdo wiki should be separated into two scripts, namely:

Code | English Name | Native Name | Alternative Native Name (Full Name)
cdo-latn | Min Dong Chinese - Foochow Romanized | Mìng-dĕ̤ng-ngṳ̄ | Mìng-dĕ̤ng-ngṳ̄ Bàng-uâ-cê/閩東語平話字
cdo-hani | Min Dong Chinese - Chinese characters | 閩東語 | Mìng-dĕ̤ng-ngṳ̄ Háng-cê/閩東語漢字

How can we move this process forward? Who knows how to contribute to this project?

This request is stuck because it's hard to tell which direction is really good. Such requests are easier to satisfy if all of the following are met:

  • we end up exposing users to less mixed script localisation, not more;
  • the affected Wikimedia wikis stay in the language they're supposed to be in per LangCom decision;
  • there is consensus at the affected communities;
  • there is a demonstrated workforce to translate both the proposed locales;
  • no LanguageConverter needs to be developed.

Currently, if I see correctly, this request doesn't satisfy any of these points, so it's nearly the hardest possible situation to be in.

This request is stuck because it's hard to tell which direction is really good. Such requests are easier to satisfy if all of the following are met:

  • we end up exposing users to less mixed script localisation, not more;

This request would in fact help expose users to less mixed-script localisation. At present the cdo translation has not been split: some messages are written in Latn, while others are in Hani. As a result, Chinese characters and Roman letters are mixed on the same page. Neither Hani users nor Latn users like this situation. If the cdo translation can be split successfully, this problem will be solved as well.

  • no LanguageConverter needs to be developed.

    Currently, if I see correctly, this request doesn't satisfy any of these points, so it's nearly the hardest possible situation to be in.

LanguageConverter is not necessary for this request because we can keep the Hani and Latn wording consistent manually.

ztl8702 added a comment. · Edited · Nov 2 2017, 7:02 PM

Can anyone please clarify whether the scope of this ticket is merely separating the UI translations of cdo into two UI language codes, or does it have anything to do with Min Dong Wikipedia becoming a multi-script Wikipedia?

Yejianfei added a comment. · Edited · Nov 3 2017, 1:25 AM

Yes, a multi-script Min Dong Wikipedia is needed, namely cdo-latn and cdo-hani, just like Serbian Wikipedia, but manual conversion is needed, so don't give us any LanguageConverter.

The conversion is a one-to-many mapping in either direction, whether from cdo-hani to cdo-latn or from cdo-latn to cdo-hani, because there are too many homophones and homographs. E.g. from cdo-hani to cdo-latn, the homograph can be pronounced as hèng or hòng depending on the context, while from cdo-latn to cdo-hani, the homophone mìng can be written as , or depending on the context. There is a more serious problem with unorthodox writings such as 無年呆, 起動, etc., which may map to "abnormal" pronunciations. Therefore, automatic conversion is almost impossible, unless a language AI is actually developed.

Moreover, we want the language name listed in the left-hand Languages sidebar to be changed from Mìng-dĕ̤ng-ngṳ̄ to Mìng-dĕ̤ng-ngṳ̄/閩東語.

ztl8702 added a comment. · Edited · Nov 3 2017, 2:31 AM

@Yejianfei
Thanks for the clarification.
+1 for multi-script Wikipedia.
Also, semi-auto conversion between Min Dong scripts is possible, although manual intervention is unavoidable at this moment.

Cyclohexane233 added a comment. · Edited · Nov 3 2017, 6:51 AM

The discussion has been very pluralistic, and many different opinions have been shared.
I think the cdo-latn and cdo-hani interfaces in Min Dong Wikipedia should be separated. If Latn and Hani both appear in one article, the visual effect is bad. Moreover, cdo-latn and cdo-hani are two totally different writing systems (unlike Simplified and Traditional Chinese, which can be converted easily), so they can't be used together to write an article.
At present most articles in the Min Dong Wikipedia are written in Latn. Their existence is valuable, and they need not be changed into Hani.
I support the opinions of @ztl8702 and @Yejianfei. The primary task is not merely separating cdo-latn and cdo-hani in Min Dong Wikipedia, but also making Min Dong Wikipedia a multi-script Wikipedia.
Some might say cdo-latn and cdo-hani can't correspond word for word, but Serbian Wikipedia is a good example to look at. We should first agree on the idea (making a multi-script Min Dong Wikipedia), and then consider the technology. Whether by manual intervention or LanguageConverter, there is always a best way to solve the problems that might occur.
All things considered, making a multi-script Min Dong Wikipedia is arduous and requires support from all of us.
The above is a statement of my personal opinions.

Yejianfei added a comment. · Edited · Nov 3 2017, 7:11 AM

@Cyclohexane233
The Serbian Wikipedia case is the same as conversion between simplified and traditional Chinese characters. Cyrillic and Latin scripts map strictly one-to-one (see this table), which is known as transliteration, and this is even stricter than the mapping from simplified to traditional Chinese characters.

Conversion from simplified to traditional Chinese characters includes very few examples of one-to-multiple conversion, such as 发 mapping to 發 and 髮; such cases do not even exist in the conversion from Cyrillic to Latin scripts.
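The difference between a one-to-one and a one-to-multiple mapping can be sketched as follows. This is only an illustration in Python under stated assumptions: the two tables are tiny hand-picked subsets, and the function name is hypothetical, not part of LanguageConverter.

```python
from itertools import product

# Tiny subset of the Serbian Cyrillic -> Latin table: one output per letter.
SR_CYRL_TO_LATN = {"м": ["m"], "и": ["i"], "р": ["r"]}

# Tiny subset of a simplified -> traditional table.
# 发 maps to both 發 and 髮 (hair; also seen as the variant 髪).
ZH_HANS_TO_HANT = {"头": ["頭"], "发": ["發", "髮"]}


def candidates(text, table):
    """List every output string the table allows for `text`.

    Characters missing from the table are passed through unchanged.
    """
    options = [table.get(ch, [ch]) for ch in text]
    return ["".join(combo) for combo in product(*options)]


print(candidates("мир", SR_CYRL_TO_LATN))   # a single candidate
print(candidates("头发", ZH_HANS_TO_HANT))  # two candidates, context must decide
```

With a one-to-one table the candidate list always has exactly one entry, so a converter never has to guess. Each ambiguous character multiplies the number of candidates, which is the problem described above for cdo-hani and cdo-latn.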

Neither of the two paragraphs I wrote above is important. The only important thing is to make it a dual-script Wikipedia now, according to our consensus. Split the interface now, and change the language name shown in the left-hand Languages sidebar from Mìng-dĕ̤ng-ngṳ̄ to Mìng-dĕ̤ng-ngṳ̄/閩東語.

Steps: First, change the language name shown on the left-hand Languages sidebar from Mìng-dĕ̤ng-ngṳ̄ to Mìng-dĕ̤ng-ngṳ̄/閩東語. Second, split https://cdo.wikipedia.org/wiki/ into https://cdo.wikipedia.org/cdo-latn/ and https://cdo.wikipedia.org/cdo-hani/. Let's do it now.

Could this issue be solved instead by creating separate namespaces ("Hani:" and "Latn:") in the cdo wiki for articles that do not mix scripts and do not use autotranslation? The "/wiki/" path would then stay unchanged (so that it keeps working correctly for interwikis with other languages).
This does not mean that the UI translation cannot be split; however, it does mean that the namespaces should have distinct default language codes using distinct scripts. The "Hani" part would have the Hans<->Hant transliterator handled as a variant selectable by viewers, but there would be no conversion between Hans<->Latn or Hant<->Latn.
Some common pages would remain in the main namespace only if they support autotranslation (via the Translate extension, or via templates based on PAGECONTENTLANGUAGE or the UI language where applicable to generic banners).

Finally, the interwiki resolvers should be able to resolve

  • "cdo:ArticleName" as the native cdo.wikipedia.org/wiki/ArticleName, but
  • "cdo-latn:ArticleName" would also be recognized to point to "cdo.wikipedia.org/wiki/Latn:ArticleName" (the same target as "cdo:latn:ArticleName")
  • "cdo-hani:ArticleName" would also be recognized to point to "cdo.wikipedia.org/wiki/Hani:ArticleName" (the same target as "cdo:hani:ArticleName")
  • "cdo-hans:ArticleName" would also be recognized to point to "cdo.wikipedia.org/wiki/Hani:ArticleName?uselang=cdo-hans" (the same target as "cdo:hani:ArticleName" but with a preferred Simplified Han presentation, using the Han transliterator, possibly tuned by a dictionary of exceptions needed for Min Dong)
  • "cdo-hant:ArticleName" would also be recognized to point to "cdo.wikipedia.org/wiki/Hani:ArticleName?uselang=cdo-hant" (the same target as "cdo:hani:ArticleName" but with a preferred Traditional Han presentation, using the Han transliterator, possibly tuned by a dictionary of exceptions needed for Min Dong)
  • so "cdo-latn:", "cdo-hani:" become valid interwikis, and we can still have wikidata fed with linguistic entries in "cdo-hans:" and "cdo-hant:" when needed (when not using the exception dictionary)
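The resolution rules listed above could be sketched roughly like this. It is a hypothetical Python illustration of the proposal only: the table, the function name and the URL handling are assumptions, and this is not how MediaWiki's interwiki table is actually implemented.

```python
from urllib.parse import quote

BASE_URL = "https://cdo.wikipedia.org/wiki/"

# Proposed prefix -> (namespace prefix, optional uselang value).
INTERWIKI_PREFIXES = {
    "cdo": ("", None),
    "cdo-latn": ("Latn:", None),
    "cdo-hani": ("Hani:", None),
    "cdo-hans": ("Hani:", "cdo-hans"),
    "cdo-hant": ("Hani:", "cdo-hant"),
}


def resolve_interwiki(link):
    """Turn e.g. 'cdo-hans:ArticleName' into a full URL under /wiki/."""
    prefix, separator, title = link.partition(":")
    if not separator or prefix not in INTERWIKI_PREFIXES:
        raise ValueError(f"unknown interwiki prefix in {link!r}")
    namespace, uselang = INTERWIKI_PREFIXES[prefix]
    # Spaces become underscores in page names; ':' is kept for namespaces.
    page = quote(namespace + title.replace(" ", "_"), safe=":/")
    url = BASE_URL + page
    if uselang is not None:
        url += "?uselang=" + uselang
    return url


for example in ("cdo:ArticleName", "cdo-latn:ArticleName", "cdo-hans:ArticleName"):
    print(example, "->", resolve_interwiki(example))
```

Keeping everything under the single "/wiki/" path means existing "cdo:" interwiki links keep working, and the script-specific prefixes only add a namespace and, for Hans/Hant, a presentation hint.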

We have a similar situation for Kurdish, and we may soon have the same situation for Kazakh (if there's no simple 1-to-1 transliteration between Cyrl<->Latn); other Turkic languages are also affected by the transition to Latin. There is a similar issue in Africa as well (Berber languages using Arab<->Latn conversion, which is lossy).

Liuxinyu970226 added a comment. · Edited · Feb 10 2018, 4:01 PM

I would like to raise a point: do we really need to introduce hans for Eastern/Middle/Northern/Southern Min? As Yejianfei has said above many times, hans was made for Mandarin (cmn). It is probably suitable for Cantonese (yue), roughly suitable for Classical Chinese (lzh), works better for Wuu (aka Shanghainese), is suitable for Sichuanese (though a code request was recently rejected by SIL due to concerns from academic communities in Sichuan itself, 囧rz), and is maybe somewhat suitable for Japanese (hehe) and Korean (another hehe), since picking a good translation for foreign-language words is not affected by the different meanings of a single Han character. However, there is a real cul-de-sac issue in the pan-"Fujian + Eastern Guangdong + Southwestern Zhejiang" society: such "flexibility" has never existed in Min history and will not exist in the future, which is the reason the user was against hans in his original request.

so "cdo-latn:", "cdo-hani:" become valid interwikis, and we can still have wikidata fed with linguistic entries in "cdo-hans:" and "cdo-hant:" when needed (when not using the exception dictionary)

I think this is a good way to solve the interwiki link problems, so that articles written in cdo-hani can also be found easily in the left-hand Languages sidebar.