
Split Min Dong (cdo) translations
Open, Needs TriagePublic

Description

Per consensus at cdowiki and translatewiki.net, we would need to have Min Dong translations in both Han script (cdo-hani) and Latin script (cdo-latn) separated, and disable cdo as "This language code should remain unused. Localise in cdo-hani or cdo-latn please.".

Event Timeline

Restricted Application added a subscriber: Zppix. · Jun 30 2016, 12:42 AM

We need some questions answered:

  1. Which code should cdo fall back to? cdo-hant or cdo-latn?
  2. Which language should cdo-latn fall back to?
  3. What should be the autonym of cdo-latn?
  1. Which code should cdo fall back to? cdo-hant or cdo-latn?
  2. Which language should cdo-latn fall back to?

I think

  • What should be the autonym of cdo-latn?
  • cdo-latn: Mìng-dĕ̤ng-ngṳ̄
  • cdo-hani: 閩東語
Yejianfei added a subscriber: Yejianfei. · Edited · Jul 1 2016, 5:14 PM

The creator of the Min Dong wiki (cdo wiki) was GnuDoyng. He was a Fuzhou Christian, and because the Fuzhou version of the Bible was printed in the cdo-latn script (Min Dong Chinese, Foochow Romanized), he preferred to type in cdo-latn. As a result, the cdo wiki was originally written only in the cdo-latn script.

As time went by, other native Min Dong speakers came to edit the cdo wiki, and many of them preferred the cdo-hani script (Min Dong Chinese, Chinese characters). On June 22nd, 2013, the first cdo-hani entry was added to the cdo wiki. Many more cdo-hani entries followed, and the two scripts began to run in parallel on the wiki. We now hope the two scripts can be separated into two parts, namely cdo-latn and cdo-hani, just like the Chinese wiki, which has four script variants: zh-cn, zh-tw, zh-hk and zh-sg.

The default/fallback locale does not really matter, I think. It could be either one. If it is really a technical problem, I think the default locale could be cdo-latn, as it contains more entries than the cdo-hani entries.

To sum up, the cdo wiki should be separated into two scripts, namely

| Code | English Name | Native Name | Alternative Native Name (Full Name) |
| --- | --- | --- | --- |
| cdo-latn | Min Dong Chinese - Foochow Romanized | Mìng-dĕ̤ng-ngṳ̄ | Mìng-dĕ̤ng-ngṳ̄ Bàng-uâ-cê/閩東語平話字 |
| cdo-hani | Min Dong Chinese - Chinese characters | 閩東語 | Mìng-dĕ̤ng-ngṳ̄ Háng-cê/閩東語漢字 |

How can we move this process forward? Who knows how to contribute to this project?

This request is stuck because it's hard to tell which direction is really good. Such requests are easier to satisfy if all of the following are met:

  • we end up exposing users to less mixed script localisation, not more;
  • the affected Wikimedia wikis stay in the language they're supposed to be in per LangCom decision;
  • there is consensus at the affected communities;
  • there is a demonstrated workforce to translate both the proposed locales;
  • no LanguageConverter needs to be developed.

Currently, if I see correctly, this request doesn't satisfy any of these points, so it's nearly the hardest possible situation to be in.

This request is stuck because it's hard to tell which direction is really good. Such requests are easier to satisfy if all of the following are met:

  • we end up exposing users to less mixed script localisation, not more;

This request would in fact expose users to less mixed-script localisation. At present the cdo translations have not been split: some messages were written in latn, while others were written in hani. As a result, Chinese characters and Roman letters are mixed on the same page, which neither hani users nor latn users want to see. If the cdo translations are split successfully, this problem is solved as well.

  • no LanguageConverter needs to be developed.

Currently, if I see correctly, this request doesn't satisfy any of these points, so it's nearly the hardest possible situation to be in.

LanguageConverter is not necessary for this request, because we can keep the hani and latn wording consistent manually.

ztl8702 added a comment. · Edited · Nov 2 2017, 7:02 PM

Can anyone please clarify whether the scope of this ticket is merely separating the UI translations of cdo into two UI language codes, or does it have anything to do with Min Dong Wikipedia becoming a multi-script Wikipedia?

Yejianfei added a comment. · Edited · Nov 3 2017, 1:25 AM

Yes, a multi-script Min Dong Wikipedia is needed, namely cdo-latn and cdo-hani, just like Serbian Wikipedia, but manual conversion is needed, so don't give us any LanguageConverter.

The conversion is a one-to-many mapping in either direction, whether from cdo-hani to cdo-latn or from cdo-latn to cdo-hani, because there are too many homophones and homographs. e.g. from cdo-hani to cdo-latn, the homograph can be pronounced as hèng or hòng depending on the context, while from cdo-latn to cdo-hani, the homophone mìng can be written as , or depending on the context. There is an even more serious problem with unorthodox writings such as 無年呆, 起動, etc., which may map to "abnormal" pronunciations. Therefore, automatic conversion is almost impossible, unless a language AI is actually developed.

Moreover, we would like the language name listed in the left-hand Languages sidebar to be changed from Mìng-dĕ̤ng-ngṳ̄ to Mìng-dĕ̤ng-ngṳ̄/閩東語.

ztl8702 added a comment. · Edited · Nov 3 2017, 2:31 AM

@Yejianfei
Thanks for the clarification.
+1 for multi-script Wikipedia.
Also, semi-auto conversion between Min Dong scripts is possible, although manual intervention is unavoidable at this moment.

Cyclohexane233 added a comment. · Edited · Nov 3 2017, 6:51 AM

The discussion has been very pluralistic, and many different opinions have been shared.
I think the cdo-latn and cdo-hani interfaces in Min Dong Wikipedia should be separated. If latn and hani both appear in one article, the visual effect is bad. Moreover, cdo-latn and cdo-hani are two totally different writing systems (unlike Simplified and Traditional Chinese, which can be converted easily), so they can't be used together to write an article.
Most articles in Min Dong Wikipedia are currently written in latn. Their existence is valuable, and they need not be changed into hani.
I support the opinions of @ztl8702 and @Yejianfei. The primary task is not merely to separate cdo-latn and cdo-hani in Min Dong Wikipedia, but also to make Min Dong Wikipedia a multi-script Wikipedia.
Some might say cdo-latn and cdo-hani can't correspond word for word, but Serbian Wikipedia is a good example to look at. We should first settle on the idea (making a multi-script Min Dong Wikipedia), and then consider the technology. Whether through manual intervention or LanguageConverter, there is always a way to solve the problems that might occur.
All things considered, making a multi-script Min Dong Wikipedia is arduous and requires support from all of us.
The above is a statement of my personal opinions.

Yejianfei added a comment. · Edited · Nov 3 2017, 7:11 AM

@Cyclohexane233
Serbian Wikipedia works the same way as conversion from simplified to traditional Chinese characters. Cyrillic and Latin scripts are a strictly one-to-one mapping (see this table), which is known as transliteration, and it is even stricter than the mapping from simplified to traditional Chinese characters.

Conversion from simplified to traditional Chinese characters includes a few cases of one-to-many conversion, such as 发 mapping to both 發 and 髮, which do not exist at all in the conversion from Cyrillic to Latin script.
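
To illustrate why a one-to-many mapping rules out purely automatic conversion, here is a minimal Python sketch. The table contains only the 发 example plus one unambiguous character, and the function name `candidate_conversions` is hypothetical; this is not how MediaWiki's LanguageConverter is implemented.

```python
from itertools import product

# Illustrative table only: each simplified character maps to every
# traditional character it may stand for.
SIMPLIFIED_TO_TRADITIONAL = {
    "发": ["發", "髮"],  # one-to-many: only context can pick the right one
    "头": ["頭"],        # one-to-one: safe to convert automatically
}


def candidate_conversions(text, table):
    """Return every possible conversion of text under a one-to-many table."""
    # Characters missing from the table are passed through unchanged.
    options = [table.get(ch, [ch]) for ch in text]
    return ["".join(combo) for combo in product(*options)]


print(candidate_conversions("头发", SIMPLIFIED_TO_TRADITIONAL))
```

Every ambiguous character multiplies the number of candidates, so a converter needs either a context dictionary or a human to choose between them. A strict one-to-one table always yields exactly one candidate.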

The two paragraphs I wrote above are not important. The only important thing is to make it a dual-script Wikipedia now, according to our consensus. Split the interface now, and change the language name shown in the left-hand Languages sidebar from Mìng-dĕ̤ng-ngṳ̄ to Mìng-dĕ̤ng-ngṳ̄/閩東語.

Steps: First, change the language name shown on the left-hand Languages sidebar from Mìng-dĕ̤ng-ngṳ̄ to Mìng-dĕ̤ng-ngṳ̄/閩東語. Second, split https://cdo.wikipedia.org/wiki/ into https://cdo.wikipedia.org/cdo-latn/ and https://cdo.wikipedia.org/cdo-hani/. Let's do it now.

Can this issue be solved by instead creating separate namespaces ("Hani:" and "Latn:") in the cdo wiki for articles that do not mix scripts and do not use autotranslation? The "/wiki/" path would then stay unchanged (so that it keeps working correctly for interwikis with other languages).
This does not mean that the UI translation cannot be separated; it means that the namespaces should have distinct default language codes using distinct scripts. The "Hani" part would have the Hans<->Hant transliterator handled as a variant selectable by viewers, but there would be no conversion for Hans<->Latn or Hant<->Latn.
Some common pages would remain in the main namespace only if they support autotranslation (via the Translate extension, or via templates based on PAGECONTENTLANGUAGE or the UI language, where applicable to generic banners).

Finally, the interwiki resolvers should be able to resolve

  • "cdo:ArticleName" as the native cdo.wikipedia.org/wiki/ArticleName, but
  • "cdo-latn:ArticleName" would also be recognized to point to "cdo.wikipedia.org/wiki/Latn:ArticleName" (the same target as "cdo:latn:ArticleName")
  • "cdo-hani:ArticleName" would also be recognized to point to "cdo.wikipedia.org/wiki/Hani:ArticleName" (the same target as "cdo:hani:ArticleName")
  • "cdo-hans:ArticleName" would also be recognized to point to "cdo.wikipedia.org/wiki/Hani:ArticleName?uselang=cdo-hans" (the same target as "cdo:hani:ArticleName" but with a preferred Simplified Han presentation, using the Han transliterator, possibly tuned by a dictionary of exceptions needed for Min Dong)
  • "cdo-hant:ArticleName" would also be recognized to point to "cdo.wikipedia.org/wiki/Hani:ArticleName?uselang=cdo-hant" (the same target as "cdo:hani:ArticleName" but with a preferred Traditional Han presentation, using the Han transliterator, possibly tuned by a dictionary of exceptions needed for Min Dong)
  • so "cdo-latn:", "cdo-hani:" become valid interwikis, and we can still have wikidata fed with linguistic entries in "cdo-hans:" and "cdo-hant:" when needed (when not using the exception dictionary)
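
The resolution rules listed above can be sketched as a small lookup table. This is only an illustration in Python: the function name `resolve_interwiki`, the table name, and the `https://` scheme are assumptions, and none of this reflects how MediaWiki's interwiki table is actually configured.

```python
from urllib.parse import quote

# Assumed base URL; the proposal only gives "cdo.wikipedia.org/wiki/".
BASE = "https://cdo.wikipedia.org/wiki/"

# prefix -> (namespace prefix, uselang value or None), per the proposal.
PROPOSED_PREFIXES = {
    "cdo": ("", None),
    "cdo-latn": ("Latn:", None),
    "cdo-hani": ("Hani:", None),
    "cdo-hans": ("Hani:", "cdo-hans"),
    "cdo-hant": ("Hani:", "cdo-hant"),
}


def resolve_interwiki(link):
    """Turn 'prefix:ArticleName' into the URL the proposal describes."""
    prefix, sep, title = link.partition(":")
    key = prefix.lower()
    if not sep or key not in PROPOSED_PREFIXES:
        raise ValueError(f"unknown interwiki prefix in {link!r}")
    namespace, uselang = PROPOSED_PREFIXES[key]
    # Spaces become underscores; ':' and '/' are kept readable in the path.
    path = quote(namespace + title.replace(" ", "_"), safe=":/")
    url = BASE + path
    if uselang is not None:
        url += "?uselang=" + uselang
    return url


for link in ("cdo:ArticleName", "cdo-latn:ArticleName", "cdo-hans:ArticleName"):
    print(link, "->", resolve_interwiki(link))
```

Keeping the rules in one table makes it easy to see that "cdo-hans:" and "cdo-hant:" share the "Hani:" namespace and differ only in the `uselang` parameter.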

We have a similar situation for Kurdish, and we may soon have the same situation for Kazakh (if there's no simple 1-to-1 transliteration between Cyrl<->Latn); other Turkic languages transitioning to Latin are affected too. There is a similar issue in Africa as well (Berber languages using Arab<->Latn conversion, which is lossy).

Liuxinyu970226 added a comment. · Edited · Feb 10 2018, 4:01 PM

My point is this: do we really need to introduce hans to Eastern/Middle/Northern/Southern Min? As Yejianfei has said many times, hans was made for Mandarin (cmn). It is probably suitable for Cantonese (yue), kind of suitable for Classical Chinese (lzh), works better for Wuu (aka Shanghainese), is suitable for Sichuanese (though a code request was recently rejected by SIL due to concerns from academic communities in Sichuan itself, 囧rz), and is maybe somewhat suitable for Japanese (hehe) and Korean (another hehe), because picking a good translation for words from foreign languages is not affected by the different meanings of an arbitrary single Han character. However, there is a real dead-end issue across the "Fujian + Eastern Guangdong + Southwestern Zhejiang" region: that kind of flexibility has never existed in the history of Min and will not exist in its future, which is why the user argued against hans in his original request.

so "cdo-latn:", "cdo-hani:" become valid interwikis, and we can still have wikidata fed with linguistic entries in "cdo-hans:" and "cdo-hant:" when needed (when not using the exception dictionary)

I think it is a good idea to solve problems about interwiki links, so that articles written in cdo-hani can also be found easily in the left-hand Languages sidebar.

This problem has dragged on for too long. Mindong Wikipedia already has 14,259 entries. The sooner the problem is solved, the more smoothly Wikidata can create links to existing entries, and the easier it will be for readers to read articles in Mindong Wikipedia.

The solution, as everybody here has suggested in the past two years, is to create two language codes without splitting the site: cdo-hani and cdo-latn.

On this basis, people are allowed to see different interface texts. When you visit:

Translatewiki may need to be optimized accordingly. The sooner this proposal is implemented, the sooner we can provide the interface text of the Chinese characters version.

At the same time, Wikidata also needs to support entering cdo-hani and cdo-latn wiki entry names, so that Chinese-character and Romanized entries can be correctly linked in the database and the links can be correctly displayed in the sidebar. The resulting behaviour is as described by Verdy_p:

In T139010#3935388, @Verdy_p wrote:

Finally, the interwiki resolvers should be able to resolve

  • "cdo:ArticleName" as the native cdo.wikipedia.org/wiki/ArticleName, but
  • "cdo-latn:ArticleName" would also be recognized to point to "cdo.wikipedia.org/wiki/Latn:ArticleName" (the same target as "cdo:latn:ArticleName")
  • "cdo-hani:ArticleName" would also be recognized to point to "cdo.wikipedia.org/wiki/Hani:ArticleName" (the same target as "cdo:hani:ArticleName")
  • "cdo-hans:ArticleName" would also be recognized to point to "cdo.wikipedia.org/wiki/Hani:ArticleName?uselang=cdo-hans" (the same target as "cdo:hani:ArticleName" but with a preferred Simplified Han presentation, using the Han transliterator, possibly tuned by a dictionary of exceptions needed for Min Dong)
  • "cdo-hant:ArticleName" would also be recognized to point to "cdo.wikipedia.org/wiki/Hani:ArticleName?uselang=cdo-hant" (the same target as "cdo:hani:ArticleName" but with a preferred Traditional Han presentation, using the Han transliterator, possibly tuned by a dictionary of exceptions needed for Min Dong)
  • so "cdo-latn:", "cdo-hani:" become valid interwikis, and we can still have wikidata fed with linguistic entries in "cdo-hans:" and "cdo-hant:" when needed (when not using the exception dictionary)

Up to now, links between Chinese-character entries and Romanized entries have relied on templates. With Wikidata, links will be more standardized. Mindong Wikipedia has 14,259 entries so far. The sooner this proposal is implemented, the sooner we can use bots to import links between entries into Wikidata and manually resolve the remaining links (usually some category pages and template pages).

What are we waiting for?

I hope the cdo-latn and cdo-hani interfaces in Min Dong Wikipedia will be separated soon. The mixing of Chinese characters and Roman letters on the same page has troubled visitors and editors for a long time.

Yejianfei added a comment. · Edited · Aug 5 2019, 6:01 PM

We need some questions answered:

  1. Which code should cdo fall back to? cdo-hant or cdo-latn?
  2. Which language should cdo-latn fall back to?
  3. What should be the autonym of cdo-latn?

There is no need for a fallback and no need for an autonym.
Just enable these two URLs rather than showing 404 pages.

That's it. Not that difficult. Just simply show me a creation page rather than a 404 page.

Nothing else is needed. That is to say: no translation is needed, no LanguageConverter is needed, and no other workforce is needed.

Why has this been put off for so many years?

Since it has been procrastinated for so long, let me write some code by myself:

Amire80 added a subscriber: Amire80. · Edited · Aug 6 2019, 10:34 AM

Since it has been procrastinated for so long, let me write some code by myself:

Thanks for kicking the process off, but I have some questions.

Why is it "Hani" and not "Hans", or "Hant", or both?

Is it really good that cdo will have two names? We have several languages with two names (like Tatar), but it's not really good, and I'm quite reluctant about adding even more of them. There should be one default.

Yejianfei added a comment. · Edited · Aug 6 2019, 10:43 AM

Since it has been procrastinated for so long, let me write some code by myself:

Thanks for kicking the process off, but I have some questions.
Why is it "Hani" and not "Hans", or "Hant", or both?
Is it really good that cdo will have two names? We have several languages with two names (like Tatar), but it's not really good, and I'm quite reluctant about adding even more of them. There should be one default.

  1. Why is it "Hani" rather than "Hans" or "Hant"?

It has already been said on translatewiki.

Neither the "Hans" nor the "Hant" character set is enough on its own. We in fact need the entire Unicode CJK character set, because "𣍐" and "𡅏" are used very often in the Min Dong language, and these two Chinese characters are located in the Unicode CJK Ext-B block, so they cannot simply be classed as either "Hans" or "Hant". Since "Hani" includes both and more, we choose "Hani".
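
The claim about the Ext-B block can be checked with a few lines of Python. This is a minimal sketch: the block ranges are taken from the Unicode Standard, the helper name `cjk_block` is hypothetical, and only two CJK blocks are covered for brevity.

```python
# Code point ranges as published in the Unicode Standard's block list.
CJK_UNIFIED = range(0x4E00, 0xA000)    # U+4E00..U+9FFF
CJK_EXT_B = range(0x20000, 0x2A6E0)    # U+20000..U+2A6DF


def cjk_block(ch):
    """Name the CJK block a single character falls in (two blocks only)."""
    cp = ord(ch)
    if cp in CJK_EXT_B:
        return "CJK Unified Ideographs Extension B"
    if cp in CJK_UNIFIED:
        return "CJK Unified Ideographs"
    return "other"


# The two characters named in the comment, plus 閩 for comparison.
for ch in "𣍐𡅏閩":
    print(f"U+{ord(ch):04X} {ch} {cjk_block(ch)}")
```

Characters outside the Basic Multilingual Plane such as these carry no simplified/traditional distinction in their encoding, which fits the argument for the script-neutral "Hani" subtag.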

Actually, it is very naive to classify Chinese orthography into just Simplified and Traditional Chinese characters. There are many more categories than these two; Min Dong Chinese written in Chinese characters is just one example.

  2. Is it really good that cdo will have two names?

Yes, it is the only way. There is absolutely no way to have any automatic conversions between these two scripts. Therefore, give them two names. It would be even better to give them two Wikipedias instead of one.

These two scripts are completely incompatible, so please don't make any fallbacks. If fallback is required according to the MediaWiki rules, then fall back to English.

Verdy_p added a comment. · Edited · Aug 7 2019, 10:12 AM

These two scripts are completely incompatible, so please don't make any fallbacks. If fallback is required according to the MediaWiki rules, then fall back to English.

Fallback to English is just the worst.

It seems more reasonable to fall back to whichever other script variant of Min Dong (Latin or Han) is available, then to Mandarin followed by Min Nan (for the Min Dong Han variant), or to Min Nan followed by Mandarin (for the Latin variant). All of this comes before the last fallback, English (used only when there's no Min Dong, no Min Nan, and no Chinese at all).

This means:

  • cdo > cdo-Hant > cdo-Hani > cdo-Hans > nan > zh-hant > zh-hans > en
  • cdo-Hani > cdo-Hant > cdo-Hans > cdo > zh-hant > zh-hans > nan > en
  • cdo-Hans > cdo-Hani > cdo-Hant > cdo > zh-hans > zh-hant > nan > en
  • cdo-Hant > cdo-Hani > cdo-Hans > cdo > zh-hant > zh-hans > nan > en

(I used nan because it has Latin by default, and it's a good fallback for the use of Min Dong in Taiwan, where both Min Dong and Min Nan have official support, in addition to Simplified Mandarin)
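
The fallback chains listed above can be read as ordered lookup lists. Here is a minimal Python sketch of that reading; the names `FALLBACKS` and `lookup_message`, the lowercase code spelling, and the sample messages are all assumptions for illustration, and this is not MediaWiki's actual fallback implementation.

```python
# Proposed chains from the list above, written in lowercase (an assumption).
FALLBACKS = {
    "cdo": ["cdo-hant", "cdo-hani", "cdo-hans", "nan", "zh-hant", "zh-hans", "en"],
    "cdo-hani": ["cdo-hant", "cdo-hans", "cdo", "zh-hant", "zh-hans", "nan", "en"],
    "cdo-hans": ["cdo-hani", "cdo-hant", "cdo", "zh-hans", "zh-hant", "nan", "en"],
    "cdo-hant": ["cdo-hani", "cdo-hans", "cdo", "zh-hant", "zh-hans", "nan", "en"],
}


def lookup_message(key, lang, messages):
    """Return (code, text) from the first language in the chain that has key."""
    for code in [lang] + FALLBACKS.get(lang, ["en"]):
        text = messages.get(code, {}).get(key)
        if text is not None:
            return code, text
    raise KeyError(key)


# Hypothetical message store: only some codes carry a translation.
messages = {
    "cdo-hant": {"language-name": "閩東語"},
    "en": {"language-name": "Min Dong", "search": "Search"},
}

print(lookup_message("language-name", "cdo-hani", messages))
print(lookup_message("search", "cdo", messages))
```

With this reading, English is reached only when every Min Dong, Min Nan and Chinese entry in the chain lacks the message.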

Note also that the Min Dong language has three (not two) orthographies: you have to count the simplified Han as well.

So the "native" names (autonyms) should be:

  • "Mìng-dĕ̤ng-ngṳ̄/闽东语/閩東語" for cdo
  • "Mìng-dĕ̤ng-ngṳ̄" for cdo-Latn
  • "闽东语/閩東語" for cdo-Hani
  • "闽东语" for cdo-Hans
  • "閩東語" for cdo-Hant