
Script code should be explicitly specified for multi-script Sinitic languages
Open, MediumPublic

Description

  • Eastern Min cdo => cdo-hans, cdo-hant, cdo-latn
  • Pu-Xian Min cpx => cpx-hans, cpx-hant, cpx-latn
  • Hakka hak => hak-hans, hak-hant, hak-latn
  • Xiang hsn => hsn-hans, hsn-hant
  • Hokkien nan => nan-hans, nan-hant, nan-latn-pehoeji, nan-latn-tailo
  • Wu wuu => wuu-hans, wuu-hant
  • Cantonese yue => yue-hans, yue-hant
  • Mandarin zh => zh-hans, zh-hant, zh-hk (zh-Hant-HK)

Exception:

  • Literary Chinese lzh: no need to explicitly specify a script tag (Hant only)
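
As a rough illustration of the proposal, the mapping above could be written as a lookup table. This is only a sketch: `SCRIPT_VARIANTS`, `needs_explicit_script`, and `script_variants` are hypothetical names chosen for this example and do not correspond to any existing MediaWiki data structure or API.

```python
# Hypothetical table transcribed from the list in this task description.
# It is NOT an actual MediaWiki configuration.
SCRIPT_VARIANTS = {
    "cdo": ["cdo-hans", "cdo-hant", "cdo-latn"],
    "cpx": ["cpx-hans", "cpx-hant", "cpx-latn"],
    "hak": ["hak-hans", "hak-hant", "hak-latn"],
    "hsn": ["hsn-hans", "hsn-hant"],
    "nan": ["nan-hans", "nan-hant", "nan-latn-pehoeji", "nan-latn-tailo"],
    "wuu": ["wuu-hans", "wuu-hant"],
    "yue": ["yue-hans", "yue-hant"],
    "zh": ["zh-hans", "zh-hant", "zh-hk"],
}


def needs_explicit_script(code: str) -> bool:
    """Return True if the bare code is multi-script under this proposal."""
    return code.lower() in SCRIPT_VARIANTS


def script_variants(code: str) -> list[str]:
    """Return the script-specific codes for a bare code.

    Codes outside the table (for example lzh, the stated exception)
    are returned unchanged.
    """
    code = code.lower()
    return list(SCRIPT_VARIANTS.get(code, [code]))
```

For example, `needs_explicit_script("nan")` is true and `script_variants("nan")` lists four codes, while `script_variants("lzh")` simply returns `["lzh"]`.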

Event Timeline

We came across a similar conundrum when trying to assign a language/script to a photograph inscription in WMC and WD:

Tchien-Lung Ta Whang Tee (for Qianlong Emperor)

If <zh> is selected, the label 中文 (Chinese language) is shown, which misdescribes a string that is obviously in Roman script. At the time, no option seemed appropriate, so we ended up opting for <undetermined>...

Ideally, zh-Latn should be available in WMC and WD. Unfortunately, <zh-Latn> is often assumed to mean Mandarin in the <pinyin> transliteration system. This inscription is obviously not pinyin, which did not exist at the time (1798). Possible Latin transliteration codes include the following (the script subtag is four letters in title case):

zh-Latn-pinyin (implemented after 1954?-)
zh-Latn-wadegile (used from 1892?-1998?)
zh-Latn-tongyong (Tongyong system)
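
The casing convention mentioned above (lowercase language, four-letter script in title case, two-letter region in uppercase, lowercase variants) can be sketched as a small helper. This is a simplified illustration only: `normalize_tag` is a hypothetical function, it assumes a plain language-script-region-variant tag, and it ignores extension and private-use subtags.

```python
def normalize_tag(tag: str) -> str:
    """Apply conventional casing to a simple language tag.

    Simplified sketch: subtags are classified by length alone, so
    extension and private-use sequences are not handled.
    """
    parts = tag.replace("_", "-").split("-")
    out = [parts[0].lower()]  # primary language subtag
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha():
            out.append(sub.title())  # script subtag, e.g. Latn, Hant
        elif len(sub) == 2 and sub.isalpha():
            out.append(sub.upper())  # region subtag, e.g. HK
        else:
            out.append(sub.lower())  # variant, e.g. wadegile
    return "-".join(out)
```

With this helper, `normalize_tag("zh-latn-wadegile")` yields `zh-Latn-wadegile` and `normalize_tag("ZH-hant-hk")` yields `zh-Hant-HK`.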

What else is out there?

Will the tech team implement more codes to address this need? Thank you very much!