
[EPIC] Enable language code mul on Wikidata
Open, Needs Triage, Public

Description

What is this task?
This task is used for planning and organizing only. To comment on the project or discuss, please use one of the linked tasks instead.

Description of main objective

Milestones

v0.1 Initial Release on Test Wikidata (DONE)

v0.2 Fix unexpected behavior on Test Wikidata (TODO)

v0.3 Improvements to make this ready for Wikidata (TODO)

v1.0 Initial Release on proper Wikidata

v1.1 Enable mul for all users by default

Later

Links:


Event Timeline

Manuel renamed this task from [EPIC] Language code mul on Wikidata to [EPIC] Language code mul on Wikidata. Jul 5 2022, 1:34 PM
Manuel updated the task description.
Manuel renamed this task from [EPIC] Language code mul on Wikidata to [EPIC] Enable language code mul on Wikidata. Jul 5 2022, 2:34 PM
Manuel updated the task description.
Manuel updated the task description.
Manuel updated the task description.
Manuel updated the task description.

I thought that low-ranked Babel entries might trump mul and give us some bad fallbacks. That is not the case, however, because the fallback for the HTML title is independent of Babel entries (see Language fallback chain on Wikidata).

Manuel updated the task description.
Manuel updated the task description.

I know "mul-arab" was brought up in previous threads and was not included at the time, but I would strongly recommend reconsidering, and creating at least 3 "mul" termbox labels to minimize confusion. These would be:

  • mul for standalone numbers and glyphs
  • mul for left to right
  • mul for right to left

I think the exact names or codes matter less than whether they are usable for these purposes. The problem with a single "mul" is that left-to-right and right-to-left scripts, when rendered together in one line or under the same code, often render messily in browsers, with text in unpredictable positions or letters appearing out of order.
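To make the mixed-direction concern concrete, here is a minimal Python sketch that checks whether a label contains strongly left-to-right characters, strongly right-to-left characters, or both. It relies only on the standard library's `unicodedata.bidirectional`; the function name `dominant_direction` and the four result strings are made up for illustration and are not part of Wikibase.

```python
import unicodedata

# Unicode bidirectional classes for strongly right-to-left characters:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def dominant_direction(text):
    """Classify a label by the strong bidi classes of its characters.

    Hypothetical helper: returns "ltr", "rtl", "mixed", or "neutral".
    """
    classes = {unicodedata.bidirectional(ch) for ch in text}
    has_ltr = "L" in classes
    has_rtl = bool(classes & RTL_CLASSES)
    if has_ltr and has_rtl:
        return "mixed"
    if has_rtl:
        return "rtl"
    if has_ltr:
        return "ltr"
    # Only weak or neutral characters (spaces, punctuation, some digits).
    return "neutral"


print(dominant_direction("Pakistan"))
print(dominant_direction("پاکستان"))
print(dominant_direction("Pakistan پاکستان"))
```

A label classified as "mixed" is the kind of string that tends to render unpredictably when it is shown without explicit direction information.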

I say "standalone numbers and glyphs" to account for the fact that there are no actual Arabic "Arabic numerals": the digits used in most writing systems, regardless of overall direction, are variations of the same Indic numerals, which are always read left to right. So 1, ⠁, ١, ੧, and so on may be rendered the same way, in any writing system, for items with standalone labels like these.
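As a rough illustration of the digit examples, the sketch below looks up the numeric value that the Unicode character database assigns to each of the four glyphs, using Python's standard `unicodedata.digit`. The helper name `digit_value` is an assumption made for this example. The Braille pattern carries no digit value in Unicode, so a "standalone numbers and glyphs" box could not be filled by a digit check alone.

```python
import unicodedata


def digit_value(ch):
    """Return the Unicode digit value of ch, or None if it has none.

    Hypothetical helper for illustration only.
    """
    return unicodedata.digit(ch, None)


# The four glyphs from the comment: ASCII digit one, Braille pattern dots-1,
# Arabic-Indic digit one, Gurmukhi digit one.
for glyph in ["1", "⠁", "١", "੧"]:
    print(repr(glyph), unicodedata.name(glyph), digit_value(glyph))
```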

As a right-to-left example, take the left-to-right label "Pakistan". We may write this as:

پاکستان

in:

  • Brahui*
  • Persian
  • Uyghur*
  • Saraiki*
  • Pashto
  • Punjabi
  • Luri**
  • Kashmiri
  • Sorani/Central Kurdish
  • Balochi**
  • Azerbaijani
  • Urdu

Here * marks a language whose Arabic-based and Latin-based scripts currently share one box because they have not been given separate codes, and ** marks a language that seems to have boxes only for specific dialects; these boxes likely do not indicate anything dialect-specific, given that there is no main label for the language and there are no codes for its other dialects. The languages that have a different label here are Arabic, Malay, Sindhi, and Mazanderani. Several languages are missing from the termbox labels and could use the same label when or if they are added. The letter پ is what prevents Arabic from sharing a label here; for strings that use only characters shared among a larger set of languages, the list would be longer.

This could be done for basically any place name or person's name in South Asia, for example. An additional advantage of a right-to-left label like this is that there would be at least something to display that is legible to readers of various languages which Wikidata does not support yet. Until there is a code for Khowar, we would be able to put the name of a town where most people speak Khowar in the mul right-to-left box in the meantime.
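The interim use described above could look roughly like the following Python sketch. Everything in it is hypothetical: the codes `mul-ltr` and `mul-rtl` stand in for whatever codes would actually be chosen, `pick_label` is not a Wikibase function, and the lookup order is an assumption for illustration rather than Wikidata's real fallback chain.

```python
def pick_label(labels, lang, lang_is_rtl):
    """Pick a label for a language, falling back to a direction-specific
    "mul" label and then to plain "mul".

    Hypothetical sketch: "mul-ltr" and "mul-rtl" are placeholder codes.
    Returns a (code, label) tuple, or None if nothing matches.
    """
    directional = "mul-rtl" if lang_is_rtl else "mul-ltr"
    for code in (lang, directional, "mul"):
        if code in labels:
            return code, labels[code]
    return None


# Labels for an item that has no language-specific label for Khowar.
# "khw" is used here only as a lookup key for a right-to-left language.
labels = {"mul-ltr": "Pakistan", "mul-rtl": "پاکستان"}

print(pick_label(labels, "khw", lang_is_rtl=True))
print(pick_label(labels, "en", lang_is_rtl=False))
```

With a single "mul" box, both lookups would have to return the same string, which is the situation the three-label proposal is meant to avoid.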

Manuel updated the task description.