
Add Uyghur support to LanguageConverter
Open, Needs Triage | Public | Feature

Description

Feature summary (what you would like to be able to do and where): I would like to implement the MediaWiki LanguageConverter https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter for Uyghur on the Uyghur Wikipedia. Currently, the wiki is written manually in three scripts: Arabic, Cyrillic, and Latin. This feature would allow editors to contribute to the Wikipedia in any of the three scripts, in line with community plans for implementation https://meta.wikimedia.org/wiki/Wikipedias_in_multiple_writing_systems#Uyghur. A list of conversions between the scripts can be found at https://en.wikipedia.org/wiki/Uyghur_alphabets#Present_situation, and the conversions are (almost entirely) one-to-one between the major scripts.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
Several articles on the Uyghur Wikipedia, such as the one on Ataturk https://ug.wikipedia.org/wiki/%D9%85%DB%87%D8%B3%D8%AA%D8%A7%D9%BE%D8%A7_%D9%83%D8%A7%D9%85%D8%A7%D9%84_%D8%A6%D8%A7%D8%AA%D8%A7%D8%AA%DB%88%D8%B1%D9%83, have manual links at the top of the page to versions of the same article written in different scripts. I discovered this when patrolling locally uploaded images for copyright violations: several images were used on multiple pages that covered the same topic but were written in different scripts. The underlying problem is threefold: these manual links sometimes break, Wikidata only allows a link to one script's version of an article, and maintenance has to be duplicated across the script versions of each topic.

Benefits (why should this be implemented?): Implementation would allow Uyghur Wikipedia to have a single article that can be maintained by editors who are familiar with any of the major Uyghur scripts. The Uyghur Wikipedia currently contains a number of duplicated articles, requiring users to manually maintain each article in all three major Uyghur scripts (Arabic, Cyrillic, and Latin), some of which they may not be fluent in. Because this is a small wiki, reducing duplication will let our very valuable editors focus on creating new coverage in Uyghur rather than making redundant changes across different scripts.

Event Timeline

Legoktm renamed this task from "Add Uyghur to mediawiki/includes/language/converters" to "Add Uyghur support to LanguageConverter". Oct 3 2022, 2:31 AM

I looked into this about half a year ago and, from what I remember, there were two issues I came across. First, the main script is Arabic, which doesn't have capitalisation, so it can't be properly converted into Latin or Cyrillic. Second, there are at least five different systems for writing Uyghur in Latin (ULY, UYY, ALA-LC, UNGEGN, KNAB), and the orthography used in the Uyghur Wikipedia doesn't correspond to any of them.

It's my understanding that much of Latin Uyghur doesn't actually have a capitalization convention; the local [[قېلىپ:Welcome]], for example, does not use any capitalization. There are also not a terribly large number of articles on UgWiki, so manually sorting and classifying the articles by alphabet is not going to be a terribly hard task (especially if we have a script that can find the characters unique to each alphabet, it should not be too hard to make a list).
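A rough sketch of the kind of classification script mentioned above, in Python. The function names and the code point ranges are my own assumptions (basic Unicode blocks only, not a vetted Uyghur character list), so treat it as a starting point rather than a finished tool. It labels a piece of text by whichever of the three scripts contributes the most letters.

```python
from collections import Counter

# Assumption: these basic Unicode blocks are enough to tell the three
# scripts apart. They are not a curated list of Uyghur letters.
SCRIPT_RANGES = {
    "Arabic": [(0x0600, 0x06FF), (0xFB50, 0xFDFF), (0xFE70, 0xFEFF)],
    "Cyrillic": [(0x0400, 0x04FF)],
    "Latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
}


def script_of(ch):
    """Return the script name for a single letter, or None if it is not
    a letter or falls outside the ranges above."""
    if not ch.isalpha():
        return None
    cp = ord(ch)
    for name, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return None


def classify_script(text):
    """Label text by the script that contributes the most letters.
    Returns "Unknown" when no letter from any of the ranges is found."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    if not counts:
        return "Unknown"
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ["ئۇيغۇرچە", "Уйғурчә", "Uyghurche"]:
        print(sample, classify_script(sample))
```

A majority vote is used instead of "first character wins" because article text often contains Latin-script URLs, template names, or file names. For real use, markup should probably be stripped before counting.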

There's currently a similar system on kkwiki, so it may be a good choice to apply the same tool to ugwiki. I may be able to do this: though I'm not fluent in Uyghur, I know about its parallel writing systems.