
Implement language converter for Uyghur (ug)
Open, Needs Triage · Public · Feature

Description

Feature summary (what you would like to be able to do and where): I would like to implement the MediaWiki LanguageConverter https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter for Uyghur on the Uyghur Wikipedia. Currently, the wiki is written manually in three scripts: Arabic, Cyrillic, and Latin. This feature would allow editors to contribute in any of the three scripts, in line with community plans for implementation https://meta.wikimedia.org/wiki/Wikipedias_in_multiple_writing_systems#Uyghur. A list of conversions between the scripts can be found at https://en.wikipedia.org/wiki/Uyghur_alphabets#Present_situation, and the conversions are (almost entirely) one-to-one between the major scripts.
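To illustrate what a one-to-one conversion means in practice, here is a minimal sketch of a table-driven character mapping. The table `CYRILLIC_TO_LATIN` and the function `convertOneToOne` are hypothetical names, and the table is a deliberately tiny subset for illustration only. This is not how LanguageConverter itself is implemented, and a real table would need the full alphabet in the Latin orthography the community settles on.

```javascript
// Hypothetical, deliberately tiny subset of a Cyrillic -> Latin table.
// A real converter needs every letter and an agreed Latin orthography.
const CYRILLIC_TO_LATIN = new Map([
	['а', 'a'], ['б', 'b'], ['д', 'd'], ['к', 'k'],
	['л', 'l'], ['м', 'm'], ['н', 'n'], ['т', 't'],
]);

function convertOneToOne(text, table) {
	// Array.from iterates by code point; unmapped characters pass through unchanged.
	return Array.from(text, ch => (table.has(ch) ? table.get(ch) : ch)).join('');
}

console.log(convertOneToOne('бала', CYRILLIC_TO_LATIN)); // "bala"
```

The "almost entirely" caveat matters: any letter that maps to a digraph, or any context-dependent rule, would need handling beyond a plain per-character lookup.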

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
When looking at several articles on the Uyghur Wikipedia, such as the one on Atatürk https://ug.wikipedia.org/wiki/%D9%85%DB%87%D8%B3%D8%AA%D8%A7%D9%BE%D8%A7_%D9%83%D8%A7%D9%85%D8%A7%D9%84_%D8%A6%D8%A7%D8%AA%D8%A7%D8%AA%DB%88%D8%B1%D9%83, there are manual links at the top of the page to versions of the same article written in the other scripts. I discovered this when patrolling locally uploaded images for copyright violations and noticed that some images were used on several pages that discussed the same topic but were written in different scripts. The underlying problem is that these manual links will sometimes break, that Wikidata will only allow links to one script's version of an article, and that maintenance needs to be duplicated across the versions of an article in each script.

Benefits (why should this be implemented?): Implementation would allow Uyghur Wikipedia to have a single article that can be maintained by editors who are familiar with any of the major Uyghur scripts. The Uyghur Wikipedia currently contains a number of duplicated articles, requiring users to manually maintain a copy in each of the three major Uyghur scripts (Arabic, Cyrillic, and Latin), some of which they may not read or write fluently. Because this is a small wiki, reducing duplication will let our very valuable editors focus on creating new coverage in Uyghur rather than making redundant changes across different scripts.

Details

Event Timeline

Legoktm renamed this task from Add Uyghur to mediawiki/includes/language/converters to Add Uyghur support to LanguageConverter.Oct 3 2022, 2:31 AM

I looked into this about half a year ago and, from what I remember, there were two issues I came across: The main script is Arabic, which doesn't have capitalisation, so can't be properly converted into Latin or Cyrillic, and there are at least five different systems for writing Uyghur in Latin (ULY, UYY, ALA-LC, UNGEGN, KNAB) and the orthography used in the Uyghur Wikipedia doesn't correspond to any of them.
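The capitalisation problem can be checked directly in JavaScript. This is a minimal sketch with a hypothetical helper, `hasCaseDistinction`: Arabic-script text is unchanged by case conversion, so a converter has no signal in the source text for which Latin or Cyrillic letters to capitalise.

```javascript
// Hypothetical helper: true if the text contains any cased letters,
// i.e. upper-casing and lower-casing give different results.
function hasCaseDistinction(text) {
	return text.toUpperCase() !== text.toLowerCase();
}

console.log(hasCaseDistinction('قېلىپ'));   // false: Arabic script is monocase
console.log(hasCaseDistinction('ataturk')); // true: Latin has upper and lower case
```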

It's my understanding that much of Latin Uyghur doesn't actually have a capitalization convention; the local [[قېلىپ:Welcome]], for example, does not use any capitalization. There are also not a terribly large number of articles on UgWiki, so manually sorting and classifying the articles by alphabet is not going to be a terribly hard task (especially if we have a script that can find the characters unique to each alphabet, it should not be too hard to make a list).
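A script along those lines could use Unicode script properties rather than hand-written character lists. The following is only a sketch: `SCRIPT_PATTERNS` and `detectScript` are hypothetical names, and it simply reports whichever of the three scripts has the most characters in a title or text.

```javascript
// Hypothetical helper: count characters per script and report the dominant one.
const SCRIPT_PATTERNS = {
	Arabic: /\p{Script=Arabic}/u,
	Cyrillic: /\p{Script=Cyrillic}/u,
	Latin: /\p{Script=Latin}/u,
};

function detectScript(text) {
	const counts = { Arabic: 0, Cyrillic: 0, Latin: 0 };
	for (const ch of text) {
		for (const [name, pattern] of Object.entries(SCRIPT_PATTERNS)) {
			if (pattern.test(ch)) { counts[name] += 1; }
		}
	}
	// Pick the script with the highest count; null if none of the three occurs.
	let best = null;
	for (const name of Object.keys(counts)) {
		if (counts[name] > 0 && (best === null || counts[name] > counts[best])) {
			best = name;
		}
	}
	return best;
}

console.log(detectScript('قېلىپ')); // "Arabic"
```

Running this over a list of page titles would give a first-pass classification; mixed-script titles would still need a manual look.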

There's currently a similar system on kkwiki. Applying the same tool to ugwiki may be a good choice, and I may be able to do this. Though I'm not fluent in Uyghur, I know about its parallel writing systems.

Winston_Sung renamed this task from Add Uyghur support to LanguageConverter to Implement language converter for Uyghur (ug).Apr 8 2025, 5:52 AM

I'm almost done. All I need to finish is the title capitalisation system. I know how to grab the title, but as of right now the JavaScript isn't working to implement the system. But it seems like I'm close.

Made some scripts for capitalisation of monocase scripts in transliteration (such as UEY or Kurdish Arabic)

MediaWiki:Common.js

function chongHerp() {
	// Grab the page content (mw-content-text)
	const mezmun = document.getElementById("mw-content-text");
	if (!mezmun) { return; }
	const mezmunTékst = mezmun.innerHTML;
	// Capitalise each word between the -] and [- markers and drop the markers themselves
	const yéngiTékst = mezmunTékst.replace(/-\]((?:(?!\[-).)*?)\[-/g, (_, inner) => inner.replace(/\p{L}+/gu, txt => txt.charAt(0).toUpperCase() + txt.slice(1).toLowerCase()));

	// Apply the changes
	mezmun.innerHTML = yéngiTékst;
	console.log('!!!!!!!! mezmunTékst after conversion: !!!!!!!!\n' + yéngiTékst);
}
chongHerp();

function CMawza() {
	// mawza = mw.config.get('wgPageName');
	const mawza = document.getElementById("firstHeading");
	const titleElem = document.getElementById("titlecaps");
	// Nothing to do unless the page has a heading and uses Template:CMawza
	if (!mawza || !titleElem) { return; }
	let mawzaTékst = mawza.textContent;
	console.log('mawzaTékst: ' + mawzaTékst);
	const langConvTitle = document.getElementById("LangConvTitle");
	if (langConvTitle) {
		console.log('LangConvTitle: ' + langConvTitle.textContent);
	}
	console.log('titleElem: ' + titleElem.textContent);
	const titlecaps = titleElem.textContent;
	console.log('titlecaps: ' + titlecaps);
	// indexOf rather than search(): titlecaps is plain text, not a regex
	const titlecapsindex = mawzaTékst.indexOf(titlecaps);
	if (titlecapsindex !== -1) { console.log('titlecapsindex: ' + titlecapsindex); } else { console.warn('titlecapsindex not found'); }

	const words = titlecaps.match(/[^\s\d.,!?;:"'()<>[\]{}-]+/g);
	if (words) {
		words.forEach(word => {
			// Escape regex metacharacters so the word is matched literally
			const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
			const regex = new RegExp(`(^|[\\s-])${escaped}(?=[\\s-]|$)`, 'giu');
			let match;
			while ((match = regex.exec(mawzaTékst)) !== null) {
				console.log(`Word: "${word}", Position: ${match.index}`);
				// Skip the leading separator captured in group 1
				const bashlimak = match.index + match[1].length;
				mawzaTékst = mawzaTékst.slice(0, bashlimak) + mawzaTékst.charAt(bashlimak).toUpperCase() + mawzaTékst.slice(bashlimak + 1);
				console.log(`mawzaTékst after toUpperCase for ${word} (${match.index}): ${mawzaTékst}`);
			}
		});
	}
	// MediaWiki capitalises the first letter of a title, so do the same here
	mawzaTékst = mawzaTékst.replace(/^./u, c => c.toUpperCase());
	// Only touch the DOM if something actually changed
	if (mawzaTékst !== mawza.textContent) {
		mawza.textContent = mawzaTékst;
		document.title = mawzaTékst + " - ۋىكىپېدىيە";
	}
}
CMawza();

Template:CMawza

<div id="titlecaps" style="display:none">{{{1}}}</div>

Btw I really should have said this earlier, but:
I made a Gerrit change for adding the converter.
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1156871

Change #1156871 had a related patch set uploaded (by Winston Sung; author: Kxeo):

[mediawiki/core@master] Implement language converter for Uyghur (ug)

https://gerrit.wikimedia.org/r/1156871

Test wiki created on Patch demo by AntiCompositeNumber using patch(es) linked to this task:
https://a3d3a83d68.catalyst.wmcloud.org/w/