
Implement Tatar language LanguageConverter
Open, Medium, Public

Description

Tatar (tt) converter classes, made from the Kazakh classes by replacing kk with tt and adding some letters.

This is code I made from the Kazakh converter by replacing kk with tt, etc.

(I made this several months ago, but have not worked on it further since then.)

I will attach 6 files: 3 of them are in the messages folder and 3 are in the classes folder. A readme file is also attached.

(I have also added some letters that are not in the Kazakh language.)


Version: unspecified
Severity: enhancement


Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 11:19 PM
bzimport set Reference to bz25537.
bzimport added a subscriber: Unknown Object (MLST).
qdinar created this task. Oct 16 2010, 8:42 AM

Please submit these as a SVN diff against trunk.

kaldari renamed this task from "imperfect but useful converter code for tatar language" to "imperfect but useful LanguageConverter code for tatar language". Jan 14 2015, 11:38 PM
kaldari set Security to None.
gerritbot added a subscriber: gerritbot.

Change 185090 had a related patch set uploaded (by Kaldari):
Adding LanguageConverter files for Tatar Language

https://gerrit.wikimedia.org/r/185090

Patch-For-Review

qdinar added a subscriber: qdinar. Mar 13 2015, 4:33 PM

Hi. I have made a new converter and uploaded it to Gerrit:
https://gerrit.wikimedia.org/r/#/c/164049/

3 texts that I tested the new converter with

Elitre added a subscriber: Elitre. Mar 29 2015, 7:31 PM

Change 185090 abandoned by Kaldari:
Adding LanguageConverter files for Tatar Language

Reason:
Replaced by change I18768eb1b13

https://gerrit.wikimedia.org/r/185090

new test

Change 164049 had a related patch set uploaded (by Nikerabbit):
Add Tatar LanguageConverter

https://gerrit.wikimedia.org/r/164049

@Arrbee, @Amire80, can review of this feature please be put on the Language Engineering team's workboard?

Reedy renamed this task from "imperfect but useful LanguageConverter code for tatar language" to "Implement Tatar language LanguageConverter". Nov 22 2019, 3:32 PM
Reedy removed a subscriber: wikibugs-l-list.

Is there community consensus for this code? There have been many discussions, so it must be wanted. Links to the discussions are here: https://tt.wikipedia.org/wiki/Кулланучы:Qdinar#википедиядагы_сөйләшүләр . A standalone version of this converter is referred to at https://tt.wikipedia.org/wiki/Татар_Википедиясе#TATLAT .

Direct links to the standalone version, cyr->lat and lat->cyr, applied to tt.wikipedia.org:
http://https.tt.wikipedia.org.ttcysuttlart1999.aylandirow.tmf.org.ru/wiki/Баш_бит
http://https.tt.wikipedia.org.ttlart2012ttcysu.aylandirow.tmf.org.ru/wiki/Baş_bit

I personally do not "push" this project hard, because I generally dislike how these Latin and Cyrillic alphabets are designed. For example, the Cyrillic/Latin letter e is used for the "i/e" sound, while there is also a real "e" sound in words like "electron". This causes confusion with European languages and with Turkish. I am only the programmer here; Wikipedians decided to use an authoritative alphabet, in the same way that all of Wikipedia is built on authoritative sources, so I based the program on governmental Latin alphabet projects.

Comment from the code (I am going to delete most of this from the code):
2017-02-18, author Dinar Qurbanov: By making this converter, I may look like I support this alphabet, but that is not so. *I think this alphabet has many disadvantages, and I do not want to make it popular.* I regard it as a historical museum showpiece. I think it should be OK to put it into the Tatar Wikipedia, into the conversion system of MediaWiki. As far as I know, the converted pages are not allowed to be indexed by search engines. The exact version of the Latin orthography (and alphabet) was not chosen by a vote of Wikipedians, and Wikipedians have not voted to edit the rules of the Tatar Latin orthography to be used in Wikipedia, so I decided to make this exactly as it was prescribed by resolution #882 (2000) of the Cabinet of Ministers of Tatarstan. I use scans published by user Kitap ( https://tt.wikipedia.org/wiki/Татарстанда_татар_телен_дәүләт_теле_буларак_куллану_кануны#Татар_теленең_латин_язулы_орфографиясенең_гамәлдән_чыккан,_хәзерге_вакытта_рәсми_булмаган_кайбер_кагыйдәләре ), but I am not sure whether they are of resolution #882 or #618. Resolution #882 of 2000 was canceled by Russian law and by resolution #38 (2013) of the Cabinet of Ministers of the Republic of Tatarstan, and a new alphabet was adopted by the 2013 law of Tatarstan 1-ЗРТ, but that new alphabet is even less usable: there are no rules, there is no character for palatalisation in Russian words, and the alphabet table does not show all use cases of the Cyrillic letters. I intended to mark this script as tt-latn-2000, but I found from a Gerrit comment that this is not OK (the "2000" variant subtag is not registered with IANA yet, but it must be; see https://en.wikipedia.org/wiki/IETF_language_tag ). So maybe I will mark it as tt-latn-x-2000, where "2000" is not a variant but a private-use subtag.

Renamed 2000 to 2013, because Wikipedians would not like it being named 2000: the 2000 laws are canceled, and there is now the 2013 law. There are several letter differences, such as ɵ -> ö, though ö was also to some extent permitted for computer use. This converter uses ö. The 2013 law has no letter for hamza and palatalisation, and gives no rules/orthography. This converter uses the apostrophe for hamza and palatalisation, as in the 2000 law, and follows the rules/orthography given in the 2000 law.
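
To illustrate the difference between a variant subtag and a private-use subtag mentioned in the code comment, here is a minimal Python sketch. It is not part of the converter or of MediaWiki; the function name split_language_tag is made up for this example, and it only handles tags of the shape language[-script][-variant...][-x-private...]. In IETF language tag syntax a variant is 5 to 8 alphanumerics or 4 characters starting with a digit, so "2000" is syntactically a variant, but variants must be registered, while anything after the singleton "x" is private use.

```python
import re


def split_language_tag(tag):
    """Split a simple language tag into its parts (hypothetical helper).

    Only the shape language[-script][-variant...][-x-private...] is handled;
    regions and extensions are outside the scope of this sketch.
    """
    subtags = tag.lower().split("-")
    result = {"language": subtags[0], "script": None,
              "variants": [], "private_use": []}
    i = 1
    # A script subtag is exactly four letters, e.g. "latn".
    if i < len(subtags) and re.fullmatch(r"[a-z]{4}", subtags[i]):
        result["script"] = subtags[i]
        i += 1
    # A variant is 5-8 alphanumerics, or 4 characters starting with a digit.
    variant = r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}"
    while i < len(subtags) and re.fullmatch(variant, subtags[i]):
        result["variants"].append(subtags[i])
        i += 1
    # Everything after the singleton "x" is private use.
    if i < len(subtags) and subtags[i] == "x":
        result["private_use"] = subtags[i + 1:]
        i = len(subtags)
    if i != len(subtags):
        raise ValueError(f"unsupported tag shape: {tag}")
    return result


print(split_language_tag("tt-latn-2000"))
print(split_language_tag("tt-latn-x-2000"))
```

In the first tag "2000" lands in the variants list, which is why it would need registration; in the second it lands in the private-use list.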

qdinar updated the task description. Dec 11 2019, 2:12 PM

The converter has been ready for a long time, but the code has not been accepted into MediaWiki. @thiemowmde voted -1 and requested that the code be split into more files.

2019-11-25, Thiemo Kreuz:

... the maintenance costs for a monolith like this are unbearable ...
This code needs to be split into small services other human beings not keeping track of this for the past 5 (!) years are able to grasp, understand, and feel responsible for.
This possibly needs to be a separate extension.

2019-7-5, Thiemo Kreuz:

... Why was it not possible to split this up into smaller patches that have been merged years ago?
...
Was it really necessary to pack 3500 lines of code into a single file? If there is one mistake MediaWiki core code suffers from then it's this: unmaintainable classes with too much code in one place.
Please, please split this in multiple ways: multiple smaller classes, each introduced in a separate patch, each covered by a separate set of test cases. Ideally all this code is created first as part of a separate library in a separate Git repository. You can use GitHub or ask for one here on Gerrit. If this library is solid, well tested and reviewed, it will be much easier to add and use it in MediaWiki core.

2019-11-25, I answered:

I think it is possible to move the Tatar grammar functions into a separate class, but at this stage that class would be almost useless by itself. It would be useless because, for example, it works only with "thick"-sounding suffixes (with a, o, etc.) and not with "thin" ones (ä, ö): only the thick ones may be confused with Russian words (words borrowed from Russian), so the work is only needed for "thick" words.

I once thought about putting the regex replacement strings into arrays, but forgot about it. Maybe I will do that soon.
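
As a rough idea of what "regex replacement strings in arrays" could look like, here is a minimal Python sketch (the converter itself is MediaWiki code, and this is not taken from it). The names RULES and apply_rules are made up for this example. The few Cyrillic to Latin pairs are only the ones that can be read off the page titles "Баш бит" / "Baş bit" linked in this task, plus the ɵ -> ö choice; the real rule set is much larger, and rule order may matter there.

```python
import re

# Illustrative rules only (assumption: a flat, ordered list is enough here).
# Each entry is (compiled pattern, replacement string).
RULES = [
    (re.compile("ɵ"), "ö"),  # letter choice described in this task
    (re.compile("Б"), "B"),
    (re.compile("б"), "b"),
    (re.compile("а"), "a"),
    (re.compile("ш"), "ş"),
    (re.compile("и"), "i"),
    (re.compile("т"), "t"),
]


def apply_rules(text, rules=RULES):
    """Apply every (pattern, replacement) pair to the text, in list order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


print(apply_rules("Баш бит"))
```

Keeping the rules as data makes it possible to test each rule, or each group of rules, separately, which is close to what the review asked for.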

You can see more comments at https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/164049/ .

Why have I been stalled since 2019-12-11, the date of my last comment there?

I feel reluctant/lazy to do these things, i.e. working on this code, splitting it into more files, etc., independently of the suspicion problem that I describe below.

I had problems with suspicion/paranoia that my operating system was hacked, and I have reinstalled different OSes several times since around October. Now I am also afraid that simply installing the MediaWiki developer version onto the Ubuntu system I use now is not very trustworthy, so I would like to use a virtual machine, which also slows me down and makes me lazy. This does not mean that I especially distrust the MediaWiki developer version, nor that I suspected it in the previous cases. I also do not trust the Ubuntu repository very much, nor various packages in it, such as Firefox and others. I also distrust the Firefox extensions that I use.