Sorry, it will be Wednesday afternoon. I haven't got enough time today.
Apr 10 2018
Apr 9 2018
I will review it by tomorrow afternoon (UTC +3).
Is it OK that all the exceptions are listed as 'кириллица' => 'latin', even those that are actually used in Latin => Cyrillic transliteration?
Feb 23 2018
Crimean Tatar (like other Turkic languages) has plenty of affixes that can be added to the end of a word. So one root can produce hundreds, or maybe thousands, of forms, e.g. rayon, rayonda, rayonnıñ, rayonnıñki, rayonımız, rayonlar, rayonlarımız, rayonlarımızda, rayonlarımızdaki, rayonlarımızdakiler, rayonlarımızdakilerden, rayonlarımızdakilerdensiñ, rayonlarımızdakilerdensiñmi, rayonlarımızdakilerdensizimi, rayonlarımızdakilerdenlermi, etc. (but all of them begin with rayon-, as Turkic languages have virtually no prefixes). So it is impossible to include all forms "produced" by a single root in the exception list. That is why words from the exception list are to be treated as patterns matching only at the beginning of a word.
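To illustrate the point about matching only at the beginning of a word, here is a minimal Python sketch (not the MediaWiki PHP code). The one-entry table and the function name `apply_prefix_exceptions` are assumptions made up for this example.

```python
import re

# Hypothetical one-entry exception table (Latin -> Cyrillic); the real list is much longer.
PREFIX_EXCEPTIONS = {
    "rayon": "район",
}

def apply_prefix_exceptions(text, exceptions):
    """Replace each exception only where it starts a word, leaving suffixes alone."""
    # Longest keys first, so a longer exception wins over a shorter one it contains.
    for latin in sorted(exceptions, key=len, reverse=True):
        # (?<!\w) means "not preceded by a word character", i.e. the start of a word.
        pattern = re.compile(r"(?<!\w)" + re.escape(latin))
        text = pattern.sub(exceptions[latin], text)
    return text

print(apply_prefix_exceptions("rayonda ve rayonlarımızda", PREFIX_EXCEPTIONS))
```

One entry covers every suffixed form, because the suffix (-da, -larımızda, ...) is left in Latin for the later regex and letter-by-letter passes. The sketch is case-sensitive, so capitalized forms would need separate handling.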
The exception list is not being loaded
You mean exceptions AND regexes? If the regexes are applied, 99% of all texts will be transliterated correctly.
Actually, most of the words you added to the exception list (e.g. 'гонъюлли' => 'göñülli', 'дёрдю' => 'dördü', 'этюв' => 'etüv') will be transliterated properly without being there, because they fit one of the patterns.
Feb 9 2018
Mixed-script words appeared by mistake in some articles during the early years of our Wikipedia. I hope that there will not be any new pages with mixed-script words, so we can leave the transliterator as it is and fix the problem in our articles.
If you can help with fixing this automatically, that would be very nice of you.
Feb 8 2018
@TJones, as far as I can see, "public $mCyrillicToLatin" and "public $mLatinToCyrillic" are applied BEFORE all regexes and exceptions, but they should be applied AFTER. First we transliterate all exceptions, then all these sophisticated regexes, and only then do the "ordinary" replacements (a => а, b => б, etc.).
Could you please test how it works if the right order is set?
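The effect of the order can be shown with a small Python sketch (again, not the actual PHP converter). All three tables are tiny hypothetical stand-ins, and the single regex rule (word-initial "yo" becomes "ё") is only assumed for illustration.

```python
import re

# Hypothetical stand-ins for the three kinds of rules.
EXCEPTIONS = {"rayon": "район"}                       # step 1: word-initial exceptions
REGEX_RULES = [(re.compile(r"(?<!\w)yo"), "ё")]       # step 2: context-dependent rules
CHAR_MAP = {"a": "а", "d": "д", "l": "л", "n": "н",   # step 3: plain letter map
            "o": "о", "r": "р", "y": "й"}

def map_chars(text):
    # Letters that are not in the map (including already converted ones) pass through.
    return "".join(CHAR_MAP.get(ch, ch) for ch in text)

def apply_rules(text):
    for latin, cyrillic in EXCEPTIONS.items():
        text = re.sub(r"(?<!\w)" + re.escape(latin), cyrillic, text)
    for pattern, replacement in REGEX_RULES:
        text = pattern.sub(replacement, text)
    return text

def transliterate(text):
    # Intended order: exceptions, then regexes, then the plain letter map.
    return map_chars(apply_rules(text))

def transliterate_wrong_order(text):
    # Letter map first: no Latin text is left for the exceptions and regexes to match.
    return apply_rules(map_chars(text))

print(transliterate("yolda"), transliterate_wrong_order("yolda"))
```

With the letter map applied first, the input is already fully Cyrillic when the exceptions and regexes run, so none of them can match.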
Feb 7 2018
A - one of the letters b, c, g, k, p, ş
B - one of the letters ç, n, r, s, t, z
С - one of the letters b, c, ç, d, f, g, ğ, h, j, k, l, m, n, ñ, p, q, r, s, ş, t, v, y, z
D - one of the letters a, â, e, ı, i, o, ö, u, ü, а, е, ё, и, о, у, ы, э, ю, я
E - one of the letters e, i, ö, ü
_ - beginning of the word
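The legend above can be written down as regex character classes. The following Python sketch assumes that a rule is given as a template in which the capital letters A to E and "_" stand for the classes and everything else is a literal letter; the helper name `template_to_regex` and the template "_AE" are made up for this example and are not actual rules.

```python
import re

# Character classes from the legend; "_" is the beginning of a word.
CLASSES = {
    "A": "[bcgkpş]",
    "B": "[çnrstz]",
    "C": "[bcçdfgğhjklmnñpqrsştvyz]",
    "D": "[aâeıioöuüаеёиоуыэюя]",
    "E": "[eiöü]",
    "_": r"(?<!\w)",  # not preceded by a word character
}

def template_to_regex(template):
    """Expand class symbols in a template; any other character is matched literally."""
    return re.compile("".join(CLASSES.get(ch, re.escape(ch)) for ch in template))

# Hypothetical template: a letter from A followed by a letter from E, at word start.
pattern = template_to_regex("_AE")
print(bool(pattern.search("kel")), bool(pattern.search("mektep")))
```

Because capital letters are reserved for class names here, literal letters in a template have to be lowercase.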
Thank you for a great job! But we still have a problem: the script does not work correctly. I have opened a new ticket here: https://phabricator.wikimedia.org/T186727
Aug 29 2017
Aug 25 2017
Hi!
I have reviewed the lists here https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Crimean_Tatar_Transliteration. If further help is needed I am ready to help.
In T23582#3358063, @TJones wrote: So I'm making some progress. I've managed to refactor parts of the original to work and to work more efficiently with the current framework. Progress is slow because I only work on it now and then, but I was having such a good time yesterday that I kept working on it today.
A quick question for someone familiar with Crimean Tatar. In the parallel texts I've found online, the Cyrillic text uses guillemets («x») and the Latin uses curly quotes (“x”). Should we try to convert between them, or just leave them as they are? I know that some wikis prefer straight quotes ("x"). Trying to convert straight quotes to guillemets is also possible, but would not be 100% accurate with the straightforward approach.
Indeed, in Cyrillic script «x» is used, but it is possible to leave them as they are. It is not a big mistake.
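If the conversion between guillemets and curly quotes were wanted after all, it is a one-to-one mapping in both directions. A minimal Python sketch follows, with function names invented for this example; straight quotes are deliberately left untouched, since the same character opens and closes a quotation and cannot be mapped reliably.

```python
# One-to-one quote mappings; straight quotes (") are not included on purpose.
CYRILLIC_TO_LATIN_QUOTES = str.maketrans({"«": "“", "»": "”"})
LATIN_TO_CYRILLIC_QUOTES = str.maketrans({"“": "«", "”": "»"})

def quotes_to_latin(text):
    """Replace guillemets with curly quotes."""
    return text.translate(CYRILLIC_TO_LATIN_QUOTES)

def quotes_to_cyrillic(text):
    """Replace curly quotes with guillemets."""
    return text.translate(LATIN_TO_CYRILLIC_QUOTES)

print(quotes_to_latin("«x»"), quotes_to_cyrillic("“x”"))
```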
Aug 19 2017
Hi! In one week I will be able to join and help check whether everything is OK with the transliteration. Now I'm on vacation and don't have my computer with me.