
Apr 10 2018

DonAlessandro added a comment to T188321: CRH Transliteration pattern matching fixes.

Sorry, it will be Wednesday afternoon. I haven't got enough time today.

Apr 10 2018, 1:13 PM · MediaWiki-Language-converter

Apr 9 2018

DonAlessandro added a comment to T188321: CRH Transliteration pattern matching fixes.

I will review it by tomorrow afternoon (UTC +3).
Is it OK that all the exceptions are listed as 'кириллица' => 'latin', even those that are actually used in Latin => Cyrillic transliteration?

Apr 9 2018, 8:48 AM · MediaWiki-Language-converter

Feb 23 2018

DonAlessandro added a comment to T186727: Cimean Tatar transliteration has trouble with ё, ь, э, ю.

Crimean Tatar (like other Turkic languages) has plenty of affixes that can be added to the end of a word. So one root can produce hundreds, or maybe thousands, of forms, e.g. rayon, rayonda, rayonnıñ, rayonnıñki, rayonımız, rayonlar, rayonlarımız, rayonlarımızda, rayonlarımızdaki, rayonlarımızdakiler, rayonlarımızdakilerden, rayonlarımızdakilerdensiñ, rayonlarımızdakilerdensiñmi, rayonlarımızdakilerdensizimi, rayonlarımızdakilerdenlermi, etc. (but all of them begin with rayon-, as Turkic languages have virtually no prefixes). So it is impossible to include all the forms produced by a single root in the exception list. That is why words from the exception list are to be treated as patterns matching only at the beginning of a word.
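A minimal Python sketch of that idea, not the converter's actual code: each exception is anchored at a word boundary, and whatever suffixes follow the root are left for the later rules. The `EXCEPTIONS` table and the function name are hypothetical, made up for illustration.

```python
import re

# Hypothetical one-entry exception list (Latin root -> Cyrillic root).
EXCEPTIONS = {"rayon": "район"}

def apply_prefix_exceptions(text, exceptions):
    """Replace each exception only where it starts a word; suffixes stay untouched."""
    # Longest roots first, so a longer root is tried before a shorter one it contains.
    for source in sorted(exceptions, key=len, reverse=True):
        # \b on the left only: the root must begin a word but may be followed by affixes.
        pattern = re.compile(r"\b" + re.escape(source))
        text = pattern.sub(exceptions[source], text)
    return text

# The root is converted in every inflected form; the affixes remain for later passes.
print(apply_prefix_exceptions("rayon rayonda rayonlarımızda", EXCEPTIONS))
```

The sketch is case-sensitive and ignores capitalised forms; a real list would need to handle those as well.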

Feb 23 2018, 11:06 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), MediaWiki-Language-converter
DonAlessandro added a comment to T186727: Cimean Tatar transliteration has trouble with ё, ь, э, ю.

The exception list is not being loaded

You mean exceptions AND regexes? If the regexes are applied, 99% of all texts will be transliterated correctly.
Actually, most of the words you added to the exception list (e.g. 'гонъюлли' => 'göñülli', 'дёрдю' => 'dördü', 'этюв' => 'etüv') will be transliterated properly without being there, because they fit one of the patterns.

Feb 23 2018, 4:59 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), MediaWiki-Language-converter

Feb 9 2018

DonAlessandro added a comment to T186811: Crimean Tatar Transliteration doesn't handle mixed script words.

Mixed-script words appeared by mistake in some articles during the early years of our Wikipedia. I hope there will not be any new pages with mixed-script words, so we can leave the transliterator as it is and fix the problem in our articles.
If you can help with fixing this automatically, that would be very kind of you.
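As a first step towards such a cleanup, here is a rough Python sketch that only finds words mixing Latin and Cyrillic letters; it does not decide which script is the intended one. The function names are hypothetical, and the script test relies on Unicode character names from the standard `unicodedata` module.

```python
import re
import unicodedata

def scripts_in(word):
    """Return the set of scripts ('Latin', 'Cyrillic') used by the letters of a word."""
    scripts = set()
    for ch in word:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            scripts.add("Latin")
        elif name.startswith("CYRILLIC"):
            scripts.add("Cyrillic")
    return scripts

def find_mixed_script_words(text):
    """List the words that contain both Latin and Cyrillic letters."""
    return [w for w in re.findall(r"\w+", text) if len(scripts_in(w)) > 1]

# The last word contains a Cyrillic "а" (U+0430) among Latin letters.
print(find_mixed_script_words("rayon район r\u0430yon"))
```

The reported words would still need to be reviewed by someone who knows the language before any automatic replacement.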

Feb 9 2018, 4:54 PM · MediaWiki-Language-converter

Feb 8 2018

DonAlessandro added a comment to T186727: Cimean Tatar transliteration has trouble with ё, ь, э, ю.

@TJones, as far as I can see, "public $mCyrillicToLatin" and "public $mLatinToCyrillic" are applied BEFORE all the regexes and exceptions, but they should be applied AFTER. First we transliterate all the exceptions, then apply all these sophisticated regexes, and only then do the "ordinary" replacements (a => а, b => б, etc.).
Could you please test how it works when the right order is set?

Feb 8 2018, 8:33 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), MediaWiki-Language-converter

Feb 7 2018

DonAlessandro added a comment to T186727: Cimean Tatar transliteration has trouble with ё, ь, э, ю.

A - one of the letters b, c, g, k, p, ş
B - one of the letters ç, n, r, s, t, z
C - one of the letters b, c, ç, d, f, g, ğ, h, j, k, l, m, n, ñ, p, q, r, s, ş, t, v, y, z
D - one of the letters a, â, e, ı, i, o, ö, u, ü, а, е, ё, и, о, у, ы, э, ю, я
E - one of the letters e, i, ö, ü
_ - beginning of the word
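The letter classes above map naturally onto regex character classes. The following Python sketch expands a template written in this notation into a compiled regex; the helper name and the sample template "_öC" are assumptions made for illustration, since the actual patterns are not quoted in this comment.

```python
import re

# Letter classes copied from the legend above.
CLASSES = {
    "A": "bcgkpş",
    "B": "çnrstz",
    "C": "bcçdfgğhjklmnñpqrsştvyz",
    "D": "aâeıioöuüаеёиоуыэюя",
    "E": "eiöü",
}

def compile_template(template):
    """Expand A-E into character classes and '_' into a word-start anchor."""
    parts = []
    for ch in template:
        if ch == "_":
            parts.append(r"\b")
        elif ch in CLASSES:
            parts.append("[" + CLASSES[ch] + "]")
        else:
            parts.append(re.escape(ch))  # any other character is matched literally
    return re.compile("".join(parts))

# Hypothetical template: word-initial "ö" followed by a consonant from class C.
pattern = compile_template("_öC")
print(bool(pattern.search("öz")), bool(pattern.search("söz")))
```

Because the placeholders are capital Latin letters, this sketch cannot express a literal capital A to E inside a template; a real implementation would need an escape for that.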

Feb 7 2018, 9:47 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), MediaWiki-Language-converter
DonAlessandro added a comment to T23582: Transliteration of Crimean Wiki.

Thank you for the great job! But we still have a problem: the script does not work correctly. I have opened a new ticket here https://phabricator.wikimedia.org/T186727

Feb 7 2018, 5:44 PM · Wikimedia-Hackathon-2017, I18n, MediaWiki-Language-converter
DonAlessandro created T186727: Cimean Tatar transliteration has trouble with ё, ь, э, ю.
Feb 7 2018, 4:56 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), MediaWiki-Language-converter

Aug 29 2017

DonAlessandro added a comment to T23582: Transliteration of Crimean Wiki.

Is this OK?
https://crh.wikipedia.org/w/index.php?title=Qullan%C4%B1c%C4%B1%3ADon_Alessandro%2FTranslit&type=revision&diff=132987&oldid=62397

Aug 29 2017, 10:04 AM · Wikimedia-Hackathon-2017, I18n, MediaWiki-Language-converter

Aug 25 2017

DonAlessandro added a comment to T23582: Transliteration of Crimean Wiki.

Hi!
I have reviewed the lists here https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Crimean_Tatar_Transliteration. If further help is needed I am ready to help.

Aug 25 2017, 12:42 PM · Wikimedia-Hackathon-2017, I18n, MediaWiki-Language-converter
DonAlessandro added a comment to T23582: Transliteration of Crimean Wiki.

So I'm making some progress. I've managed to refactor parts of the original to work and to work more efficiently with the current framework. Progress is slow because I only work on it now and then, but I was having such a good time yesterday that I kept working on it today.

A quick question for someone familiar with Crimean Tatar. In the parallel texts I've found online, the Cyrillic text uses guillemets («x») and the Latin uses curly quotes (“x”). Should we try to convert between them, or just leave them as they are? I know that some wikis prefer straight quotes ("x"). Trying to convert straight quotes to guillemets is also possible, but would not be 100% accurate with the straightforward approach.

Indeed, «x» is used in Cyrillic script, but it is possible to leave them as they are. It is not a big mistake.

Aug 25 2017, 11:33 AM · Wikimedia-Hackathon-2017, I18n, MediaWiki-Language-converter

Aug 19 2017

DonAlessandro added a comment to T23582: Transliteration of Crimean Wiki.

Hi! In one week I will be able to join and help with checking that everything is OK with the transliteration. Now I'm on vacation and don't have my computer with me.

Aug 19 2017, 10:42 AM · Wikimedia-Hackathon-2017, I18n, MediaWiki-Language-converter