Turkish needs lc / uc methods
Open, Low, Public

Assigned To

None

Authored By

	hashar
	Jan 10 2012, 8:23 PM

Description

Split from bug 31490.

Our Turkish language class lacks a proper implementation of the lc() and uc() methods. Turkish uses both a dotted i and a dotless i, which means that I and i are actually different letters in that language! The uppercase of i is İ, and the lowercase of I is ı.
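For illustration only, the Turkish casing rules could be sketched as follows. This is a minimal Python sketch, not MediaWiki's PHP implementation, and the names tr_lc and tr_uc are hypothetical. Python's generic str.lower() and str.upper() ignore locale, so the two Turkish pairs are remapped before the generic conversion runs:

```python
# Hypothetical sketch of Turkish-aware lc/uc (not MediaWiki's LanguageTr code).
# Generic "İ".lower() would yield "i" + U+0307 COMBINING DOT ABOVE, and
# generic "i".upper() yields "I", so the Turkish pairs are mapped first.
TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})
TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})


def tr_lc(text: str) -> str:
    """Lowercase with Turkish rules: I -> ı, İ -> i."""
    return text.translate(TR_LOWER).lower()


def tr_uc(text: str) -> str:
    """Uppercase with Turkish rules: i -> İ, ı -> I."""
    return text.translate(TR_UPPER).upper()


print(tr_lc("ISPARTA"))    # ısparta
print(tr_uc("istanbul"))   # İSTANBUL
print("istanbul".upper())  # ISTANBUL (generic, non-Turkish rules)
```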

Useful background reading: https://en.wikipedia.org/wiki/Dotted_and_dotless_I

An implementation was deployed on WMF wikis for MediaWiki 1.18, but it was reverted by r99289 and r99290 because the patches broke magic words and related parser functions (e.g. {{#lcfirst}}) on the Turkish wikis.

The MediaWiki code handling magic words normalizes the words to lowercase using the content language (look for lc() calls in the MagicWord class). Hence a magic word such as LCFIRST is treated just like any Turkish word (since we use the content language): it is lowercased to lcfırst, with a dotless ı, and the word is not found.
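The failure mode described above can be reproduced in a few lines. This is a hypothetical Python illustration, not the MagicWord class: it assumes a magic-word table keyed by ASCII lowercase names, and the table and the helper find_magic_word are made up for the example:

```python
# Hypothetical illustration: names are lowercased before lookup, so the
# lowercasing function decides whether LCFIRST is found.
MAGIC_WORDS = {"lcfirst", "ucfirst", "lc", "uc"}  # hypothetical table

TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})


def tr_lc(text: str) -> str:
    """Lowercase with Turkish rules: I -> ı, İ -> i."""
    return text.translate(TR_LOWER).lower()


def find_magic_word(name: str, lc) -> bool:
    """Normalize `name` with the given lowercasing function, then look it up."""
    return lc(name) in MAGIC_WORDS


print(find_magic_word("LCFIRST", str.lower))  # True
print(find_magic_word("LCFIRST", tr_lc))      # False: 'lcfırst' != 'lcfirst'
```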

Two possibilities:

• Magic words could optionally be made an array referencing the language. We could then use that language's proper lc / uc implementations.
• For the Turkish language, forge magic word aliases with a dotted or dotless i. For example, 'ucfirst' (with a dotted i) could have the alias 'ucfırst' (with a dotless ı), which is what Turkish lc() produces from UCFIRST. Both would then be valid.
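Both possibilities can be sketched roughly as follows. This is a hypothetical Python sketch rather than MediaWiki's MagicWord code; the tables MAGIC_WORD_LANG and ALIASES and the lookup helpers are assumptions made for illustration:

```python
# Hypothetical sketch of the two possibilities; not MediaWiki's MagicWord code.
TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})


def tr_lc(text: str) -> str:
    """Lowercase with Turkish rules: I -> ı, İ -> i."""
    return text.translate(TR_LOWER).lower()


# Possibility 1: each magic word records the language whose case rules
# normalize it; the built-in names are English, so English rules apply.
MAGIC_WORD_LANG = {"lcfirst": "en", "ucfirst": "en"}  # hypothetical table
LOWERCASERS = {"en": str.lower, "tr": tr_lc}


def lookup_by_language(name: str):
    """Return the magic word matching `name` under that word's own language."""
    for word, lang in MAGIC_WORD_LANG.items():
        if LOWERCASERS[lang](name) == word:
            return word
    return None


# Possibility 2: keep Turkish lc() for lookup, but also register an alias
# in which every dotted i is replaced by a dotless ı.
def with_dotless_aliases(words):
    aliases = {}
    for word in words:
        aliases[word] = word
        aliases[word.replace("i", "ı")] = word
    return aliases


ALIASES = with_dotless_aliases(["lcfirst", "ucfirst"])


def lookup_with_aliases(name: str):
    """Lowercase with Turkish rules, then resolve through the alias table."""
    return ALIASES.get(tr_lc(name))


print(lookup_by_language("LCFIRST"))   # lcfirst
print(lookup_with_aliases("UCFIRST"))  # ucfirst
```

The alias approach only covers names whose i's are all typed in the same case. A word containing several i's, typed in mixed case, would need an alias for every dotted/dotless combination. The per-language approach avoids that combinatorial growth.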

Optionally, parser functions could take a parameter that changes the language being used. This would let Turkish projects use the English lc / uc functions, for example to uppercase the first letter of iPhone as IPhone (with a dotless I) rather than İPhone.
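The language override could look roughly like this. This is a hypothetical Python sketch: the function ucfirst and its lang parameter are made up for illustration and do not reflect MediaWiki's parser function signatures. The default stands in for the content language, assumed to be Turkish:

```python
# Hypothetical sketch: ucfirst with an optional language override.
TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})


def tr_ucfirst(text: str) -> str:
    """Uppercase the first character with Turkish rules (i -> İ, ı -> I)."""
    return text[:1].translate(TR_UPPER).upper() + text[1:]


def en_ucfirst(text: str) -> str:
    """Uppercase the first character with English rules (i -> I)."""
    return text[:1].upper() + text[1:]


UCFIRST_BY_LANG = {"tr": tr_ucfirst, "en": en_ucfirst}


def ucfirst(text: str, lang: str = "tr") -> str:
    """Apply the case rules of `lang`; the default stands in for the content language."""
    return UCFIRST_BY_LANG[lang](text)


print(ucfirst("iPhone"))             # İPhone
print(ucfirst("iPhone", lang="en"))  # IPhone
```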

Details

Reference: bz33643

Related Objects

Mentioned In: T155993: Interlanguage links in Turkish and Azerbaijani show a capital dotted İ
T46495: Implement lc, lcfirst, uc, and ucfirst magic words in jqueryMsg
Mentioned Here: T32759: Languages that need a LanguageConverter implementation (tracking)

Event Timeline

• bzimport raised the priority of this task from to Low. Nov 22 2014, 12:06 AM

• bzimport added a project: MediaWiki-Internationalization.

• bzimport set Reference to bz33643.

• bzimport added a subscriber: Unknown Object (MLST).

hashar created this task. Jan 10 2012, 8:23 PM

Bug 33299 has been marked as a duplicate of this bug.

Bug 32707 has been marked as a duplicate of this bug.

Bug 40012 has been marked as a duplicate of this bug.

This is still an ongoing issue, though I am not working on it myself.

vitomedia wrote:

The issue with system messages on the TR projects is quite annoying (see Bug 40012), so it would be fantastic if it could be worked out.

matmarex mentioned this in T46495: Implement lc, lcfirst, uc, and ucfirst magic words in jqueryMsg. Dec 21 2015, 5:05 PM

hashar merged a task: T42012: LC/UC problem at tr.wiki. Nov 24 2016, 1:01 PM

hashar merged tasks: T34707: Shows wrong lower case letter in the Turkish wikipedia, T35299: Lowercase of I is i, not ı.

hashar added subscribers: bzimport, Krenair.

hashar added subscribers: DanielFriesen, MarkAHershberger.

Wondering whether this is part of T32759 or not?

hashar updated the task description. Jun 10 2021, 12:36 PM

hashar updated the task description.

hashar updated the task description. Jun 10 2021, 12:43 PM

Hi @Arrbee, I am triaging some old bugs of mine. I believe this one fits the Language team pretty well, but it might also involve people from the Parsing team.

The summary is that Turkish has two characters for i, one with a dot and one without (https://en.wikipedia.org/wiki/Dotted_and_dotless_I), and needs lc and uc methods that take that into account (we already do so for lcfirst and ucfirst). A few patches I wrote ages ago ended up being reverted because lc is also used to normalize wikitext magic words using the content language. Hence LCFIRST ended up normalized to lcfırst (with a lowercase dotless i), which is not in the magic word array. It also broke access to special pages.

So I don't quite know how to fix it, but it would be nice to have someone familiar with the internals of our language classes dig into it, possibly with the help of people who actually know Turkish :]

hashar mentioned this in T155993: Interlanguage links in Turkish and Azerbaijani show a capital dotted İ. Sep 21 2021, 7:43 AM

Aklapper updated the task description. Nov 5 2023, 10:41 AM

Aklapper removed subscribers: bzimport, wikibugs-l-list.