uca-tr collation lists "I" as "ı" in category pages on trwiki
Closed, Resolved (Public)

Description

After migrating to uca-tr, category pages on trwiki started to list entries that begin with the uppercase of ı (I) under the lowercase form (ı). See this page for an example.

Event Timeline

Superyetkin triaged this task as Normal priority. Aug 30 2018, 1:35 PM
Superyetkin created this task.
Aklapper raised the priority of this task from Normal to Needs Triage. Aug 30 2018, 2:35 PM

@Superyetkin: As requested before, please do not prioritize tasks if you don't plan to work on them. Thanks!

Urbanecm moved this task from Backlog to Own projects on the User-Urbanecm board. Sep 10 2018, 6:34 PM
Urbanecm moved this task from Own projects to Backlog on the User-Urbanecm board.

What needs to be done to fix this issue? A patch?

Bawolff added a subscriber: Bawolff. Edited Feb 20 2019, 3:17 PM

Just to make sure I understand correctly (since I don't speak Turkish):

The issue is that on the page https://tr.wikipedia.org/wiki/Kategori:Bat%C4%B1_Asya_%C3%BClkeleri the subcategory Irak is currently listed under ı (U+0131 "LATIN SMALL LETTER DOTLESS I"), when it really should be listed under I (U+0049 "LATIN CAPITAL LETTER I").

So at first glance, it looks like (IcuCollation.php line 397):

// Primary collision (two characters with the same sort position).
// Keep whichever one sorts first in the main collator.
$comp = $this->mainCollator->compare( $letter, $letterMap[$key] );

has the comparison reversed, since lowercase sorts before uppercase.

However, changing this would probably affect which letter is chosen as the section header in several languages, such as dsb, et, eu, fa, fi, fo, fur, hsb, kk, kl, km, ku, ky, lkt, ln, lt, lv [I stopped checking at this point].

So it is probably safer to leave that as it is and special-case Turkish.
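
To illustrate the reasoning in the comment above, here is a minimal Python sketch of the "keep whichever sorts first" rule on a primary collision, and of special-casing one letter instead of flipping the comparison for everyone. It is a toy model, not MediaWiki code: the weights in TOY_WEIGHTS, the function names and the overrides parameter are all invented for illustration, and the merged patch may well implement the special case differently.

```python
# Toy model only. Each letter maps to (primary, tertiary) weights.
# These numbers are invented; real weights come from the ICU collator.
TOY_WEIGHTS = {
    "H": (8, 1),
    "ı": (9, 0), "I": (9, 1),    # dotless pair: same primary, lowercase first
    "i": (10, 0), "İ": (10, 1),  # dotted pair
}


def compare(a, b):
    """Three-way comparison on (primary, tertiary), like a full-strength collator."""
    ka, kb = TOY_WEIGHTS[a], TOY_WEIGHTS[b]
    return (ka > kb) - (ka < kb)


def build_headers(candidates, overrides=None):
    """Pick one section-header letter per primary weight.

    On a primary collision, keep whichever letter sorts first.
    `overrides` (hypothetical) replaces a chosen header afterwards,
    which is one way to special-case a single language.
    """
    overrides = overrides or {}
    letter_map = {}
    for letter in candidates:
        key = TOY_WEIGHTS[letter][0]
        if key not in letter_map or compare(letter, letter_map[key]) < 0:
            letter_map[key] = letter
    return [overrides.get(letter_map[k], letter_map[k]) for k in sorted(letter_map)]


def header_for(title, headers):
    """Return the header that shares a primary weight with the title's first letter."""
    primary = TOY_WEIGHTS[title[0]][0]
    return next(h for h in headers if TOY_WEIGHTS[h][0] == primary)


candidates = ["H", "I", "ı", "İ"]

buggy = build_headers(candidates)
print(buggy, header_for("Irak", buggy))    # ['H', 'ı', 'İ'] ı

fixed = build_headers(candidates, overrides={"ı": "I"})
print(fixed, header_for("Irak", fixed))    # ['H', 'I', 'İ'] I
```

In this model the override changes only the heading, not the order of the entries, and headers chosen for other letters are left alone.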

Change 491801 had a related patch set uploaded (by Brian Wolff; owner: Brian Wolff):
[mediawiki/core@master] Make uca-tr use dotless I as uppercase of dotless i

https://gerrit.wikimedia.org/r/491801

Change 491801 merged by jenkins-bot:
[mediawiki/core@master] Make uca-tr use I as uppercase of dotless ı instead of reverse

https://gerrit.wikimedia.org/r/491801

matmarex closed this task as Resolved. Feb 20 2019, 9:23 PM
matmarex assigned this task to Bawolff.

This will be fixed on Turkish Wikipedia next week, with the deployment of MW 1.33.0-wmf.19.

(No updateCollation.php run is needed; the ordering was already correct, just under the wrong heading.)