
uca-tr collation lists "I" as "ı" in category pages on trwiki
Closed, Resolved · Public

Description

After migrating to the uca-tr collation, category pages on trwiki started to show the uppercase form of ı (I) as the lowercase ı in section headings. See this page for an example.

Event Timeline

Superyetkin triaged this task as Normal priority.
Aklapper raised the priority of this task from Normal to Needs Triage. Aug 30 2018, 2:35 PM

@Superyetkin: As requested before, please do not prioritize tasks if you don't plan to work on them. Thanks!

Urbanecm moved this task from Backlog to Own projects on the User-Urbanecm board. Sep 10 2018, 6:34 PM
Urbanecm moved this task from Own projects to Backlog on the User-Urbanecm board.

What needs to be done to fix this issue? A patch?

Bawolff added a subscriber: Bawolff. Edited Feb 20 2019, 3:17 PM

Just to make sure I understand correctly (since I don't speak Turkish):

The issue is that on the page https://tr.wikipedia.org/wiki/Kategori:Bat%C4%B1_Asya_%C3%BClkeleri the subcategory Irak is currently listed under ı (U+0131 "LATIN SMALL LETTER DOTLESS I"), where it really should be listed under I (U+0049 "LATIN CAPITAL LETTER I").

So at first glance, it looks like (IcuCollation.php line 397):

// Primary collision (two characters with the same sort position).
// Keep whichever one sorts first in the main collator.
$comp = $this->mainCollator->compare( $letter, $letterMap[$key] );

Has the comparison reversed, since lowercase comes before uppercase.

However, changing this would probably affect which letter is chosen as the section header in several languages, such as: dsb, et, eu, fa, fi, fo, fur, hsb, kk, kl, km, ku, ky, lkt, ln, lt, lv, [I stopped checking at this point]

So it is probably safer to leave that as it is, and special-case Turkish.
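To make the reasoning above concrete, here is a minimal toy model in Python of how a "keep whichever sorts first" rule can end up picking ı as the heading, and how a per-language override would avoid touching the comparison itself. This is only a sketch, not MediaWiki's code: the weight table, the candidate list, and the names `build_letter_map`, `header_for` and `overrides` are all hypothetical, and the weights simply assume that lowercase sorts before uppercase within one primary position.

```python
# Toy model only; none of these names come from IcuCollation.php.

# Assumed (primary, tertiary) weights. Letters that share a primary weight
# collide; the lowercase form has the smaller tertiary weight, so it sorts first.
WEIGHTS = {
    "h": (8, 0), "H": (8, 1),
    "ı": (9, 0), "I": (9, 1),  # dotless i and its uppercase
}

# Hypothetical candidate heading letters: uppercase letters plus a stray "ı".
CANDIDATES = ["H", "I", "ı"]


def compare(a, b):
    """Full-strength comparison: negative if a sorts before b."""
    wa, wb = WEIGHTS[a], WEIGHTS[b]
    return (wa > wb) - (wa < wb)


def build_letter_map(candidates, overrides=None):
    """Map each primary weight to a single heading letter.

    On a primary collision, keep whichever letter sorts first, then apply
    language-specific overrides (the "special case Turkish" idea).
    """
    overrides = overrides or {}
    letter_map = {}
    for letter in candidates:
        key = WEIGHTS[letter][0]
        if key not in letter_map or compare(letter, letter_map[key]) < 0:
            letter_map[key] = letter
    return {key: overrides.get(letter, letter) for key, letter in letter_map.items()}


def header_for(title, letter_map):
    """Heading under which a title is listed, based on its first letter."""
    return letter_map[WEIGHTS[title[0]][0]]


default_map = build_letter_map(CANDIDATES)
turkish_map = build_letter_map(CANDIDATES, overrides={"ı": "I"})
print(header_for("Irak", default_map))  # ı
print(header_for("Irak", turkish_map))  # I
```

In this model the override only changes the label for one language, so the headings chosen for every other language stay exactly as they were.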

Change 491801 had a related patch set uploaded (by Brian Wolff; owner: Brian Wolff):
[mediawiki/core@master] Make uca-tr use dotless I as uppercase of dotless i

https://gerrit.wikimedia.org/r/491801

Change 491801 merged by jenkins-bot:
[mediawiki/core@master] Make uca-tr use I as uppercase of dotless ı instead of reverse

https://gerrit.wikimedia.org/r/491801

matmarex closed this task as Resolved. Feb 20 2019, 9:23 PM
matmarex assigned this task to Bawolff.

This will be fixed on Turkish Wikipedia next week, with the deployment of MW 1.33.0-wmf.19.

(No updateCollation.php run is needed; the ordering was already correct, just under the wrong heading.)
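The point about not needing to recompute anything can be illustrated with a small Python sketch. It assumes, for illustration only, that each category member is stored with a sort key and that headings are looked up from that key at render time; the `rows` data, the numeric keys, and the `render` helper are hypothetical and do not reflect MediaWiki's actual schema.

```python
from itertools import groupby

# Hypothetical stored rows: (primary weight of the first letter, title).
# These stand in for sort keys that were already computed and saved.
rows = sorted([(9, "Irak"), (11, "Japonya"), (8, "Hatay"), (9, "Isparta")])


def render(sorted_rows, headers):
    """Group already-sorted rows under the heading chosen for each key."""
    return [
        (headers[key], [title for _, title in group])
        for key, group in groupby(sorted_rows, key=lambda row: row[0])
    ]


before = render(rows, {8: "H", 9: "ı", 11: "J"})  # heading choice before the fix
after = render(rows, {8: "H", 9: "I", 11: "J"})   # heading choice after the fix

print(before)
print(after)
```

Only the heading lookup differs between the two calls. The stored keys and the resulting order of titles are identical, which is why, under these assumptions, nothing stored would have to be regenerated.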