
Capital letters are always sorted first
Closed, Resolved (Public)

Description

Author: Drilnoth

Description:
Currently, categories sort all uppercase letters before all lowercase letters. For example, in http://en.wikipedia.org/wiki/Category:Bo-Bo_locomotives , the article "VR Class Sr2" is listed before "Victorian Railways E class (electric)". This is especially problematic in categories where abbreviations such as "SSX" or "NBA" are commonly used. Logically, uppercase and lowercase letters should sort as equal. I understand that this happens because category sorting uses Unicode ordering, but would it be possible to (essentially) say that "A = a", so that titles sort correctly?
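To illustrate the behaviour described above, here is a minimal Python sketch (not MediaWiki code) that sorts the two titles from the example category, first by raw code point and then with a case-folded key. The variable names are made up for the illustration, and `str.casefold` stands in for the requested "A = a" rule.

```python
# Titles taken from the example category in the report.
titles = ["Victorian Railways E class (electric)", "VR Class Sr2"]

# Plain string comparison goes code point by code point:
# "R" (U+0052) is lower than "i" (U+0069), so "VR ..." comes first.
by_code_point = sorted(titles)

# Comparing case-folded keys treats "A" and "a" as the same letter,
# so "Vi..." now sorts ahead of "VR ...".
by_casefold = sorted(titles, key=str.casefold)

print(by_code_point)
print(by_casefold)
```

The second ordering is the one the report asks for; the titles themselves are left unchanged, only the comparison key differs.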

Current guidelines on this issue at http://en.wikipedia.org/wiki/Wikipedia:Categorization#Using_sort_keys imply that most articles should have a DEFAULTSORT key in order to fix this, but there is resistance to adding DEFAULTSORT keys that really shouldn't be needed.


Version: 1.16.x
Severity: enhancement

Details

Reference
bz19197

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 10:43 PM)
bzimport set Reference to bz19197.
bzimport added a subscriber: Unknown Object (MLST).

I believe the problem is that the sortkey is sorted as binary, so capital letters come before lowercase letters. Sorting with a case-insensitive UTF-8 collation would fix it, but Wikimedia is still using MySQL 4, which I don't believe supports that. Other than upgrading to MySQL 5, this could be partly fixed by forcing sortkeys to lower case before saving them to the database, but that could break other things.
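A rough Python sketch of the lowercase workaround mentioned above, under stated assumptions: `binary_order` imitates a binary comparison by sorting on the UTF-8 bytes, and `normalize_sortkey` is a hypothetical helper, not an existing MediaWiki function. The sample keys are invented. It also shows one way the workaround might "break other things": keys that differ only in case become identical once stored.

```python
def binary_order(keys):
    """Sort byte by byte on the UTF-8 encoding, as a binary column would."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))


def normalize_sortkey(key):
    """Hypothetical normalization applied before the key is saved."""
    return key.lower()


# Invented sample sort keys.
keys = ["NBA Draft", "Nba draft", "National team", "nets"]

# Binary order on the raw keys: every uppercase byte sorts before lowercase.
print(binary_order(keys))

# Binary order on lowercased keys: case no longer affects the position,
# but "NBA Draft" and "Nba draft" are now stored as the same key.
stored = [normalize_sortkey(k) for k in keys]
print(binary_order(stored))
```

Whether colliding keys matter in practice depends on how ties are broken and on whether the original key is needed elsewhere; the sketch only makes the trade-off visible.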

Drilnoth wrote:

Gotcha... I'm guessing that MySQL 5 would be way too big a jump at this point, right?

Is this a dupe of something? Bug 164 comes to mind.

happy.melon.wiki wrote:

It's a "sort by something other than Unicode code point" bug, so yes, I'd say so.

(In reply to comment #2)

Gotcha... I'm guessing that MySQL 5 would be way too big a jump at this point, right?

It's in the works. It's been in the works for a while. It will probably still be in the works for a while to come :D

happy.melon.wiki wrote:

*** This bug has been marked as a duplicate of bug 164 ***