cl_sortkey_prefix crops unicode string mid character
Closed, Resolved · Public

Description

Per the mw.org manual, cl_sortkey_prefix is supposed to be "the human readable version of cl_sortkey" (when the default sort key is not used).

For very long non-Latin sort keys (e.g. قصر البارون امبان بمصر الجديدة.jpg), the sort key (for Category:Cultural heritage monuments in Egypt with known IDs) gets cropped to fit in the table. However, this cropping does not appear to respect the encoding of the string, so the cut can fall in the middle of a multi-byte Unicode character. As a result, cl_sortkey_prefix cannot be converted back to Unicode and cannot be said to be human readable.

The desired result would be for the crop to be encoding-aware and drop that last partial character.
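
To illustrate the difference, here is a minimal Python sketch of a byte-level crop versus an encoding-aware crop. It is not the MediaWiki code (which is PHP); the function names `naive_truncate` and `truncate_utf8` and the `max_bytes` parameter are hypothetical, and the byte limit is whatever the column allows.

```python
def naive_truncate(text: str, max_bytes: int) -> bytes:
    """Crop the UTF-8 bytes blindly; the result may end mid-character."""
    return text.encode("utf-8")[:max_bytes]


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of `text` whose UTF-8 encoding fits in `max_bytes`."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    cut = max_bytes
    # UTF-8 continuation bytes look like 0b10xxxxxx. If the first dropped byte
    # is a continuation byte, the cut falls inside a character, so step back
    # until the cut sits on a character boundary.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8")
```

With a two-byte-per-letter string such as "قصر" and a limit of 5 bytes, `naive_truncate` returns bytes that fail to decode as UTF-8, while `truncate_utf8` drops the partial third letter and returns the first two letters intact.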

Event Timeline

Restricted Application added a subscriber: Aklapper. Jul 28 2018, 10:10 PM

This arose from T200325.

This has potentially been touched on in T155529, which at least suggests that the mw.org manual is incorrect in promising that cl_sortkey_prefix should be human readable.

Lokal_Profil updated the task description. Jul 30 2018, 4:03 PM

Change 449280 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/core@master] Use multibyte-aware truncation to avoid invalid UTF-8 in cl_sortkey_prefix

https://gerrit.wikimedia.org/r/449280

matmarex claimed this task. Jul 30 2018, 7:53 PM
matmarex triaged this task as Low priority.
matmarex edited projects, added MediaWiki-Categories; removed MediaWiki-General.

Change 449280 merged by jenkins-bot:
[mediawiki/core@master] Use multibyte-aware truncation to avoid invalid UTF-8 in cl_sortkey_prefix

https://gerrit.wikimedia.org/r/449280

Lokal_Profil added a comment. Edited · Jul 31 2018, 6:34 AM

Note that this also happens for cl_sortkey, but I didn't raise that originally since the manual mentions that it may or may not be readable by a human.

matmarex closed this task as Resolved. Aug 7 2018, 11:01 PM