Maniphest T200623

cl_sortkey_prefix crops unicode string mid character
Closed, Resolved · Public

Assigned To

Authored By

	Lokal_Profil
	Jul 28 2018, 10:10 PM

Description

Per the mw.org manual, cl_sortkey_prefix is supposed to be "the human readable version of cl_sortkey" (when the default sort key is not used).

For very long non-Latin sort keys (e.g. قصر البارون امبان بمصر الجديدة.jpg), the sort key (for Category:Cultural heritage monuments in Egypt with known IDs) gets cropped to fit in the table. However, this cropping does not appear to respect the encoding of the string, so the crop can stop in the middle of a multi-byte Unicode character. As a result, the cl_sortkey_prefix cannot be converted back to Unicode and cannot be said to be human readable.
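The failure can be reproduced outside MediaWiki. The following is a minimal Python sketch, not MediaWiki code: the 20-byte limit is a hypothetical value chosen only to make the cut land inside a character (the real column width is not stated in this task), and `is_valid_utf8` is a helper introduced here for illustration.

```python
# Minimal reproduction of a byte-level crop splitting a UTF-8 character.
# LIMIT is a hypothetical value for illustration, not the real column width.
LIMIT = 20

sortkey = "قصر البارون امبان بمصر الجديدة.jpg"
raw = sortkey.encode("utf-8")

# Naive crop: slice the byte string without looking at character boundaries.
cropped = raw[:LIMIT]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Each Arabic letter here takes two bytes, so a cut at an arbitrary byte
# offset can leave a lone lead byte at the end of the cropped value.
print(is_valid_utf8(raw))      # the full sort key is valid UTF-8
print(is_valid_utf8(cropped))  # the cropped prefix may not be
```

With this particular string and limit, the slice ends on the first byte of a two-byte letter, so the cropped value no longer decodes.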

The desired result would be for the crop to be encoding-aware and drop the last partial character.
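One way to get that behaviour is to back the cut point up until it sits on a character boundary. The sketch below is in Python and is an assumption about the general technique, not the patch that was merged for this task; `truncate_utf8` and its parameters are hypothetical names.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Crop text to at most max_bytes of UTF-8 without splitting a character.

    Illustrative sketch only; the function name and signature are hypothetical.
    """
    if max_bytes <= 0:
        return ""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text

    cut = max_bytes
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the first
    # dropped byte is a continuation byte, the cut falls inside a character,
    # so move it back to the start of that character.
    while cut > 0 and (raw[cut] & 0xC0) == 0x80:
        cut -= 1
    return raw[:cut].decode("utf-8")


if __name__ == "__main__":
    key = "قصر البارون امبان بمصر الجديدة.jpg"
    prefix = truncate_utf8(key, 20)
    # The result always decodes and never exceeds the byte budget.
    print(prefix, len(prefix.encode("utf-8")))
```

The trade-off is that the stored prefix may be a byte or more shorter than the limit, which seems preferable to storing a value that cannot be decoded.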

Details

	Subject	Repo	Branch	Lines +/-
	Use multibyte-aware truncation to avoid invalid UTF-8 in cl_sortkey_prefix	mediawiki/core	master	+3 -8


Related Objects

Mentioned In: T200325: Handle encoding of sort keys
Mentioned Here: T155529: Get rid of UTF-8 encoded as latin-1
T200325: Handle encoding of sort keys

Event Timeline

Lokal_Profil created this task. Jul 28 2018, 10:10 PM

Restricted Application added a subscriber: Aklapper. Jul 28 2018, 10:10 PM

This arose from T200325.

This has potentially been touched on in T155529, which at least suggests that the mw.org manual is incorrect in promising that cl_sortkey_prefix should be human readable.

Lokal_Profil updated the task description. Jul 30 2018, 4:03 PM

Change 449280 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/core@master] Use multibyte-aware truncation to avoid invalid UTF-8 in cl_sortkey_prefix

https://gerrit.wikimedia.org/r/449280

gerritbot added a project: Patch-For-Review. Jul 30 2018, 7:52 PM

matmarex claimed this task. Jul 30 2018, 7:53 PM

matmarex triaged this task as Low priority.

matmarex edited projects, added MediaWiki-Categories; removed MediaWiki-General.

matmarex mentioned this in T200325: Handle encoding of sort keys. Jul 30 2018, 7:58 PM

Change 449280 merged by jenkins-bot:
[mediawiki/core@master] Use multibyte-aware truncation to avoid invalid UTF-8 in cl_sortkey_prefix

https://gerrit.wikimedia.org/r/449280

ReleaseTaggerBot added a project: MW-1.32-notes (WMF-deploy-2018-07-31 (1.32.0-wmf.15)). Jul 31 2018, 5:00 AM

Note that this also happens for cl_sortkey (https://www.mediawiki.org/wiki/Manual:Categorylinks_table#cl_sortkey), but I didn't raise that originally since the manual mentions that it may or may not be readable by a human.

matmarex closed this task as Resolved. Aug 7 2018, 11:01 PM
