Avoid separate first-letter sections for Pinyin and English words in large categories under Pinyin collation
Open, Needs Triage, Public

Description

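ICU orders characters by script group (Latin/Hani) first. In a large category this produces one run of first-letter sections for English words and a separate run for Pinyin, which can end up on different pages. We want to avoid that, so that Pinyin initials and Latin letters share the same first-letter sections.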
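To make the difference concrete, here is a minimal sketch in Python. It is not the code from the patch and does not call ICU. The `PINYIN` table, the `romanize` helper, and both key functions are hypothetical names made up for illustration, and the "script group first" ordering is modelled simply as "Latin titles before Han titles".

```python
from itertools import groupby

# Hypothetical, hand-written Hanzi -> Pinyin table for illustration only.
# A real collation would take this data from ICU/CLDR.
PINYIN = {
    "安": "an", "徽": "hui", "北": "bei", "京": "jing",
    "上": "shang", "海": "hai", "中": "zhong", "国": "guo",
}


def romanize(title):
    """Replace each known Hanzi with its Pinyin and lowercase the result."""
    return "".join(PINYIN.get(ch, ch) for ch in title).casefold()


def script_first_key(title):
    """Simplified model of script-first ordering: Latin titles sort
    before Han titles, then alphabetically within each script."""
    starts_with_han = title[0] in PINYIN
    return (starts_with_han, romanize(title))


def merged_key(title):
    """Desired ordering: Pinyin initials and Latin letters interleave."""
    return romanize(title)


def bucket(title):
    """First-letter section header for a title."""
    return romanize(title)[0].upper()


def section_headers(titles, key):
    """Section headers in the order they would appear in the listing."""
    return [header for header, _ in groupby(sorted(titles, key=key), key=bucket)]


titles = ["Apple", "安徽", "Banana", "北京", "Shell", "上海", "Zebra", "中国"]

# Script-first: every letter appears twice, once per script.
print(section_headers(titles, script_first_key))
# Merged: each letter appears once and holds both scripts.
print(section_headers(titles, merged_key))
```

With the script-first key the headers repeat (A, B, S, Z for Latin, then A, B, S, Z again for Han). With the merged key each letter appears once, which is the behaviour this task asks for.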

image.png (629×720 px, 61 KB)
image.png (719×719 px, 55 KB)

Event Timeline

Change #1241256 had a related patch set uploaded (by Func; author: Func):

[mediawiki/core@master] IcuCollation: Group Pinyin initials and Latin characters into buckets

https://gerrit.wikimedia.org/r/1241256

How will manual Pinyin overrides (if they exist) for characters that are not pronounced as usual behave under this patch?

This patch does not change the behaviour of manual overrides, so they are still sorted as English words, as before. The issue will be investigated and improved as part of T401456, or you may file a subtask specifically for in-page manual overrides. I have some work in progress on T401456 and may post my findings in a few days.

Change #1241256 merged by jenkins-bot:

[mediawiki/core@master] Collation: Introduce a tailored collation for Chinese Pinyin sorting

https://gerrit.wikimedia.org/r/1241256