Double character support in category pages
Closed, Resolved (Public)

Description

In some languages, certain double characters (digraphs) are treated as a single
letter. For example, in Hungarian the word "cselló" starts with the double
letter "cs". (For more examples, see
[[Latin_alphabet#Collating_sequence_with_extensions]].) This means that on
huwiki category pages like [[hu:Kategória:Vonós hangszerek]], "cselló" should
not be grouped together with words starting with the letter "c", but should have
its own "cs" section.

This doesn't apply to foreign words (e.g. "CSS" should be put in the "c"
section), and therefore cannot be decided automatically. An easy way to handle
it would be to use a special character in the category sort keys: e.g.
[[Category:XXX|cselló]] would have the same effect as now, but
[[Category:XXX|cs,elló]] would create a section for words starting with "cs" in
the category page, and put the page there.

Another use for this would be a more flexible categorization of numbers; see
[[Category:Stargate_SG-1_episodes]], where "A" is used for the 10th season.
Using the above markup, a "10" section could be created by using
[[Category:Stargate_SG-1_episodes|10,{{PAGENAME}}]].
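The proposed convention for both digraphs and multi-digit numbers could be sketched roughly as follows. This is only an illustration in Python, not MediaWiki code: the function name `section_heading` and the rule "everything before the first comma is the heading" are assumptions made for the example. Note that this naive rule treats any comma in a sort key as a marker, so a real design would need an escape or a rarer character.

```python
def section_heading(sort_key: str, marker: str = ",") -> str:
    """Return the category-page section heading for a sort key.

    Hypothetical rule: if the marker occurs in the key, the text before
    its first occurrence is the heading; otherwise the first character is.
    """
    if not sort_key:
        return ""
    prefix, sep, _rest = sort_key.partition(marker)
    if sep and prefix:
        # "cs,elló" -> "Cs"; "10,Some title" -> "10"
        return prefix.capitalize()
    # No marker: fall back to the current first-character behaviour.
    return sort_key[0].upper()


if __name__ == "__main__":
    for key in ["cselló", "cs,elló", "CSS", "10,Some title"]:
        print(repr(key), "->", section_heading(key))
```

With this rule an unmarked key such as "CSS" or "cselló" still lands in the "C" section, so only pages whose sort key was explicitly marked would move.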


Version: unspecified
Severity: enhancement

Details

Reference
bz6928

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:17 PM
bzimport set Reference to bz6928.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

*** This bug has been marked as a duplicate of 164 ***

Reopening, this has nothing to do with collation, and - as explained above -
requires additional information beyond the category name to be handled
correctly. It cannot be handled without introducing new markup.

ayg wrote:

It has to do with nothing but collation. It requires no additional information
beyond a user-provided sort key, which would then be evaluated in a
locale-specific manner. No new markup need be added. The kind of collation
support added in bug 164 would allow things like "cs," being interpreted as its
own letter, or some better convention. Many languages have similar conventions,
many of which you kindly linked to at
[[Latin_alphabet#Collating_sequence_with_extensions]], and that's what bug 164
is about.

(For the time being, I may as well note that if you replace all "cs" with "c{s"
in sort keys, similar to what you suggest as the new markup required, it will
sort in the "c" section but after all pages starting with a normal "c", which is
at least half correct.)
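The effect of the "c{s" workaround can be checked with a plain code-point sort, which is assumed here to approximate how sort keys were compared at the time. The sample words and the helper name `brace_key` are illustrative only, and the brace is applied just to the keys an editor considers true digraphs. It works because "{" (U+007B) comes immediately after "z" (U+007A).

```python
def brace_key(key: str) -> str:
    """Rewrite 'cs' as 'c{s' so the key sorts after every plain 'c...' key."""
    return key.replace("cs", "c{s")


# Keys left untouched: ordinary "c" words and the foreign word "css".
plain = ["cimbalom", "css", "cukor"]
# Keys an editor marks as starting with the Hungarian digraph "cs".
digraph = ["cselló", "csembaló"]

keys = plain + [brace_key(k) for k in digraph]
ordered = sorted(keys)  # Python compares strings by code point

if __name__ == "__main__":
    for key in ordered:
        print(key)
```

The braced keys end up at the end of the "c" block rather than in a separate "cs" section, which is the "half correct" result described above.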

Sorry. I must have misunderstood bug 164 then.

*** This bug has been marked as a duplicate of 164 ***