
Implement collation-aware sorting
Closed, Declined (Public)

Description

For example, the categories in https://cs.wikinews.org/w/index.php?title=Vy%C5%A1lo_t%C5%99et%C3%AD_vyd%C3%A1n%C3%AD_%C4%8Cesk%C3%A9ho_etymologick%C3%A9ho_slovn%C3%ADku&action=edit&section=2 should be sorted using the cs (Czech) collation, so the correct sort order is

[[Kategorie:Česko]]
[[Kategorie:Čeština]]
[[Kategorie:Jazykověda]]
[[Kategorie:Jiří Rejzek]]
[[Kategorie:Knihy]]
[[Kategorie:Kultura]]
[[Kategorie:Věda a technika]]

which is not what you get with a naïve, non-collation-aware sort (i.e. one that uses Unicode code point order).
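To make the difference concrete, here is a minimal Python 3 sketch using only the standard library. It assumes the titles are stored in precomposed (NFC) form; the variable names are just for illustration. Python compares strings by code point, so "Č" (U+010C) lands after every ASCII letter:

```python
# Category links from the example page, in the desired (Czech) order.
categories = [
    "[[Kategorie:Česko]]",
    "[[Kategorie:Čeština]]",
    "[[Kategorie:Jazykověda]]",
    "[[Kategorie:Jiří Rejzek]]",
    "[[Kategorie:Knihy]]",
    "[[Kategorie:Kultura]]",
    "[[Kategorie:Věda a technika]]",
]

# sorted() on str compares code points, not language-specific collation.
naive = sorted(categories)

# The two "Č" entries end up last, because U+010C is greater than any ASCII letter.
print(naive[-2:])
print(ord("V"), ord("Č"))  # 86 268
```

In Czech collation "Č" is a separate letter that sorts directly after "C", which is why both entries belong at the top of this list rather than at the bottom.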

Event Timeline

valhallasw raised the priority of this task to Needs Triage.
valhallasw updated the task description.
valhallasw added a project: Pywikibot.
valhallasw added subscribers: Danny_B, Unknown Object (MLST), valhallasw, Aklapper.

See https://cs.wiktionary.org/wiki/Modul:Collation for the cs collation, but we should probably use pyicu or something like that.

Rough implementation on the sorting end:

>>> import icu
>>> locale = icu.Locale("cs_CZ")
>>> locale.getDisplayName()
u'Czech (Czech Republic)'
>>> collator = icu.Collator.createInstance(locale)
>>> categories = """[[Kategorie:Česko]]
... [[Kategorie:Čeština]]
... [[Kategorie:Jazykověda]]
... [[Kategorie:Jiří Rejzek]]
... [[Kategorie:Knihy]]
... [[Kategorie:Kultura]]
... [[Kategorie:Věda a technika]]""".split("\n")
>>> print(', '.join(sorted(categories)))
[[Kategorie:Jazykověda]], [[Kategorie:Jiří Rejzek]], [[Kategorie:Knihy]], [[Kategorie:Kultura]], [[Kategorie:Věda a technika]], [[Kategorie:Česko]], [[Kategorie:Čeština]]
>>> print(', '.join(sorted(categories, key=collator.getSortKey)))
[[Kategorie:Česko]], [[Kategorie:Čeština]], [[Kategorie:Jazykověda]], [[Kategorie:Jiří Rejzek]], [[Kategorie:Knihy]], [[Kategorie:Kultura]], [[Kategorie:Věda a technika]]
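The session above could be wrapped into a reusable helper along these lines. This is only a sketch, not existing Pywikibot code: `make_sort_key` and `sort_titles` are hypothetical names, and falling back to code point order when PyICU is not installed is an assumption about how an optional dependency might be handled.

```python
def make_sort_key(locale_name="cs_CZ"):
    """Return a key function for sorted().

    Uses an ICU collator when PyICU is importable; otherwise falls back
    to plain code point order (assumed fallback, not a decided policy).
    """
    try:
        import icu  # PyICU, an optional third-party dependency
    except ImportError:
        # Identity key: sorted() then compares the strings by code point.
        return lambda text: text
    collator = icu.Collator.createInstance(icu.Locale(locale_name))
    # getSortKey returns a binary key whose byte order follows the collation.
    return collator.getSortKey


def sort_titles(titles, locale_name="cs_CZ"):
    """Return a new list of titles sorted for the given locale."""
    return sorted(titles, key=make_sort_key(locale_name))


if __name__ == "__main__":
    print(sort_titles(["Kultura", "Česko", "Jazykověda", "Čeština"]))
```

With PyICU available the "Č" entries should come first, as in the session above; without it the result is the same as a plain `sorted()`.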

But I'm confused by this one as well. As far as I can see, category.py doesn't sort at all; it just adds sort keys for MediaWiki.

Danny_B triaged this task as Medium priority. Jan 22 2016, 3:41 PM
Danny_B renamed this task from category.py: implement collation-aware sorting to Implement collation-aware sorting. Jun 3 2016, 2:01 AM
Danny_B added a project: Pywikibot-category.py.

Note that some sites can have special policies for the order of categories (e.g. in biographies), in which case collation-based sorting might not be applicable.
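One way such a site policy could coexist with collation-aware sorting is to keep policy-mandated categories in a fixed position and sort only the rest. The sketch below is purely illustrative: `order_categories`, its parameters, and the sample category names are hypothetical and not part of Pywikibot.

```python
def order_categories(categories, pinned=(), key=None):
    """Put categories listed in `pinned` first (in that order), then sort the rest.

    `key` is any sorted()-style key function, e.g. a collator's sort key;
    None means plain code point order.
    """
    rank = {name: position for position, name in enumerate(pinned)}
    # Pinned categories that are actually present, in policy order.
    head = sorted((c for c in categories if c in rank), key=rank.__getitem__)
    # Everything else, sorted with the supplied key.
    tail = sorted((c for c in categories if c not in rank), key=key)
    return head + tail


# Hypothetical policy: biographies always list "Living people" first.
print(order_categories(["Writers", "Living people", "Linguists"],
                       pinned=["Living people"]))
# ['Living people', 'Linguists', 'Writers']
```

A site without any such policy would simply pass an empty `pinned` sequence, and the function reduces to an ordinary keyed sort.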

Xqt subscribed.

The sorting order is not clearly defined.