
Implement collation-aware sorting
Open, Medium, Public

Description

For example, the categories on https://cs.wikinews.org/w/index.php?title=Vy%C5%A1lo_t%C5%99et%C3%AD_vyd%C3%A1n%C3%AD_%C4%8Cesk%C3%A9ho_etymologick%C3%A9ho_slovn%C3%ADku&action=edit&section=2 should be sorted using the cs (Czech) collation, so the correct sort order is:

[[Kategorie:Česko]]
[[Kategorie:Čeština]]
[[Kategorie:Jazykověda]]
[[Kategorie:Jiří Rejzek]]
[[Kategorie:Knihy]]
[[Kategorie:Kultura]]
[[Kategorie:Věda a technika]]

This is not what you get from a naïve, non-collation-aware sort (i.e. one that uses Unicode code point order).
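To see why a code point sort goes wrong here, the small sketch below compares two of the links and reports the first characters that differ. `Č` is U+010C, which is numerically larger than every unaccented ASCII letter, so `[[Kategorie:Česko]]` lands after `[[Kategorie:Věda a technika]]`. The helper name `first_difference` is hypothetical and only used for illustration.

```python
def first_difference(a: str, b: str):
    """Return the first pair of differing characters, each with its code point."""
    for x, y in zip(a, b):
        if x != y:
            return (x, hex(ord(x))), (y, hex(ord(y)))
    return None  # equal, or one string is a prefix of the other


categories = ["[[Kategorie:Česko]]", "[[Kategorie:Věda a technika]]"]

# Python's default str ordering compares strings code point by code point.
print(sorted(categories))
# ['[[Kategorie:Věda a technika]]', '[[Kategorie:Česko]]']
print(first_difference(*categories))
# (('Č', '0x10c'), ('V', '0x56'))
```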

Event Timeline

valhallasw set the priority of this task to Needs Triage.
valhallasw updated the task description.
valhallasw added a project: Pywikibot.
valhallasw added subscribers: Danny_B, Unknown Object (MLST), valhallasw, Aklapper.

See https://cs.wiktionary.org/wiki/Modul:Collation for the cs collation, but we should probably use PyICU or something similar instead.
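If a dependency on PyICU turned out to be undesirable, a rough stdlib-only fallback could approximate the most visible Czech rules: č, ř, š and ž are separate letters, and the digraph "ch" is a letter sorted after "h". The sketch below is a hypothetical, primary-level approximation written for illustration; it is not a port of Modul:Collation and ignores ICU's secondary/tertiary levels, punctuation handling and numbers. `czech_sort_key` and `CZECH_ALPHABET` are assumed names.

```python
import unicodedata

# Hypothetical, simplified Czech alphabet at the primary level only.
# č, ř, š, ž and the digraph "ch" are separate letters; "ch" sorts after "h".
CZECH_ALPHABET = [
    "a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
    "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v", "w", "x",
    "y", "z", "ž",
]
RANK = {letter: i for i, letter in enumerate(CZECH_ALPHABET)}


def _fold(c: str) -> str:
    """Map a lowercase character to its primary-level letter."""
    if c in "čřšž":
        return c  # letters with háček are distinct letters in Czech
    # Other diacritics (á, ě, í, ů, ď, ...) are dropped at the primary level.
    return unicodedata.normalize("NFD", c)[0]


def czech_sort_key(text: str):
    """Approximate Czech collation key (assumption: primary level only)."""
    s = unicodedata.normalize("NFC", text).lower()
    primary = []
    i = 0
    while i < len(s):
        if s.startswith("ch", i):
            primary.append((1, RANK["ch"]))
            i += 2
            continue
        base = _fold(s[i])
        if base in RANK:
            primary.append((1, RANK[base]))
        else:
            # Non-letters (brackets, colons, spaces, digits) sort before letters.
            primary.append((0, ord(base)))
        i += 1
    # The original string breaks ties deterministically.
    return primary, text
```

Used as `sorted(links, key=czech_sort_key)`, this reproduces the order given in the description for that example, but ICU remains the more reliable choice for real pages.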

Rough implementation on the sorting end:

>>> import icu
>>> locale = icu.Locale("cs_CZ")
>>> locale.getDisplayName()
'Czech (Czech Republic)'
>>> collator = icu.Collator.createInstance(locale)
>>> categories = """[[Kategorie:Česko]]
... [[Kategorie:Čeština]]
... [[Kategorie:Jazykověda]]
... [[Kategorie:Jiří Rejzek]]
... [[Kategorie:Knihy]]
... [[Kategorie:Kultura]]
... [[Kategorie:Věda a technika]]""".split("\n")
>>> print(', '.join(sorted(categories)))
[[Kategorie:Jazykověda]], [[Kategorie:Jiří Rejzek]], [[Kategorie:Knihy]], [[Kategorie:Kultura]], [[Kategorie:Věda a technika]], [[Kategorie:Česko]], [[Kategorie:Čeština]]
>>> print(', '.join(sorted(categories, key=collator.getSortKey)))
[[Kategorie:Česko]], [[Kategorie:Čeština]], [[Kategorie:Jazykověda]], [[Kategorie:Jiří Rejzek]], [[Kategorie:Knihy]], [[Kategorie:Kultura]], [[Kategorie:Věda a technika]]
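Applying such a key to an actual page also means pulling the category links out of the wikitext and putting them back. The sketch below is a hypothetical helper that permutes `[[Kategorie:Name]]` / `[[Kategorie:Name|sortkey]]` links among their existing positions, ordering them by `Name` only, so the MediaWiki sort key after `|` travels with its link without affecting the order on the page. The regex is a simplification (assumption): it does not handle namespace aliases such as `Category:`, first-letter case rules, or links inside comments and `<nowiki>`. pywikibot's `textlib` already provides `getCategoryLinks` and `replaceCategoryLinks`, which a real implementation would probably build on instead.

```python
import re
from typing import Callable


def reorder_category_links(
    wikitext: str,
    sort_key: Callable[[str], object],
    namespace: str = "Kategorie",
) -> str:
    """Hypothetical helper: reorder category links in place by category name."""
    pattern = re.compile(
        r"\[\[\s*" + re.escape(namespace) + r"\s*:\s*([^\]|]+)(?:\|[^\]]*)?\]\]"
    )
    matches = list(pattern.finditer(wikitext))
    # sorted() is stable, so links with equal keys keep their relative order.
    ordered = iter(
        [m.group(0) for m in sorted(matches, key=lambda m: sort_key(m.group(1).strip()))]
    )
    # re.sub visits matches in the same left-to-right order as finditer,
    # so each existing slot receives the next link in sorted order.
    return pattern.sub(lambda _m: next(ordered), wikitext)
```

With the collator from the session above, this would be called as `reorder_category_links(text, collator.getSortKey)`; passing plain `str` instead reproduces the naïve code point order.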

But I'm confused by this one as well: as far as I can see, category.py doesn't sort at all; it just adds sort keys for MediaWiki.

Danny_B triaged this task as Medium priority. (Jan 22 2016, 3:41 PM)
Danny_B renamed this task from category.py: implement collation-aware sorting to Implement collation-aware sorting. (Jun 3 2016, 2:01 AM)
Danny_B added a project: Pywikibot-category.py.

Note that some sites may have special policies for the order of categories (e.g. in biographies), in which case this sorting would not be applicable.