Unexpected Category sort on French Wiktionary
Closed, Invalid (Public)

Description

From https://www.mediawiki.org/wiki/Topic:So5yno6j2ak27ees:

On the French Wiktionary, I notice occasional odd sort glitches. For instance, http://fr.wiktionary.org/w/index.php?title=Cat%C3%A9gorie:Noms_communs&subcatfrom=bok%0ANoms+communs+en+bok#mw-subcategories shows "Noms communs en bokobaru", "Noms communs en boko", "Noms communs en bokyi", in that order, even though the first two sub-categories are indexed with Catégorie:Noms communs|bokobaru and Catégorie:Noms communs|boko, respectively. In the few cases where this happens, the out-of-sequence sort key is always an extension of the key it should follow (that is, the other key is a prefix of it).

Another particularly bad example is http://fr.wiktionary.org/w/index.php?title=Cat%C3%A9gorie:Noms_communs&subcatfrom=dan%0ANoms+communs+en+dan#mw-subcategories where we have danaru‎, dangaléat, dani de l’Ouest, dani de Mid Grand Valley, danois, dano, and only then dan (so we have dan moved down six slots, and dano moved down one slot).

Event Timeline

MarkAHershberger raised the priority of this task to Needs Triage.
MarkAHershberger updated the task description.
MarkAHershberger subscribed.
Aklapper renamed this task from Odd sort problems on frwiki to Unexpected Category sort on French Wiktionary. Sep 3 2015, 12:10 PM
Aklapper added a project: MediaWiki-Categories.
Aklapper set Security to None.

No, this can't be related to T88088: frwiktionary is not using a UCA collation.

The category sort keys on the affected pages have invisible Unicode control characters at the end ('LEFT-TO-RIGHT MARK' (U+200E) or 'RIGHT-TO-LEFT MARK' (U+200F) or both). This affects the ordering, as expected when using the 'uppercase' collation. Removing them will fix the problem. I did it with a few pages: https://fr.wiktionary.org/wiki/Spécial:Contributions/Matma_Rex and both of the examples given are now ordered correctly.
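
To illustrate the effect, here is a minimal Python sketch. It assumes the 'uppercase' collation amounts to uppercasing each sort key and comparing the results as UTF-8 bytes, which is an approximation and not MediaWiki's actual code. The sort keys are made up from the second example, the placement of the trailing marks is an assumption, and the helper names `uppercase_sort_key` and `strip_direction_marks` are hypothetical.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def uppercase_sort_key(key: str) -> bytes:
    """Approximate an 'uppercase' collation: uppercase, then compare as bytes."""
    return key.upper().encode("utf-8")


def strip_direction_marks(key: str) -> str:
    """Remove any run of trailing U+200E / U+200F characters from a sort key."""
    return key.rstrip(LRM + RLM)


# Illustrative sort keys; "dan" and "dano" are assumed to carry a trailing LRM.
keys = ["dan" + LRM, "danaru", "dangaléat", "dano" + LRM, "danois"]

# With the marks in place, "dan" and "dano" sort after their own extensions,
# because the mark encodes to bytes that compare greater than any ASCII letter.
broken = sorted(keys, key=uppercase_sort_key)

# After stripping the marks, each key sorts before the keys that extend it.
fixed = sorted((strip_direction_marks(k) for k in keys), key=uppercase_sort_key)

print([ascii(k) for k in broken])
print(fixed)
```

Under these assumptions, the keys with a trailing mark land after the keys that extend them, which is the pattern described in the report, and stripping the marks restores the expected order.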

Here's the list of 6081 potentially affected pages (based on the latest dump), in case anyone wants to run a bot to fix them: