Changing the alphabetical sorting (collation) @ ro.wikipedia.org
Closed, Resolved · Public

Description

Hi.

For 14 years now (since ro.wikipedia.org was created on June 19, 2003), the sorting of articles in categories has not followed Romanian alphabetical order. The Romanian alphabet has five letters in addition to the English ones (ăâîșț, with their respective capitals ĂÂÎȘȚ), all of which are used extensively in words. The default Unicode sorting places these letters after the letter Z in an alphabetical list. To mitigate this, we have been manipulating DEFAULTSORT for titles that include Romanian diacritics, using a sort key such as SȘortț or Szzortzz for an article like Șorț.
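To illustrate the problem, here is a minimal Python sketch (not MediaWiki code). It assumes that the default sorting behaves like a plain comparison by Unicode code point; the sample titles and the `defaultsort` mapping are made up for the example, apart from the Szzortzz key mentioned above.

```python
# Sample titles, chosen only for illustration.
titles = ["Zid", "Șorț", "Sare", "Tată", "Țară", "Ac", "Ăsta"]

# Python compares strings by code point, so Ă (U+0102), Ș (U+0218)
# and Ț (U+021A) all land after Z (U+005A).
by_code_point = sorted(titles)
print(by_code_point)

# The workaround: give the affected title an ASCII-only sort key,
# much like a DEFAULTSORT value, and sort on that key instead.
defaultsort = {"Șorț": "Szzortzz"}  # hypothetical per-article keys
subset = ["Zid", "Șorț", "Sare", "Tată"]
with_defaultsort = sorted(subset, key=lambda t: defaultsort.get(t, t))
print(with_defaultsort)
```

With the key in place, Șorț sorts after the other S titles and before T, at the cost of maintaining a key by hand for every affected article.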

Inspired by the Bashkir Wikimedians, the community of Romanian Wikipedia has decided that we need to have the default alphabetical sorting changed to the Romanian alphabet, which is as follows:

AĂÂBCDEFGHIÎJKLMNOPQRSȘTȚUVWXYZ
aăâbcdefghiîjklmnopqrsștțuvwxyz
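The requested order can be sketched as a sort key in Python. This is a simplified illustration only, not how MediaWiki or ICU implement collation: it ignores case, and it places any character outside the alphabet (spaces, digits, punctuation) after all letters, which is an assumption made for brevity. The names `ROMANIAN_ALPHABET`, `RANK` and `romanian_sort_key` are hypothetical.

```python
# The lowercase Romanian alphabet in the order given above.
ROMANIAN_ALPHABET = "aăâbcdefghiîjklmnopqrsștțuvwxyz"

# Position of each letter in the alphabet.
RANK = {letter: index for index, letter in enumerate(ROMANIAN_ALPHABET)}


def romanian_sort_key(title):
    """Build a case-insensitive sort key following the Romanian alphabet.

    Characters outside the alphabet get a rank past the last letter and
    are then ordered among themselves by code point.
    """
    return [(RANK.get(ch, len(RANK)), ch) for ch in title.lower()]


titles = ["Zid", "Șorț", "Sare", "Tată", "Țară", "Ac", "Ăsta", "Împărat", "Ion"]
in_romanian_order = sorted(titles, key=romanian_sort_key)
print(in_romanian_order)
```

Here Ăsta follows Ac, Împărat follows Ion, and Șorț and Țară follow Sare and Tată, instead of all of them trailing after Zid.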

Note: The order has been fixed locally on ro.wikipedia.org, in table sorting only, by using an old-school workaround.

Note 2: This request also applies to the rest of the Romanian wiki projects, including ro.wiktionary.org.

Event Timeline

Change 361066 had a related patch set uploaded (by Strainu; owner: Strainu):
[operations/mediawiki-config@master] Set collation for Romanian wikis to uca-ro

https://gerrit.wikimedia.org/r/361066

@Strainu I am not at all familiar with collation.

Can't say I see why I'm mentioned here either.

Note that the proposed patch only applies to collation of categories. For sortable tables, you have to keep using the current workaround (T32674). For special pages with lists, there's no workaround (T32753).

This hasn't been explicitly discussed, but I don't see why not. Will ask and come back as soon as possible (with a patch as well)

@TheDJ : We're good on the numeric sorting, I've updated the patch accordingly
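For readers unfamiliar with the term, the sketch below shows what numeric sorting is generally understood to mean: runs of digits are compared by value rather than character by character. This is an illustrative Python approximation, not the collation code itself; the function name and sample titles are made up, and placing numbers before text is an arbitrary choice made here.

```python
import re


def numeric_sort_key(title):
    """Split a title into text and digit runs; compare digit runs by value."""
    # With a capturing group, re.split returns text parts at even indexes
    # and digit runs at odd indexes.
    parts = re.split(r"([0-9]+)", title)
    return [(0, int(p), "") if i % 2 else (1, 0, p) for i, p in enumerate(parts)]


titles = ["Anul 10", "Anul 2", "Anul 1"]
plain = sorted(titles)                          # character by character
numeric = sorted(titles, key=numeric_sort_key)  # digit runs by value
print(plain)
print(numeric)
```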

Still no-one willing to do the review?

I'll schedule https://gerrit.wikimedia.org/r/#/c/361066/ for deployment in the EU SWAT window (July 17, 13:00-14:00 UTC). This is usually done by the patch's author, but this is a relatively easy one, so I can do it instead. For config patches there are no "automatic reviews"; scheduling must be done :).

Urbanecm triaged this task as Medium priority. Jul 14 2017, 7:45 PM
Urbanecm added a project: User-Urbanecm.

Change 361066 merged by jenkins-bot:
[operations/mediawiki-config@master] Set collation for Romanian wikis to uca-ro-u-kn

https://gerrit.wikimedia.org/r/361066

Mentioned in SAL (#wikimedia-operations) [2017-07-17T15:21:14Z] <zfilipin@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:361066|Set collation for Romanian wikis to uca-ro-u-kn (T168711)]] (duration: 00m 47s)

Urbanecm added a subscriber: zeljkofilipin.

Deployed. The updateCollation.php script is being run by @zeljkofilipin.

@Urbanecm, thanks for the help, you really moved this forward.

I must say I am profoundly confused by how SWAT works. Since the same person who did the +2 and push previously said in this bug that he's not familiar with collation, that means that the patch was basically pushed unreviewed (or rather, on the dev's own assumption that he made no mistake), which sounds weird.

You're welcome, I'm glad I could help you. If you have specific questions about SWAT, feel free to ask me, I'll be happy to help you.

Strainu reopened this task as Open. Edited Jul 19 2017, 11:13 AM
Strainu added a subscriber: Bawolff.

Unfortunately, it seems the diacritics used in the letter headings are wrong. They should be the comma-below letters "Ș" and "Ț", but the headings use the cedilla letters (Ş, Ţ) instead. See for example https://ro.wikipedia.org/wiki/Categorie:%C3%8Embr%C4%83c%C4%83minte
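Because the two pairs look almost identical in many fonts, a quick way to tell them apart is to print their code points and Unicode names. The Python sketch below does that with the standard `unicodedata` module; the `CEDILLA_TO_COMMA` table and `fix_romanian_diacritics` helper are hypothetical names added for illustration and are not part of the MediaWiki patch.

```python
import unicodedata

COMMA_BELOW = "\u0218\u021A"  # Ș Ț (correct for Romanian)
CEDILLA = "\u015E\u0162"      # Ş Ţ (the characters seen in the headings)

# Print each character with its code point and official Unicode name.
for ch in COMMA_BELOW + CEDILLA:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")

# Hypothetical helper: map cedilla letters to their comma-below forms.
CEDILLA_TO_COMMA = str.maketrans("\u015E\u015F\u0162\u0163",
                                 "\u0218\u0219\u021A\u021B")


def fix_romanian_diacritics(text):
    """Replace S/T with cedilla by S/T with comma below."""
    return text.translate(CEDILLA_TO_COMMA)
```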

@matmarex , @Bawolff: I suspect the issue comes from mediawiki/includes/collation/IcuCollation.php. Could you confirm that, since you have worked on that file recently? If so, could someone possibly take the time to change the diacritics in line 199? I don't have access to my dev environment for the next few weeks. Thank you!

Whoops, that's embarrassing. I'll fix it. Sorry!

Change 366261 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/core@master] IcuCollation: Fix diacritic characters for Romanian (ro) headings

https://gerrit.wikimedia.org/r/366261

Change 366261 merged by jenkins-bot:
[mediawiki/core@master] IcuCollation: Fix diacritic characters for Romanian (ro) headings

https://gerrit.wikimedia.org/r/366261

matmarex removed a project: Patch-For-Review.

Deployed. Looks good to me!