Rework rup collation in IcuCollation.php
Open, Needs Triage · Public

Description

The 'rup' (Aromanian) first letters in IcuCollation.php seem to be the Romanian collation plus some additional characters. Although rup has not been "officially" standardized, there has been some self-standardisation, as described here, and it is totally different from what we have.

This needs to be investigated both with the community and external sources.

Event Timeline

Strainu created this task. Jul 19 2017, 11:29 AM
Restricted Application added a subscriber: Aklapper. Jul 19 2017, 11:29 AM

Note that we have a Wikipedia in this language, but it's under 'roa-rup' (https://roa-rup.wikipedia.org/) for historical reasons; see T17988: Rename 'roa-rup' wikis to 'rup'.

Interestingly, there's some kind of "script switcher" at the top there, letting the reader choose between [ ăâ/ḑ/ľ/ń/ș/ț ], [ ăâ/dz/l'/n'/ș/ț ] and [ ã/dz/lj/nj/sh/ts ].

The cedilla forms "Ş", "Ţ" in our collation are certainly wrong; they should be the comma-below forms "Ș", "Ț" (based on the relation to Romanian and a quick look at that wiki), so let's fix that part.
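
Since the two pairs look nearly identical on screen, a small script may help when reviewing the headings. This is only a sketch in Python, not the actual change to IcuCollation.php: the `sample` list is a made-up excerpt rather than the real 'rup' first-letter list, and `fix_headings` is a hypothetical helper name. The code points themselves are the standard Unicode ones.

```python
import unicodedata

# Cedilla forms mapped to the comma-below forms used in Romanian.
CEDILLA_TO_COMMA = {
    "\u015E": "\u0218",  # Ş -> Ș (capital S)
    "\u015F": "\u0219",  # ş -> ș (small s)
    "\u0162": "\u021A",  # Ţ -> Ț (capital T)
    "\u0163": "\u021B",  # ţ -> ț (small t)
}


def fix_headings(letters):
    """Return a copy of `letters` with cedilla forms replaced by comma-below forms."""
    table = str.maketrans(CEDILLA_TO_COMMA)
    return [letter.translate(table) for letter in letters]


# Hypothetical excerpt of a first-letter list (an assumption for illustration,
# not copied from IcuCollation.php).
sample = ["R", "\u015E", "T", "\u0162"]
fixed = fix_headings(sample)

# Report every heading that changed, with its code point and Unicode name.
for before, after in zip(sample, fixed):
    if before != after:
        print(
            f"U+{ord(before):04X} {unicodedata.name(before)} -> "
            f"U+{ord(after):04X} {unicodedata.name(after)}"
        )
```

Printing the Unicode names makes the difference explicit, since "WITH CEDILLA" and "WITH COMMA BELOW" are hard to tell apart by eye in most fonts.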

Change 366316 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/core@master] IcuCollation: Fix diacritic characters for Aromanian (rup) and Moldovan (mo) headings

https://gerrit.wikimedia.org/r/366316

Change 366316 merged by jenkins-bot:
[mediawiki/core@master] IcuCollation: Fix diacritic characters for Aromanian (rup) and Moldovan (mo) headings

https://gerrit.wikimedia.org/r/366316