
Rework rup collation in IcuCollation.php
Open, Needs Triage, Public

Description

The first-letter headings for 'rup' (Aromanian) in IcuCollation.php appear to be the Romanian collation plus some additional characters. While rup has not been "officially" standardized, there has been some self-standardisation as described here, and it is totally different from what we have.

This needs to be investigated both with the community and external sources.

Event Timeline

Note that we have a Wikipedia in this language, but it's under 'roa-rup' (https://roa-rup.wikipedia.org/) for historical reasons; see T17988: Rename 'roa-rup' wikis to 'rup'.

Interestingly, there's some kind of "script switcher" at the top there, letting the reader choose between [ ăâ/ḑ/ľ/ń/ș/ț ], [ ăâ/dz/l'/n'/ș/ț ] and [ ã/dz/lj/nj/sh/ts ].

The cedilla forms "Ş", "Ţ" in our collation are certainly wrong (based on the relation to Romanian and a quick look at that wiki); they should be the comma-below forms "Ș", "Ț". Let's fix that part.
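The two pairs of letters look nearly identical but are distinct Unicode code points, which is why the mistake is easy to miss. A minimal sketch of the difference, in Python rather than the PHP of IcuCollation.php; the `fix_headings` helper and the sample heading list are hypothetical and only illustrate the substitution, not the actual patch:

```python
import unicodedata

# Cedilla forms mapped to the comma-below forms used in Romanian orthography.
CEDILLA_TO_COMMA = {
    "\u015E": "\u0218",  # Ş -> Ș
    "\u015F": "\u0219",  # ş -> ș
    "\u0162": "\u021A",  # Ţ -> Ț
    "\u0163": "\u021B",  # ţ -> ț
}

def fix_headings(letters):
    """Return a copy of the heading letters with cedilla forms replaced.

    Hypothetical helper: 'letters' stands in for a first-letter list,
    it is not the real data structure from IcuCollation.php.
    """
    table = str.maketrans(CEDILLA_TO_COMMA)
    return [letter.translate(table) for letter in letters]

# Show the code point and Unicode name on each side of the mapping.
for old, new in CEDILLA_TO_COMMA.items():
    print(f"U+{ord(old):04X} {unicodedata.name(old)}"
          f" -> U+{ord(new):04X} {unicodedata.name(new)}")

# Example with an assumed fragment of a heading list.
print(fix_headings(["S", "\u015E", "T", "\u0162"]))
```

Printing the Unicode names makes the distinction visible even in fonts that render both forms the same way.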

Change 366316 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/core@master] IcuCollation: Fix diacritic characters for Aromanian (rup) and Moldovan (mo) headings

https://gerrit.wikimedia.org/r/366316

Change 366316 merged by jenkins-bot:
[mediawiki/core@master] IcuCollation: Fix diacritic characters for Aromanian (rup) and Moldovan (mo) headings

https://gerrit.wikimedia.org/r/366316