
Empty or correct 'mo' collation in IcuCollation.php
Open, Needs Triage, Public

Description

The current incarnation of mo.wiki is written in Cyrillic, not Latin, so the current collation in IcuCollation.php is wrong. If the idea is to replicate the Romanian collation, then the letters with a cedilla ("Ş", "Ţ") need to be replaced with their comma-below counterparts ("Ș", "Ț").
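For illustration only, the replacement described above can be sketched in Python (the actual fix belongs in the PHP array in IcuCollation.php). The function name `fix_headings` and the sample list are made up for this example; only the Unicode code points are taken as given.

```python
# Map cedilla letters to the comma-below letters used in Romanian.
# The code points are standard Unicode; everything else is hypothetical.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # Ş -> Ș
    "\u015F": "\u0219",  # ş -> ș
    "\u0162": "\u021A",  # Ţ -> Ț
    "\u0163": "\u021B",  # ţ -> ț
})


def fix_headings(headings):
    """Return a copy of headings with cedilla forms replaced by comma-below forms."""
    return [h.translate(CEDILLA_TO_COMMA) for h in headings]


# Hypothetical heading list, not the real array from IcuCollation.php.
sample = ["S", "\u015E", "T", "\u0162"]
print(fix_headings(sample))
```

Letters that are not in the mapping pass through unchanged, so the same helper could be run over a whole heading list without touching other entries.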

Event Timeline

Strainu created this task. Jul 19 2017, 11:42 AM

Note: the similar change for 'ro' is tracked in T168711.

Change 366316 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/core@master] IcuCollation: Fix diacritic characters for Aromanian (rup) and Moldovan (mo) headings

https://gerrit.wikimedia.org/r/366316

Given that:

  • MediaWiki's 'mo' code seems to refer to Moldovan in Cyrillic script
  • Moldovan seems to be the same language as Romanian (even the 'mo' ISO 639 code is deprecated in favor of 'ro' for Romanian)
  • Our 'uca-mo' collation is not actually usable: per the code comment, 'mo' is "not in libicu", so we cannot actually order the page titles and only know what the headings would be

…I suggest that we should just delete this entry.

Change 366316 merged by jenkins-bot:
[mediawiki/core@master] IcuCollation: Fix diacritic characters for Aromanian (rup) and Moldovan (mo) headings

https://gerrit.wikimedia.org/r/366316

…I suggest that we should just delete this entry.

I was going to suggest that when I submitted the bug, but I've noticed that this commit actually adds more languages that do not exist in libicu, so I suspect there is a good reason for that. Judging only from a visual inspection of the code, if the entry for a language does not exist, the code tries to find first-letters-{$this->locale}.ser, which is presumably generated from libicu data. For 'mo' that file would not exist, so the code would fail with "MediaWiki does not support ICU locale mo". That is why I proposed emptying the entry instead, so that the default first-letters-root.ser is used.
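To make that reading concrete, here is a minimal Python sketch of the lookup as described in the comment above. It models the assumption from visual inspection, not the actual PHP in IcuCollation.php; the function name `pick_first_letters_file` and its parameters are hypothetical.

```python
ROOT_FILE = "first-letters-root.ser"


def pick_first_letters_file(locale, tailorings, data_files):
    """Model of the assumed lookup; not the real MediaWiki implementation.

    tailorings: dict mapping locale -> list of extra heading letters
                (an empty list still counts as "entry exists").
    data_files: set of .ser file names assumed to be available.
    """
    if locale in tailorings:
        # Entry exists (even if empty): fall back to the root data.
        return ROOT_FILE
    candidate = f"first-letters-{locale}.ser"
    if candidate in data_files:
        return candidate
    # No entry and no per-locale file: the failure quoted in the comment.
    raise RuntimeError(f"MediaWiki does not support ICU locale {locale}")


# With an empty 'mo' entry the root file is used.
print(pick_first_letters_file("mo", {"mo": []}, {ROOT_FILE}))
```

Under this model, deleting the 'mo' entry outright would reach the error branch, while an empty entry would not.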

I was going to suggest that when I submitted the bug, but I've noticed that this commit actually adds more languages that do not exist in libicu, so I suspect there is a good reason for that.

As I understand it, they exist in CLDR, so they could potentially be implemented in future versions of libicu, but no one has done that yet (at least not as of when that commit was made).

CLDR has an entry for ro_MD, in Latin script. I have yet to find any difference between that and ro_RO.