
Syntax for explicitly omitting plural forms in CLDR-style plurals
Open, Needs Triage · Public

Description

For some languages, it is common that all the defined plural forms are only needed in exceptional cases. With our new plural validators, all forms are required to be present when CLDR style plurals are used.

Having to repeat the same form multiple times is an unnecessary burden for translators. On the other hand, missing plural forms can cause build failures.

NOTE: We do support inline plural syntax, so you can write You have $1 {{PLURAL|one=car|cars}} in a translation, even though the definition is {{PLURAL|one=You have $1 car|You have $1 cars}}. Not all translators may be aware of this.
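
As an illustration of the equivalence described in the note above, here is a minimal sketch of how an inline plural could be rewritten into the whole-message form. The function name hoist_inline_plural is hypothetical, and the sketch assumes a single, non-nested PLURAL block whose form texts contain no "=" or "|" characters; it is not the actual export code.

```python
import re

# Matches a single, non-nested {{PLURAL|...}} block.
PLURAL_RE = re.compile(r"\{\{PLURAL\|(.*?)\}\}")


def hoist_inline_plural(message):
    """Repeat the surrounding sentence inside every plural form.

    Hypothetical helper: only handles one PLURAL block per message.
    """
    match = PLURAL_RE.search(message)
    if match is None:
        return message  # no plural block, nothing to rewrite
    prefix = message[:match.start()]
    suffix = message[match.end():]
    hoisted = []
    for form in match.group(1).split("|"):
        key, sep, text = form.partition("=")
        if sep:
            # Keyed form such as "one=car".
            hoisted.append(f"{key}={prefix}{text}{suffix}")
        else:
            # Unkeyed default form such as "cars".
            hoisted.append(f"{prefix}{form}{suffix}")
    return "{{PLURAL|" + "|".join(hoisted) + "}}"


print(hoist_inline_plural("You have $1 {{PLURAL|one=car|cars}}"))
```

With the example from the note, this prints {{PLURAL|one=You have $1 car|You have $1 cars}}.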

There are two approaches to solving this problem:

  1. Automatically fill in missing forms at export time
  2. Explicit marking of omitted plural forms

Note that (1) is not the same as what MediaWiki does, since MediaWiki handles missing forms at run time. Because Ruby on Rails and similar frameworks have no native support for inline plural forms, we would get a lot of additional changes during import: the automatically added forms would be present in the imported files but not in the wiki.

Hence I propose adding a new syntax to signal that plural forms are omitted. This syntax would round-trip, so translation admins have no extra burden and the messages inside our system stay nice and clean. It would also signal that the translator did not simply assume that only two plural forms are available in the translation.

The straw man proposal is the following:

{{PLURAL|one=form1|>>>}} would be equivalent to {{PLURAL|one=form1|few=form1|form1}} (with a language-specific number of forms). As an initial implementation, plural forms must be in the canonical order, and >>> must be the last item.
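
To make the proposed expansion concrete, here is a minimal sketch of how an exporter might replace a trailing >>> with explicit copies. The CATEGORIES table and the names expand_omitted and expand_message are assumptions made for illustration; a real implementation would take the category list from CLDR data and would need a proper parser for nested syntax.

```python
OMIT_MARKER = ">>>"

# Hypothetical table of plural categories in canonical order; a real
# implementation would read these from CLDR data instead.
CATEGORIES = {
    "en": ["one", "other"],
    "hr": ["one", "few", "other"],
}


def expand_omitted(forms, lang):
    """Return the forms with a trailing >>> replaced by explicit copies."""
    if OMIT_MARKER not in forms:
        return list(forms)
    if forms.count(OMIT_MARKER) != 1 or forms[-1] != OMIT_MARKER:
        raise ValueError(">>> must appear exactly once, as the last item")
    given = forms[:-1]
    categories = CATEGORIES[lang]
    keys = [form.partition("=")[0] for form in given]
    if (not given or len(given) >= len(categories)
            or keys != categories[:len(given)]):
        raise ValueError("forms before >>> must be keyed and in canonical order")
    # Repeat the text of the last explicit form for every remaining category.
    text = given[-1].partition("=")[2]
    remaining = categories[len(given):]
    filled = [f"{cat}={text}" for cat in remaining[:-1]]
    filled.append(text)  # the final category is written as the unkeyed default
    return given + filled


def expand_message(template, lang):
    """Expand >>> in a message that consists of one {{PLURAL|...}} block."""
    prefix, suffix = "{{PLURAL|", "}}"
    if not (template.startswith(prefix) and template.endswith(suffix)):
        raise ValueError("expected a single {{PLURAL|...}} block")
    forms = template[len(prefix):-len(suffix)].split("|")
    return prefix + "|".join(expand_omitted(forms, lang)) + suffix


print(expand_message("{{PLURAL|one=form1|>>>}}", "hr"))
```

For a language with the categories one, few and other, this prints {{PLURAL|one=form1|few=form1|form1}}. An importer could presumably compare the incoming file against the expanded wiki text, so that an unchanged message keeps its >>> marker in the wiki.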

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jun 24 2020, 10:26 AM

Is it really a CLDR issue, or just an issue with your existing Ruby package, which has no support for selecting the correct plural form from language-specific rules?
It would be more convenient if your app were aware of plural rules and used them to select the appropriate form, instead of relying on a specific order in a fixed-size indexed array. Ruby has support for associative arrays, or arrays containing text or nil values (remap the CLDR form types "default", "one", "two", "few", "many" to integer constants, and you're done).

Note that if you don't use associative arrays, you'll miss some other cases, such as forms for specific numbers (0, 1, ...), which may be translated differently (independently of the generic plural rules of the language). The only required translation is the "default" one. The singular form is generally mapped to it, but as I said, the generic grammatical singular may still be specialized for 0 and 1, or for fractions like 0.5 or 1/2 translated as "one half". If you have associative arrays, the keys can be numbers like 0, 1, 1/2, or strings for generic rules like "default", "one", "many".

https://stackoverflow.com/questions/4266695/ruby-associative-arrays

So using a CLDR-like package, at least for plural rules even if you don't use its full functionality (like grammatical gender or grammatical case variants), is the way to go. Such a package is not very complicated: you use the normal language fallback chain to select an object that contains the associative array, then you process that array with the given number as input. If the exact number is not found as a key of the associative array, you use the language's plural rules to map the number to a plural category ("one", "many", ...), and then fall back to the default if needed. Basically this code is no more than a dozen lines in a single function (you already have a function to select languages from fallbacks or to select the translation from the "root" locale).
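
The lookup described above could be sketched roughly as follows. This is only an illustration in Python rather than Ruby: the names select_form and PLURAL_RULES are hypothetical, and the plural rules are hand-written simplifications for integers, not real CLDR data.

```python
def english_rule(n):
    """Simplified English plural rule (integers only)."""
    return "one" if n == 1 else "other"


def croatian_rule(n):
    """Simplified Croatian plural rule (integers only)."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "other"


# Hypothetical rule table; a real package would load rules from CLDR.
PLURAL_RULES = {"en": english_rule, "hr": croatian_rule}


def select_form(messages, fallback_chain, n):
    """Pick the form for n from the first language that has a translation."""
    for lang in fallback_chain:
        forms = messages.get(lang)
        if forms is None:
            continue  # no translation in this language, try the next one
        if n in forms:
            return forms[n]  # exact-number key such as 0 or 1
        rule = PLURAL_RULES.get(lang)
        category = rule(n) if rule else "default"
        if category in forms:
            return forms[category]  # plural category such as "one" or "few"
        return forms["default"]  # the only required form
    raise KeyError("no translation found in the fallback chain")


messages = {"en": {0: "No cars", "one": "$1 car", "default": "$1 cars"}}
for count in (0, 1, 5):
    print(select_form(messages, ["hr", "en"], count))
```

Here the Croatian translation is missing, so the lookup falls back to English, and the count 5 maps to a category with no explicit form, so the "default" entry is used.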

abi_ added a subscriber: abi_. · Jul 7 2020, 5:35 PM

With this syntax, would this be possible?

{{PLURAL|one=form1|>>>|form-many}}

This should fill in all forms between one and many.

It would theoretically be possible, but my gut feeling says it won't be needed, so I deliberately left it out of this proposal. It's possible to add it later if there is a need.

First, I know that >>> is only a strawman, but please don't use it because it will be terribly confusing in RTL languages.

More to the point, my hunch is that automatic things are better than using any kind of syntax. So perhaps "Automatically fill in missing forms at export time", but what does it mean exactly? That forms will be added, and then brought back into translatewiki with the cloned forms? Or that it will be done in a smart way and the cloned forms won't be seen in translatewiki?

And if this is used, will this be done for all projects, or only for those where the i18n framework cannot deal with missing forms? For MediaWiki, for example, it will probably be unnecessary.

It won't affect MediaWiki, just some other projects.

What I want to avoid is incorrect translations due to translator error or lack of knowledge about the syntax. I don't see how automatic filling would address that. I know it's the status quo for MediaWiki, but I don't think it's the best solution.

Verdy_p added a comment. · Edited · Jul 8 2020, 6:07 PM

I agree: for MediaWiki the support is accurate. For other projects using other parsers that don't recognize the MediaWiki "template"-like syntax, such as some Ruby projects that also use poorer libraries (which need all plural forms to be specified in a strict order), the best we can do is to suggest that they integrate a better I18N library that doesn't have the stupid requirement for ALL plural forms to be specified when they could work with defaults.

As far as I know, Ruby can support plurals with fallbacks for missing plural forms, using its own associative arrays, like those that exist in Java, JavaScript, C++... Supporting translations that use the MediaWiki syntax for plural forms or gender forms is then quite easy to do, and if needed an import tool can convert the data easily.

On Translatewiki.net, I patched a few projects (including some games written in Python or Ruby) so that their plural markers, which use a syntax confusable with MediaWiki template syntax, no longer render incorrect strings. I created a few MediaWiki formatting templates for those messages to make sure they are "exposed" visually as source code (with some coloring), to make this obvious (and also to solve tons of "red links" for missing MediaWiki templates). This works for almost all of them.

Projects on Translatewiki.net should write their plural rules (and grammar rules) using the MediaWiki syntax, even if they later convert these to their own syntax for their project.

Anyway, Translatewiki.net should offer a way to indicate that a source text to translate is written in a specific syntax, using its own parser. This is what is done, for example, when translating ".po/.pot" resources: there's a special marker before each unit that indicates which syntax it uses (C/C++, Java, Perl, PHP...). That is something that should be developed in Translatewiki.net so that it can also support the same projects as the online tools for .po/.pot files (as used by the very common "gettext" library). gettext is very versatile, which is why it has tons of online translation tools for many more projects than translatewiki.net (where we cannot safely translate everything).

If this support is added, let's not diverge by adding another incompatible syntax that will just be used in borderline cases...