
Replace Android quantity strings with Extended/MessageFormat or similar offering
Closed, Resolved | Public

Description

Android quantity strings don't cover pluralization for zero well[0]. We think ExtendedMessageFormat[1] or MessageFormat[2] or a similar library may work better. This task covers the work of choosing a library, adding it to the project, and if possible, salvaging the existing translations to the new format.

It's tempting to try to write this logic ourselves, but consider the following. Android defines these quantities: zero, one, two, few, many, and other. As an alternative to quantity strings, we could write these out as simple strings individually. The English case looks like:

view_continue_reading_card_subtitle_zero = "today"
view_continue_reading_card_subtitle_one = "yesterday"
view_continue_reading_card_subtitle_other = "%d days ago"

But, assuming Android got it right, this isn't enough for all languages, so the resource file also needs*:

view_continue_reading_card_subtitle_two = @string/view_continue_reading_card_subtitle_other
view_continue_reading_card_subtitle_few = @string/view_continue_reading_card_subtitle_other
view_continue_reading_card_subtitle_many = @string/view_continue_reading_card_subtitle_other

*I don't think translatewiki supports Android string references, so these would be copied and pasted. That's OK, but it would need to be done for every language (there are no defaults for plurals).

But we don't know what few or many mean in other languages, so we also need:

quantity_few = 3
quantity_many = 6

Then I guess we write a switch:

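@StringRes int getQuantityStringRes(@IntRange(from = 0) int quantity) {
  // Fixed categories: exact matches on 0, 1, and 2.
  if (quantity == 0) {
    return R.string.view_continue_reading_card_subtitle_zero;
  }
  if (quantity == 1) {
    return R.string.view_continue_reading_card_subtitle_one;
  }
  if (quantity == 2) {
    return R.string.view_continue_reading_card_subtitle_two;
  }
  // Per-language upper bounds, stored as integer resources.
  // `res` is assumed to be a Resources instance in scope.
  int few = res.getInteger(R.integer.quantity_few);
  if (quantity <= few) {
    return R.string.view_continue_reading_card_subtitle_few;
  }
  int many = res.getInteger(R.integer.quantity_many);
  if (quantity <= many) {
    return R.string.view_continue_reading_card_subtitle_many;
  }
  // Everything above the "many" bound.
  return R.string.view_continue_reading_card_subtitle_other;
}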

And use it like so:

String msg = getString(getQuantityStringRes(Math.abs(quantity)), quantity);

This seems like it would work, but imagine all the corner cases it probably misses. I think we'd be better off forking the platform code as a library and adding logic to use the zero case when the quantity is zero and a zero string exists, but maybe this breaks some other language? This really has to be someone else's (some other library's) problem :]
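
To make one such corner case concrete, here is a minimal Java sketch that compares the threshold logic from the description with the CLDR cardinal rules for Russian, restricted to non-negative integers. The class and method names are hypothetical, and the bounds few = 4 and many = 20 are assumed values picked only for illustration.

```java
final class PluralCornerCases {
    // Threshold approach from the description; the few/many bounds are assumed values.
    static String thresholdCategory(int n, int few, int many) {
        if (n == 0) return "zero";
        if (n == 1) return "one";
        if (n == 2) return "two";
        if (n <= few) return "few";
        if (n <= many) return "many";
        return "other";
    }

    // CLDR cardinal rules for Russian, restricted to non-negative integers.
    static String russianCategory(int n) {
        int mod10 = n % 10;
        int mod100 = n % 100;
        if (mod10 == 1 && mod100 != 11) return "one";
        if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) return "few";
        return "many"; // for integers, "other" is never selected in Russian
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 2, 5, 11, 21, 22, 25}) {
            System.out.println(n + ": threshold=" + thresholdCategory(n, 4, 20)
                    + " cldr=" + russianCategory(n));
        }
    }
}
```

Under the CLDR rules 21 falls into one and 22 into few, while the threshold version puts both into other. No choice of fixed bounds can reproduce a rule that depends on the last digits of the number.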

[0] https://developer.android.com/guide/topics/resources/string-resource.html#Plurals
[1] https://commons.apache.org/proper/commons-lang/javadocs/api-3.5/org/apache/commons/lang3/text/ExtendedMessageFormat.html
[2] https://stackoverflow.com/a/5671704

Event Timeline

IIRC, translatewiki actually supports Android plurals and even makes them look like MediaWiki plurals, with {{PLURAL}}. @Nikerabbit and @yuvipanda, who worked on it a couple of years ago, may add details.

Ideally, the app would:

  • support all the CLDR plural keywords for languages that need them.
  • support particular numbers (this is useful for cases when 0 is special, although in MediaWiki this is occasionally abused).
  • work in translatewiki as similarly as possible to MediaWiki and extensions.

@Amire80, that's right. The string references I'm referring to in this ticket are Android specific. For example, bar below:

<?xml version="1.0" encoding="utf-8"?>
<resources>
  <string name="foo">abc</string>
  <string name="bar">@string/foo</string>
</resources>

AFAIK, translatewiki and Android both support all CLDR keywords. The 0 case is inflexible in the Android platform implementation. I'm not sure if this is good or bad, unusual or not, but it doesn't work well for one case in the app.

The string "zero" should only work in languages for which CLDR defines it. If it doesn't work in English, this is OK, because CLDR doesn't define "zero" forms for English.

It may make sense to have a way to define values for particular numbers, however. It's possible in MediaWiki core by specifying explicit numbers, such as "0". We can live without this, however: if some languages require it consistently, then it should be in CLDR anyway, and if it's a particular case in the code that applies to all languages, then it can be solved in the code (as in https://gerrit.wikimedia.org/r/#/c/316913/ ). There were some discussions about this back in 2012 or so; @Nikerabbit may recall some details that I don't remember now.

@Amire80, I'm not as experienced with localization as I should be. It sounds like the way the app is doing things now is correct and this task should be closed.

Comment about this task description: The only special case we need on top of what Android provides is "zero". I think the Android system handles all other quantities by default, so we wouldn't have to include them in our special switch.

I wouldn't add "zero" as a special case to languages for which CLDR doesn't specify them.

It was my mistake to do it in the first patchset of https://gerrit.wikimedia.org/r/#/c/316913/ . English doesn't have zero. Adding an option to have numbers, as I suggested in a comment above, would be nice, but not a high priority.

@Amire80, please be patient with me. I'm still unsure if the current implementation is adequate and we should close this task or if some change is desirable. I think these language problems have been solved but I haven't studied their solutions so I'm coming with a question, not a novel proposal.

It seems to me we should always allow for a special case zero in all languages, including English. We saw this recently in wanting to change "0 days" to "today". Another example in English would be "0 pennies" to "penniless". There are probably many others.

English exceptions are obvious to me and I could make special case patches similar to yours whenever identified but these seem like spot fixes. I'm worried about this approach because if the problem didn't manifest in English, I wouldn't notice it in other languages.

I think that adding a special zero case to all languages, that is, letting zero take priority over other when it is defined, would provide a general solution, but I don't know why CLDR doesn't define it. It sounds like Android's pluralization is just abiding by CLDR best practices. To me, this task is about fixing the zero case in a general way.

If you look at http://www.unicode.org/cldr/charts/30/supplemental/language_plural_rules.html#lv for example, zero does not equal n = 0. For this reason, I would recommend following the MediaWiki practice of using number 0 for overrides.
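
As an illustration of that distinction, the following Java sketch encodes the Latvian cardinal rules for non-negative integers and layers an explicit-number lookup on top, loosely modeled on the MediaWiki-style "0" override. The class, the method names, and the map-based message store are hypothetical, and the message texts are placeholders rather than real translations.

```java
import java.util.HashMap;
import java.util.Map;

final class ExplicitZeroOverride {
    // CLDR cardinal rules for Latvian, restricted to non-negative integers.
    static String latvianCategory(int n) {
        int mod10 = n % 10;
        int mod100 = n % 100;
        if (mod10 == 0 || (mod100 >= 11 && mod100 <= 19)) return "zero";
        if (mod10 == 1) return "one"; // mod100 == 11 was already handled above
        return "other";
    }

    // Prefer an exact-number key such as "0" over the CLDR category key.
    static String select(Map<String, String> forms, int n) {
        String exact = forms.get(Integer.toString(n));
        if (exact != null) return exact;
        String byCategory = forms.get(latvianCategory(n));
        return byCategory != null ? byCategory : forms.get("other");
    }

    public static void main(String[] args) {
        // Placeholder texts, not real Latvian translations.
        Map<String, String> forms = new HashMap<>();
        forms.put("0", "<today>");
        forms.put("zero", "<%d days ago, zero form>");
        forms.put("one", "<%d days ago, one form>");
        forms.put("other", "<%d days ago, other form>");
        for (int n : new int[] {0, 1, 10, 21, 22}) {
            System.out.println(n + " -> " + String.format(select(forms, n), n));
        }
    }
}
```

Here 10 selects the zero form even though it is not 0, which is why a "today" text cannot safely live in the zero category; only the explicit "0" key is guaranteed to match n = 0 alone.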

@Nikerabbit, thanks. So if I understand correctly, we should always allow a translator to specify a strict n = 0 value for pluralized resources. BTW, does TWN happen to support the Java MessageFormat?

As far as I can see, no special support is required from us to support MessageFormat if it is just placeholders in the string. Optionally syntax checkers and insertables can be easily written.

In general, the MessageFormat syntax is way too verbose and complicated for generic plural support, so it should not be used for that.
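
For reference, this is roughly what the choice syntax of java.text.MessageFormat looks like for the "days ago" string. It is only a sketch: the class name and the pattern are made up for illustration, and the pattern covers English only. Note that java.text.MessageFormat selects on numeric ranges, not on CLDR plural categories.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class ChoicePatternDemo {
    // For integer input, "0#" and "1#" select exactly 0 and 1; "1<" selects anything above 1.
    static final String PATTERN =
            "{0,choice,0#today|1#yesterday|1<{0,number,integer} days ago}";

    static String describe(int days) {
        MessageFormat format = new MessageFormat(PATTERN, Locale.ENGLISH);
        return format.format(new Object[] {days});
    }

    public static void main(String[] args) {
        for (int days : new int[] {0, 1, 5}) {
            System.out.println(describe(days));
        }
    }
}
```

A language whose plural forms depend on the last digits of the number would need one range entry per affected interval, which gives a sense of how quickly such patterns grow.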

It sounds like this task is mostly having to do with "relative-time" strings (e.g. "today", "n days ago", "last year", etc.) which can indeed be maddeningly complex in different languages, since the grammar for many languages involves modulo rules and a lot of special cases.

Therefore, we are now leveraging the ICU4J libraries that are built into newer versions of the Android SDK, which provide correct localizations of relative-time strings, so we no longer have to worry about constructing them ourselves using plurals or other custom logic.

[Screenshot attachment: device-2018-05-15-170519.png]

All of the other pluralizable strings in our code (that don't involve dates) call for only one and other cases, and can continue to be handled by plain plurals resources.

Dbrant claimed this task.