
Homepage: Time localisation
Open, Needs Triage · Public

Description

The newcomer homepage is meant to be a place for newcomers to quickly get oriented and begin their work on their wiki.


I'm currently beta-testing the Homepage in Czech to double-check that it works in a Czech environment.

I have concerns about the Growthexperiments-homepage-account-age message. At least for "1 minute", it translates to Czech as "Svůj účet již máte vytvořen 1 minuta". However, that's incorrect in this context. The correct localisation would be "Svůj účet již máte vytvořen 1 minutu". For 2+ minutes, it works correctly.

I don't know how this works for other "1 x" units, but I think it works in a similar way.

Any thoughts on how to solve this? @kostajh, @SBisson?

@MMiller_WMF, how important is this bug?

Related Objects

Status    Assigned    Task
Open      None
Resolved  Cntlsn
Open      None

Event Timeline

Restricted Application added a subscriber: Aklapper. Apr 12 2019, 10:11 AM
Urbanecm updated the task description. Apr 12 2019, 10:11 AM
SBisson added a comment. Edited Apr 12 2019, 10:47 AM

The difference between the two messages is the last word: "minuta" (incorrect) vs. "minutu" (correct). Are they plural and singular respectively or is it a different rule?

This is the message it uses: "duration-minutes": "$1 {{PLURAL:$1|minuta|minuty|minut}}"

No, it's not related to singular/plural. A different grammatical case should be used. "minuta" is singular nominative, "minutu" is singular accusative. Depending on which messages use the duration-minutes message, the easiest solution might be to just change the singular in duration-minutes from singular nominative to singular accusative. Where is duration-minutes used?
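To make the mismatch concrete, here is a minimal Python sketch (not MediaWiki code) of how a three-form PLURAL choice behaves for Czech. The function names are hypothetical, and the rule used (1, then 2 to 4, then everything else) is the usual Czech integer plural rule, assumed here rather than taken from MediaWiki's source.

```python
def czech_plural_index(n: int) -> int:
    """Pick one of three Czech plural forms for a non-negative integer."""
    if n == 1:
        return 0  # singular
    if 2 <= n <= 4:
        return 1  # "few"
    return 2      # everything else (0, 5, 6, ...)


def duration_minutes(n: int) -> str:
    # Forms copied from the "duration-minutes" message quoted above.
    forms = ["minuta", "minuty", "minut"]
    return f"{n} {forms[czech_plural_index(n)]}"


print(duration_minutes(1))
print(duration_minutes(3))
print(duration_minutes(10))
```

The choice depends only on the number, so nothing at this step can switch "minuta" to "minutu" when the surrounding sentence calls for the accusative.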

[...]
Where is duration-minutes used?

I think it's only used in Language::formatDuration, so it should be safe to change the translation to something more appropriate to this context.

The problem is that the sentence "Length of this video is 1 minute" can be translated to Czech in two ways. One alternative is "Video trvá 1 minutu" (where the accusative would be used), but the second alternative is "Délka tohoto videa je 1 minuta", where the nominative (the current form) would be used. One real example is ipb-blocklist-duration-left, which would then be incorrect after the change you propose.

Since duration-minutes doesn't seem to be Growth-Experiments specific (see below, let me know if I grep incorrectly :D), it would then change ipb-blocklist-duration-left to an incorrect translation.

urbanecm@notebook ~/unsynced/gerrit/mediawiki/extensions/GrowthExperiments/i18n
$ grep duration-minutes */*.json
urbanecm@notebook ~/unsynced/gerrit/mediawiki/extensions/GrowthExperiments/i18n
$

You're right, the output of Language::formatDuration() is used in several places even within the Message class when durationParams() is used. This is not Growth-specific and we would like to avoid copying the code and messages over if possible.

Usages of formatDuration
Usages of durationParams

I guess one could audit all usages and choose the form that is less wrong overall.

Another option is to investigate grammar transformation support for Czech. I know it exists for a few languages but I know nothing about it. I would defer to @Amire80 and the Language-Team for advice.

The initial task description applies also to Russian and probably most other Slavic languages.

I'm on my phone and I haven't looked at the code, but {{PLURAL}} is the way to go. Translators must be allowed to use PLURAL in each message. If there's a message that has just the word "minutes", it must either be used with all the plural forms, or maybe it should not be used at all.

There's also Moment.js, which is a library for time expressions in many languages, but it's probably good only for standalone expressions and not for sentences that include numbers and units of time.

@Amire80 Plural is allowed, and if you scroll up in this task, you'll see that the cs definition indeed makes use of it. The problem is that we have TWO singular forms, and the choice depends on the particular sentence, not on the number (understandably, the number for singular is 1). For instance in Czech, you say "Délka videa je 1 minuta" (singular nominative), but "Video trvá 1 minutu" (singular accusative). A "solution" would be to rephrase the translation in a way that would require the nominative to be used, but I'm afraid it would look strange in this context. In case there isn't any simple solution, declining would probably do more good than rephrasing.

I have absolutely no idea whether this is something MW can solve. If not, we can probably ignore it for now and convert this task to a longer-term one, where we will look for a solution (if such a task doesn't exist yet, ofc).

Anyway, I'll think about potential rephrasings, maybe something feasible exists...

[...]
Translators must be allowed to use PLURAL in each message. If there's a message that has just the word "minutes", it must either be used with all the plural forms, or maybe it should not be used at all.

This is exactly the issue. This message is part of a "lego" system that builds a duration like "2 hours, 1 minute", which is then used as a parameter in several other messages. I think this is allowed to exist in MediaWiki because it works relatively well in enough languages but it's clearly not a good practice. I was hoping another tool would be able to come in and make adjustments based on grammar rules.
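A rough Python sketch of the "lego" pattern described here, under stated assumptions: `format_duration` and `render` are hypothetical stand-ins, not the real Language::formatDuration, the Czech hour forms are supplied for illustration and do not come from this task, and the comma-joined output is a simplification.

```python
CZECH_UNITS = [
    # (seconds per unit, nominative forms for counts of 1 / 2-4 / other)
    (3600, ("hodina", "hodiny", "hodin")),  # hours: assumed forms, not from this task
    (60, ("minuta", "minuty", "minut")),    # minutes: from "duration-minutes"
]


def pick_form(n, forms):
    """Choose a form by number only, as PLURAL does."""
    if n == 1:
        return forms[0]
    if 2 <= n <= 4:
        return forms[1]
    return forms[2]


def format_duration(seconds):
    """Build a standalone duration string, one unit at a time."""
    parts = []
    for unit_seconds, forms in CZECH_UNITS:
        count, seconds = divmod(seconds, unit_seconds)
        if count:
            parts.append(f"{count} {pick_form(count, forms)}")
    return ", ".join(parts)


def render(template, duration_seconds):
    # The outer message only ever sees a finished string in place of $1.
    return template.replace("$1", format_duration(duration_seconds))


print(format_duration(7260))
print(render("Délka tohoto videa je $1", 60))
print(render("Video trvá $1", 60))
```

Both outer messages receive the same pre-built fragment, so the second one ends in "1 minuta" even though that sentence needs "1 minutu". The fragment is fixed before the case required by the surrounding sentence is known.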

There's also Moment.js, which is a library for time expressions in many languages, but it's probably good only for standalone expressions and not for sentences that include numbers and units of time.

We're in server-side MediaWiki (PHP) ;)

OK, now I've looked at the code properly and I see formatDuration. It's kind of like a simplified Moment.js, for PHP. Like Moment.js, it's good for standalone expressions, but not necessarily for embedding in sentences.

How much is it used in this extension? Could you redo the messages without using formatDuration, and instead pass the values of minutes / hours / days? Or go even further and give up on minute-level precision? Would the designers agree to just something like "You've had your account for a few minutes / hours / days"?
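A minimal Python sketch of the first suggestion, passing the raw number into the message so the translator can pick case-appropriate forms per message. Everything here is an assumption for illustration: the tiny `render` expander is not MediaWiki's parser, and the `ACCOUNT_AGE` template is a hypothetical rewording built from the sentence in the task description.

```python
import re

# Matches {{PLURAL:$1|form1|form2|...}} in a message template.
PLURAL_RE = re.compile(r"\{\{PLURAL:\$1\|([^}]*)\}\}")


def czech_plural_index(n):
    if n == 1:
        return 0
    if 2 <= n <= 4:
        return 1
    return 2


def render(template, n):
    """Expand {{PLURAL:$1|...}} first, then substitute $1, for a Czech message."""
    def choose(match):
        forms = match.group(1).split("|")
        # Fall back to the last form if fewer than three were supplied.
        return forms[min(czech_plural_index(n), len(forms) - 1)]

    return PLURAL_RE.sub(choose, template).replace("$1", str(n))


# Hypothetical message text: the translator writes accusative forms here.
ACCOUNT_AGE = "Svůj účet již máte vytvořen $1 {{PLURAL:$1|minutu|minuty|minut}}"

for minutes in (1, 2, 5):
    print(render(ACCOUNT_AGE, minutes))
```

Because the forms live in the message that needs them, a different message can keep "minuta" for its nominative context without either one affecting the other.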

@Urbanecm @SBisson -- I'm bringing this back up since we haven't heard anything on it in a while. Because I'm not used to the Czech language, I don't have a sense of how important it is to solve this. @Urbanecm, can you make the decision on that? Is this something that (a) must be addressed before deploying (b) should probably be addressed at some point (c) doesn't really ever need to be addressed?

@MMiller_WMF I wouldn't call it a deployment blocker. The grammatical mistake doesn't prevent Czech speakers from understanding the sentence. Even so, I don't think we should ignore this for good :). Maybe a good topic for discussion during Wikimedia-Hackathon-2019?

Thanks, @Urbanecm. I think it is a good idea to tag it for Hackathon, so please do that. We'll leave this open and attached to the homepage epic so that we don't forget about it.

Aklapper moved this task from Backlog to Projects on the Wikimedia-Hackathon-2019 board.

(not part of May 2 launch)

MMiller_WMF updated the task description. May 16 2019, 4:24 PM
Yupik added a subscriber: Yupik. May 17 2019, 7:07 PM

The initial task description applies also to Russian and probably most other Slavic languages.

And quite a few other languages too, including most if not all Finno-Ugric languages. And it's not just a matter of sg. - pl. in the same case the whole time.

Russian, for example, has for numbers in the nominative/accusative (inanimate) case:

  • 1 - nominative sg.
  • 2-4 - genitive sg.
  • 5-20 - genitive pl.

For anything above 20, it depends on the last digit, which then follows the selection above.

In Northern Saami, we have:

  • 1 - nominative sg.
  • 2+ - genitive-accusative sg.

In Finnish:

  • 1 - nominative sg.
  • 2+ - partitive sg.

In Inari Saami:

  • 1 - nominative sg.
  • 2-6 - genitive sg.
  • 7+ - partitive (no sg., no pl., only partitive)

In theory, Skolt Saami follows the same system as Inari Saami, but the partitive is often replaced with the genitive sg. form and both are considered correct (so becoming more like the Northern Saami system).

If the number also needs to be declined (for example, when it's the object of a verb), it becomes a lot more fun in these languages with declining the numerals and the words they're modifying and that's a whole new ballgame.
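The number-to-form rules listed above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, counts below 1 are left out because they are not covered above, and the Russian rule is extended to treat numbers ending in 11 to 14 (such as 112) as genitive pl., which is an assumption added here beyond the "last digit" description.

```python
def russian(n):
    if 11 <= n % 100 <= 14:   # 11-14, and also 111-114 etc. (added assumption)
        return "genitive pl."
    last = n % 10
    if last == 1:
        return "nominative sg."
    if 2 <= last <= 4:
        return "genitive sg."
    return "genitive pl."


def northern_saami(n):
    return "nominative sg." if n == 1 else "genitive-accusative sg."


def finnish(n):
    return "nominative sg." if n == 1 else "partitive sg."


def inari_saami(n):
    if n == 1:
        return "nominative sg."
    if 2 <= n <= 6:
        return "genitive sg."
    return "partitive"


RULES = {
    "Russian": russian,
    "Northern Saami": northern_saami,
    "Finnish": finnish,
    "Inari Saami": inari_saami,
}


def counted_noun_form(language, n):
    """Return the case/number label a counted noun takes after the numeral n."""
    if n < 1:
        raise ValueError("only counts of 1 or more are covered by the lists above")
    return RULES[language](n)


for language in RULES:
    print(language, [counted_noun_form(language, n) for n in (1, 3, 7, 21)])
```

These rules only cover a numeral phrase in the nominative/accusative position. Once the whole phrase has to be declined, as described above, a single number-based lookup like this is no longer enough.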

Tgr added a subscriber: Tgr. Jul 15 2019, 12:29 PM

Grammatical cases aren't really supported by the i18n framework; translators typically have to work around them by finding a phrasing that's in the nominative. There's a {{GRAMMAR}} magic word which in theory can correctly inflect some limited, predefined set of words, but usually there's no way for the programmer to tell what case a message fragment is going to be in: that depends on the translation of the surrounding message, and there is no way for the translator to set parameters.

Moving on Growth-Team board to reflect that we don't have short-to-medium term plans to work on this; it sounds like this issue might be best edited and re-tagged as a broader task for MediaWiki core.