Convenient Discussions messages have many empty gender and plural magic words
Closed, Resolved | Public | BUG REPORT

Description

The English Convenient Discussions message file has many messages with gender and plural usages such as:

{{plural:$2|1=|}}
 {{plural:$2|1={{gender:$4|}}|{{gender:$4|}}}}
thanks to {{gender:$2|}} $1 for [$3 this edit]

... and so on.

This is quite unusual. The usual practice in MediaWiki is that when English doesn't need different gender or plural forms, then one English string is written inside the double curlies, and it's not left empty. This can possibly work here, too, and I thought of just making a patch that changes all of them, but this is so pervasive that I thought that perhaps there's something special about this in this project, so I'm reporting it separately.

Event Timeline

Thanks for filing this.

So, this has its roots in the fact that while it's correct in English to say "3 comments by User1, User2, User3", in inflected languages like Russian the calque would read "3 сообщения от Участник1, Участник2, Участник3", which is incorrect. We can't inflect usernames, so we need to add a generalizing noun that carries the genitive case and the gender. Compare:

en.json
	"thread-expand-label": "Expand the thread ($1 {{plural:$1|comment|comments}} by {{plural:$2|1={{gender:$4|}}|{{gender:$4|}}}} $3)",
ru.json
	"thread-expand-label": "Развернуть ветку ($1 {{plural:$1|сообщение|сообщения|сообщений}} от {{plural:$2|1={{gender:$4|участника|участницы}}|{{gender:$4|участников|участниц|участников(-ц)}}}} $3)",

I've added empty {{gender:}}s and {{plural:}}s hoping that this will make it easier for translators working in inflected languages to come up with the most correct markup. For English, there is simply nothing to wrap them around.

However, since you stated this is unusual, I've just searched for this message, thread-expand-label, across the i18n files and discovered that no language besides Russian actually uses this: either the markup is too intricate for translators to grasp, or the languages don't need such convoluted logic. When it comes to the simple empty {{gender:$N|}}, content is added more often: 6 with content vs. 30 without (for es-reply-to); yet it's still rarely used.

In this situation, I presume, it doesn't make much sense to go along with my rationale and we should just remove empty {{gender:}}s and {{plural:}}s to eliminate confusion. What do you think?

In this situation, I presume, it doesn't make much sense to go along with my rationale and we should just remove empty {{gender:}}s and {{plural:}}s to eliminate confusion. What do you think?

You should definitely not do that, because that would mean $2 and $4 no longer appear in the English message, which in turn would make Translate mark them as superfluous in the Russian translation, showing warnings to translators and maybe even marking the Russian translation as outdated for as long as these two parameters are used in it (at least that’s what happens in MediaWiki translations; Convenient Discussions is probably configured similarly).
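The parameter mismatch described above can be illustrated with a small sketch. It is not how the Translate extension is implemented; it only compares the set of `$N` parameters in a source message and in a translation. The function names and the stripped-down English string are hypothetical and introduced here for illustration; the other two messages are the ones quoted earlier in this task.

```python
from __future__ import annotations

import re

# Matches MediaWiki-style positional parameters such as $1, $2, $14.
PARAM_RE = re.compile(r"\$(\d+)")


def params_used(message: str) -> set[int]:
    """Return the numbers of all $N parameters that occur in a message."""
    return {int(n) for n in PARAM_RE.findall(message)}


def superfluous_params(source: str, translation: str) -> list[int]:
    """Parameters the translation uses but the source never mentions."""
    return sorted(params_used(translation) - params_used(source))


# English message as it currently is, with the empty gender/plural calls.
en_current = (
    "Expand the thread ($1 {{plural:$1|comment|comments}} by "
    "{{plural:$2|1={{gender:$4|}}|{{gender:$4|}}}} $3)"
)
# Hypothetical English message with the empty calls simply removed.
en_stripped = "Expand the thread ($1 {{plural:$1|comment|comments}} by $3)"
ru = (
    "Развернуть ветку ($1 {{plural:$1|сообщение|сообщения|сообщений}} от "
    "{{plural:$2|1={{gender:$4|участника|участницы}}|"
    "{{gender:$4|участников|участниц|участников(-ц)}}}} $3)"
)

print(superfluous_params(en_current, ru))   # []
print(superfluous_params(en_stripped, ru))  # [2, 4]
```

With the empty calls removed, $2 and $4 are the two parameters that a check of this kind would report for the Russian translation.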

Your issue is not unique, only your solution is:

  • logentry-tag-update-revision is a similar case to the PLURAL issue ($7 and $9 are lists with one or more items, but English doesn’t care whether there are one or more items). It includes an arbitrary word in a single-parameter PLURAL parser function.
  • contributions-subtitle is a similar case to the GENDER issue (there’s no word that refers to the user other than the user name). It simply includes the user name in the GENDER parser function.

I’d follow these in Convenient Discussions, e.g.

en.json
	"thread-expand-label": "Expand the thread ($1 {{plural:$1|comment|comments}} by {{plural:$2|{{gender:$4|$3}}}})",

However, since you stated this is unusual, I've just searched for this message, thread-expand-label, across the i18n files and discovered that no language besides Russian actually uses this: either the markup is too intricate for translators to grasp, or the languages don't need such convoluted logic. When it comes to the simple empty {{gender:$N|}}, content is added more often: 6 with content vs. 30 without (for es-reply-to); yet it's still rarely used.

I think it’s (in part) due to it being too intricate. If Russian needs it, I guess other Slavic languages (e.g. Ukrainian) are also likely to need it. (By the way, translatewiki.net can also list all translations, and unlike GitHub, it doesn’t even require login for that.)

es-reply-to looks plausible; probably most languages indeed don’t need this piece of information, but if some do, you should make it available to (all) translators.

@Tacsipacsi Thanks for your guidance.

I’d follow these in Convenient Discussions, e.g.

en.json
	[...] {{plural:$2|{{gender:$4|$3}}}} [...]

But here comes another problem. People tend to overlook the fact that in cases like this you need not {{plural:$N|}} but {{plural:$N|1=}}, because the inflection form is not the same for 1 and 21.
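To make the 1 vs. 21 point concrete, here is a minimal sketch of form selection. It assumes CLDR-style Russian plural categories for integers (one / few / many), explicit `K=` forms that win on an exact match, and the last form being reused when fewer forms are given than categories. It is not MediaWiki's actual implementation, and `ru_plural_index` and `plural` are hypothetical helper names.

```python
from __future__ import annotations


def ru_plural_index(n: int) -> int:
    """Plural category of a non-negative integer in Russian: 0=one, 1=few, 2=many."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


def plural(n: int, forms: list[str]) -> str:
    """Pick a form roughly the way {{plural:$N|...}} is used in this thread."""
    regular = []
    for form in forms:
        head, sep, text = form.partition("=")
        if sep and head.isdigit():
            # Explicit form such as "1=...": used only on an exact match.
            if int(head) == n:
                return text
        else:
            regular.append(form)
    if not regular:
        return ""
    # Assumption: when forms run out, the last one is reused.
    return regular[min(ru_plural_index(n), len(regular) - 1)]


# "21 comments": the number is part of the phrase, so 21 takes the "one" form.
print(plural(21, ["сообщение", "сообщения", "сообщений"]))  # сообщение

# "by users (list of 21 users)": grammatical forms pick the singular for 21.
print(plural(21, ["участника", "участников", "участников"]))  # участника

# With an explicit 1= form, only exactly one participant gets the singular.
print(plural(21, ["1=участника", "участников"]))  # участников
print(plural(1, ["1=участника", "участников"]))   # участника
```

The second call shows the mistake: a list of 21 usernames ends up after a singular noun, which is why the `1=` form is needed for the participants part but not for the comment count.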

(By the way, translatewiki.net can also list all translations, and unlike GitHub, it doesn’t even require login for that.)

Oh, thanks, I didn't know. Now I noticed a link to this page in the sidebar.

If Russian needs it, I guess other Slavic languages (e.g. Ukrainian) are also likely to need it.

Sure. I've translated into Ukrainian.

People tend to overlook the fact that in cases like this you need not {{plural:$N|}} but {{plural:$N|1=}}, because the inflection form is not the same for 1 and 21.

This is why my version ended up so verbose — it's likely that people will fill in the options for {{plural:$2|}} but do it incorrectly.

So, which markup do you think is best:

  • {{plural:$2|{{gender:$4|$3}}}} (the one you suggested)
  • {{plural:$2|1={{gender:$4|$3}}|{{gender:$4|$3}}}}
  • {{plural:$2|1={{gender:$4|}}|{{gender:$4|}}}} $3 (the original one)

?

But here comes another problem. People tend to overlook the fact that in cases like this you need not {{plural:$N|}} but {{plural:$N|1=}}, because the inflection form is not the same for 1 and 21.

I don’t think overly complex English structure helps with that. Instead, if you fear people won’t understand how {{PLURAL}} works in their language (in a lot of languages, it works such that anything more than one is categorized “many”, but Slavic languages indeed tend to be more complex), I think you should explain the situation in free text in the message documentation. So…

So, which markup do you think is best:

  • {{plural:$2|{{gender:$4|$3}}}} (the one you suggested)
  • {{plural:$2|1={{gender:$4|$3}}|{{gender:$4|$3}}}}
  • {{plural:$2|1={{gender:$4|}}|{{gender:$4|}}}} $3 (the original one)

?

still the first one.

Now I noticed a link to this page in the sidebar.

Well, the link is new to me as well, thanks for pointing it out! Even though it’s been around for almost 18 years now (rETRA7ed1ae47dd8eae50c0c9e5f19fe2fc87cbc96e4d)…

Sure. I've translated into Ukrainian.

Thanks!

if you fear people won’t understand how {{PLURAL}} works in their language (in a lot of languages, it works such that anything more than one is categorized “many”, but Slavic languages indeed tend to be more complex)

The thing is, even if you do understand that, what's confusing is that you have {{plural:$N|}} and {{plural:$N|1=}} in one message due to different phrase structure: compare "21 comments" and "from users (list of 21 users)". It's just that the complexity we have here is irreducible.

What's most convincing to me here though, is that

  • it is rare for a discussion thread to have 21, 31, etc. participants (and all the messages I have with |1= are about thread participants),
  • and the list of languages with that quirk is not so large.

So, even if people make that mistake (and they sure do), it will rarely show up, whereas complex markup is more harmful. Therefore, I'll use the version you suggested.

Thanks.

Note: people unfortunately do get confused by this shorter syntax {{plural:$2|{{gender:$4|$3}}}} as well—in this case, the translator moved $3 inside the curly braces, but only for the last parameter :(
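A mistake of this kind could in principle be caught mechanically. The sketch below is only an illustration under stated assumptions: it looks at a single `{{gender:...}}` or `{{plural:...}}` call, does not recurse into nested calls, and flags the call when a parameter appears in some forms but not in all of them. The helper names and the "broken" example string are hypothetical; the example is not the actual translation mentioned above.

```python
from __future__ import annotations

import re


def split_top_level(body: str) -> list[str]:
    """Split on '|' characters that are not nested inside {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair == "{{":
            depth += 1
            current.append(pair)
            i += 2
        elif pair == "}}":
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def forms_of(call: str) -> list[str]:
    """Forms of one call; expects a string that starts with '{{' and ends with '}}'."""
    inner = call[2:-2]
    _, _, rest = inner.partition(":")
    return split_top_level(rest)[1:]  # the first piece is the argument, e.g. $4


def param_used_inconsistently(call: str, param: str) -> bool:
    """True if the parameter appears in some forms of the call but not in all."""
    pattern = re.compile(re.escape(param) + r"(?!\d)")
    hits = [bool(pattern.search(form)) for form in forms_of(call)]
    return any(hits) and not all(hits)


suggested = "{{gender:$4|$3}}"
# Hypothetical broken translation: $3 moved inside, but only in the last form.
broken = "{{gender:$4|участников|участниц|участников(-ц) $3}}"

print(param_used_inconsistently(suggested, "$3"))  # False
print(param_used_inconsistently(broken, "$3"))     # True
```

A check like this only reports a suspicious pattern. Whether the translation is actually wrong still has to be judged by someone who knows the language.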

The thing is, even if you do understand that, what's confusing is that you have {{plural:$N|}} and {{plural:$N|1=}} in one message due to different phrase structure: compare "21 comments" and "from users (list of 21 users)". It's just that the complexity we have here is irreducible.

I see, but yeah, there’s probably no way to avoid it.

What's most convincing to me here though, is that

  • it is rare for a discussion thread to have 21, 31, etc. participants (and all the messages I have with |1= are about thread participants),

This one actually doesn’t convince me at all. It means that mistakes are less likely to be discovered, and when they do get discovered, chances are that they get discovered by someone who doesn’t have a translatewiki.net account, so they can do nothing but be annoyed…

  • and the list of languages with that quirk is not so large.

That list is well over a decade old, and doesn’t even focus on what you need (e.g. Icelandic and Macedonian both have more complex rules than =1, ≠1, but these more complex rules still only mean two categories, so they aren’t listed there). However, the number of affected languages is still not too high (roughly double that list: 40-45 rather than 22).