Forcing the language for {{PLURAL}} rule
Open, Low, Public

Description

Author: p.selitskas

Description:
Just learnt that MediaWiki applies {{PLURAL}} using the wiki's base language only. That's enough for general wikis, but a bit of a problem for multilingual projects like Wikimedia Commons.

So is there a way to allow users to specify which language to use with a PLURAL statement? For example, {{PLURAL:be:{{NUMBER}}|one|four|eight}}. Is this feasible?


Version: unspecified
Severity: enhancement

Details

Reference
bz22985

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 11:04 PM
bzimport set Reference to bz22985.
bzimport added a subscriber: Unknown Object (MLST).

pdhanda wrote:

Chatted with Nikerabbit on MediaWiki-Internationalization about this. He thinks the best way to do this would be to make the parser aware of language context and not add extra hacks to the individual parser functions.

I think that makes sense. I can look into it soon.

p.selitskas wrote:

(In reply to comment #1)

Chatted with Nikerabbit on MediaWiki-Internationalization about this. He thinks the best way
to do this would be to make the parser aware of language context and not add
extra hacks to the individual parser functions.

I think that makes sense. I can look into it soon.

Yeah, the guys at MediaWiki-General told me that the parser doesn't know which language the engine is operating in.

Thanks in advance. :)

p.selitskas wrote:

(In reply to comment #1)

Chatted with Nikerabbit on MediaWiki-Internationalization about this. He thinks the best way
to do this would be to make the parser aware of language context and not add
extra hacks to the individual parser functions.

I think that makes sense. I can look into it soon.

Just discussed this issue at my local wiki.

The way you're going to develop this idea is not the same as my proposal. Even if the text was written in Klingon and we switched the interface to Greek, the text would still be in Klingon. So, as I understand it, the parser _has_ to know the language it is operating in if it is to be fully i18n-friendly.

Sincerely, ...

Priyanka, can you please give some insight into your current plans for this issue, given your comment "I can look into it soon." from almost 7 weeks ago?

p.selitskas wrote:

I've found an extension, I18nTags [1], which is supposed to implement what I proposed, though I don't really understand how it works.

It shouldn't be too hard to develop a magic word for setting the language for a full page, or a tag for marking only sections of a page.

  1. Generally it would be advisable to be able to set an arbitrary or variable language context. (For example, in "Spanish for beginners" in the Spanish Wikiversity, there may be important hints in user preference languages that users already understand, even though the lesson itself builds on Spanish and images.)
  2. Generally, knowing the current language context may be crucial. (For instance, "$1 user(s)" needs to use different PLURAL rules depending on its language, be it the wiki language, a user language, any of their fallback languages, or an arbitrarily chosen one.)
  3. For some complex or patchwork messages, e.g. "user X copied/moved Y file(s)/directory(ies)", which combine GENDER, PLURAL and variable insertions of other messages, when the parts are not all localized, it may even be worth knowing the available languages of the pieces so as to find one language for the entire message and not mix bits and pieces from various languages into a non-sentence or gibberish. (Remember that snippets of different languages may not fit together and may not be suited to replace each other across languages. A simple example: In English, you may have "Uploaded the $1" where $1 is "image" or "video", while French or German (retranslated) has "Uploaded $1" where $1 is "the image" or "the video" since German has two different articles here. Mixed translations will display no article at all, or duplicate articles, respectively.)
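The third point above can be illustrated with a minimal Python sketch. It is not MediaWiki code: the function name `pick_message_language`, the piece names and the language sets are hypothetical, and it only assumes that each message piece knows which languages it has been translated into.

```python
from typing import Dict, List, Optional, Set


def pick_message_language(
    pieces: Dict[str, Set[str]], fallback_chain: List[str]
) -> Optional[str]:
    """Return the first language in the fallback chain that every piece has.

    `pieces` maps a (hypothetical) message key to the set of language codes
    it is translated into. `fallback_chain` is ordered from most to least
    preferred. Returns None if no single language covers all pieces.
    """
    for lang in fallback_chain:
        if all(lang in available for available in pieces.values()):
            return lang
    return None


# Hypothetical availability data for a patchwork message.
pieces = {
    "uploaded-the": {"en", "de", "fr"},
    "image": {"en", "fr"},
}

# German is preferred, but "image" has no German translation, so the whole
# message falls back to French instead of mixing German and English parts.
chosen = pick_message_language(pieces, ["de", "fr", "en"])
```

Choosing one language for the whole message avoids the missing or duplicated articles described in the example above.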

Let's not mix translations in this bag. Setting a language for the whole page would be a start.

p.selitskas wrote:

(In reply to comment #8)

Let's not mix translations in this bag. Setting a language for the whole page
would be a start.

Yes, it's much easier to implement a whole-page switch than my proposed per-call switch. I think it would be enough for Commons.

(In reply to comment #6)

It shouldn't be too hard to develop a magic word for setting the language for a
full page, or tag for marking only sections of page.

Agreed.
When assessing this, please consider scripts and the directionalities that come with them. A script whose directionality differs from the wiki's directionality requires an HTML block element with the correct dir="rtl/ltr" attribute as the container of the language text so that it is rendered correctly.
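As a rough illustration of the container described here, the following Python sketch wraps a piece of text in a block element carrying `lang` and `dir` attributes. The helper name `wrap_in_language` and the short `RTL_LANGUAGES` set are assumptions made for the example, not anything MediaWiki provides under these names; a real implementation would take directionality from the language data.

```python
from html import escape

# Hypothetical, deliberately incomplete list of right-to-left language codes.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}


def wrap_in_language(text: str, lang: str) -> str:
    """Wrap text in a div that declares its language and directionality."""
    direction = "rtl" if lang in RTL_LANGUAGES else "ltr"
    return (
        f'<div lang="{escape(lang)}" dir="{direction}" '
        f'class="mw-content-{direction}">{escape(text)}</div>'
    )


hebrew_block = wrap_in_language("שלום", "he")
english_block = wrap_in_language("Hello <world>", "en")
```

The text is HTML-escaped so that the wrapper stays well-formed regardless of what the section contains.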

p.selitskas wrote:

As MW 1.19 rolls out the page content language approach, it's a pity that PLURAL and/or GRAMMAR are not yet supported (at least the check fails in labs.wikimedia.beta).

What check? They definitely should be supported - there is just no way to set the page content language manually yet.

p.selitskas wrote:

(In reply to comment #12)

What check? They definitely should be supported - there is just no way to set
the page content language manually yet.

I missed that this trick works for the MediaWiki namespace only. That's a shame.

P.S. In MediaWiki everything works just fine, at least in the MediaWiki Talk namespace :)

p.selitskas wrote:

Some obvious things for your consideration:

  1. A custom magic word (e.g. {{CONTENTLANG:xx}}) calling $parser->mOptions->setTargetLanguage( $language ) has the desired effect, but it doesn't change the actual content language (HTML attributes and other stuff stay the same).
  2. Setting $wgContLang directly in a parser hook call has the desired effect, but perhaps goes too far. For example, in the be-tarask wiki with {{CONTENTLANG:ru}}, [[Катэгорыя:...]] (category) would not work, but the Russian [[Категория:...]] would. I believe this is not desired at all; on the other hand, at some point this "artefact" may be, let's say, user-friendly in terms of an editor's language skills, but it makes the wiki as a whole unmaintainable (inconsistent magic word calls, categories, etc.).

I've thought of adding a field page.page_lang which would be passed to $wgContLang, but that would mean the same as described in #2 plus additional DB overhead and an interface to control the value (special page?).

It is possible that I didn't work hard enough to hook PageContentLanguage, but normally hooking it in a parser hook call doesn't make much sense due to the nature of the parser, rendering and caching.

(In reply to comment #0)

Just learnt about MediaWiki using {{PLURAL}} with the base language only.
That's enough for general wikis, but a little problem for multilanguage
projects like Wikimedia Commons.

Wizardist, could you please clarify what the use case for the original request is, disregarding all the later additions? I can't find it anywhere in this report.

p.selitskas wrote:

(In reply to comment #15)

(In reply to comment #0)

Just learnt about MediaWiki using {{PLURAL}} with the base language only.
That's enough for general wikis, but a little problem for multilanguage
projects like Wikimedia Commons.

Wizardist, could you please clarify what's the use case for the original
request, disregarding all the later additions? I can't find it anywhere on
this report.

This use case was obvious for Wikimedia Commons before the Translate extension was widely deployed:

We have a lot of pages in lots of languages in one wiki, and every one of those pages would like to rely on core-driven language functions like plurals or the grammar converter, as well as number and date formatting.

Now, with the Translate extension there is no need to bother with this, as every translated page is delivered with a proper content language via a hook in ContentHandler, based on the page title (title/langcode, as in the MediaWiki namespace).
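The title/langcode convention can be sketched in a few lines of Python. This is only an illustration of the idea, not the Translate extension's actual logic: the function name `page_language`, the default of "en" and the small `KNOWN_LANGUAGE_CODES` set are assumptions made for the example.

```python
# Hypothetical subset of the language codes a wiki would recognise.
KNOWN_LANGUAGE_CODES = {"be", "be-tarask", "de", "en", "fr", "ru"}


def page_language(title: str, default: str = "en") -> str:
    """Derive a page's content language from a trailing /langcode subpage."""
    _base, separator, suffix = title.rpartition("/")
    if separator and suffix.lower() in KNOWN_LANGUAGE_CODES:
        return suffix.lower()
    # No subpage, or the last segment is not a known code: use the wiki default.
    return default


russian = page_language("Commons:Policy/ru")
fallback = page_language("Commons:Policy/Archive")
```

Checking the suffix against a known list keeps ordinary subpages such as "/Archive" from being mistaken for language codes.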

However, theoretically, Translate may not cover every use case, and there could be a need to return to a generic page written in a language different from the default one. I cannot pick an example, but who knows, there may be some of them.

FYI, an allegedly working PoC (language is switched by a direct change in DB there :P ): Id63573a7f

(In reply to comment #16)

We have a lot of pages in lots of languages in one wiki, and every one of
those pages would like to rely on core-driven language functions like plurals
or grammar converter, as well as number and dates formatting.

Sorry, this is still not obvious to me. If PLURAL is being used in Commons templates and so on, why would you want to force it to a language other than the interface language? Is there an example? (Also on another wiki if needed.) Thanks.

p.selitskas wrote:

(In reply to comment #17)

(In reply to comment #16)

We have a lot of pages in lots of languages in one wiki, and every one of
those pages would like to rely on core-driven language functions like plurals
or grammar converter, as well as number and dates formatting.

Sorry, this is still not obvious to me. If PLURAL is being used in Commons
templates and so on, why would you want to force it to a language other than
the interface language? Is there an example? (Also on another wiki if
needed.) Thanks.

Hmmmm.... Okay, let me spell this out.

Forget the Translate extension and Special:MyLanguage (correct me if I misspelled this page). Let's go back to around 2010.

We have a policy page in English (and the wiki default language is English - all language stuff is supplied by LanguageEn (not correct technically, but I hope you get it)). We have a translation of that policy in Russian (the default wiki language is still English - all language stuff in #mw-content is delivered by LanguageEn).

Did you see that? On that page we would have two plural forms instead of three (or four, like CLDR defines/would like to define), no grammar rules (in English, wuut?), English-style formatting (123,567.123 instead of Russian-style 123 567,123), English-written dates etc. etc.
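To make the difference concrete, here is a minimal Python sketch of the two behaviours mentioned: plural form selection and number formatting. It is not MediaWiki's implementation. The function names are hypothetical, the Russian rule follows the CLDR categories for whole numbers only (fractions, which take the fourth category, are left out), and the Russian nouns are just sample data.

```python
def plural_en(n: int) -> str:
    """English has two plural categories for whole numbers."""
    return "one" if n == 1 else "other"


def plural_ru(n: int) -> str:
    """Russian has three plural categories for whole numbers (CLDR-style)."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


def format_number(value: float, group_sep: str, decimal_sep: str) -> str:
    """Format with three decimals, then swap in the given separators."""
    text = f"{value:,.3f}"  # English-style, e.g. "123,567.123"
    # Use a placeholder so the two replacements cannot clobber each other.
    return (
        text.replace(",", "\x00")
        .replace(".", decimal_sep)
        .replace("\x00", group_sep)
    )


RU_FILE_FORMS = {"one": "файл", "few": "файла", "many": "файлов"}

english_style = format_number(123567.123, ",", ".")
russian_style = format_number(123567.123, " ", ",")
label_22 = f"22 {RU_FILE_FORMS[plural_ru(22)]}"
```

With English rules, 22 would simply get the "other" form, while the Russian rule picks the "few" form; the same value also renders as 123,567.123 or 123 567,123 depending on which separators are applied.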

So I don't get why you think the interface language influences the content language of the page. It just doesn't. When you surf Commons with the English interface, you are unlikely to get English content on the Russian page, right? :) So why should the content language on the Russian page rely on your interface settings?


You may be mixing up translated pages with things like Autotranslate, which exploits the {{int:lang}} hack. If that is not fixed, you will continue to be served Autotranslate'd templates in your interface language. That's what I can see.


Sorry if I don't get the point. Please clarify then, thanks.

(In reply to comment #18)

So I don't get your misunderstanding of how the interface language influences
the content language of the page. It just doesn't. When you surf the Commons
with English interface, you will unlikely get English contents on the Russian
page, right? :) So why on the Russian page should the content language rely
on your interface settings?

The Russian reader is supposed to have the Russian interface. If you speak Russian and have the Russian interface, and you read a policy page written solely in English, and the page contains a formatted number or a formatted date, then you'll see numbers and dates formatted as Russian in the middle of English text.
This doesn't look like a big problem to me; plus, I still don't see what page could be using PLURAL, which seems even more unlikely.


You may be mixing up translated pages with such things like Autotranslate,
which exploits a hack of {{int:lang}}. If it's not fixed, then you will
continue to be delivered Autotranslate'd templates in your interface message.
That's what I can see.

Indeed, you answer it yourself: most of the mixed-language content is created by LangSwitch, LanguageSelect and Autotranslate; the inconsistencies created in single pages by PLURAL and so on are negligible, or rather consistent with the expected inconsistency. :)
If I have a file page with descriptions in n languages, and a date or number, I prefer to have the date and number formatted according to my language so that the 1 description in my interface language is completely correct (and the n-1 other descriptions be inconsistent), rather than having my 1 language wrong, 1 language I don't care about correct and n-2 languages still inconsistent...

p.selitskas wrote:

Please don't concentrate on particular cases. PLURAL was an example, and I'm talking about the whole subset of language tools. Step aside from Commons and file pages (why in the hell would we change the language of those???).

Let's wait for some other opinions, because we don't seem to hear each other. (I hereby state that I'm not sure that I'm right, but I'm ready to defend my view of the problem in the next iterations.)

(In reply to comment #20)

PLURAL was an example, and I'm
talking about the whole language tools subset.

The bug summary is about PLURAL...

Step aside from Commons, file
pages (why in the hell would we change the language of those???).

Let's wait for some other opinions, because both we don't hear each other. (I
state hereby that I'm not sure that I'm right, but I'm ready to defend my
view on the problem in next iterations.)

I do hear you. I can vaguely imagine use cases myself and comment 18 helps with that, but the aim of this request is totally unclear. Then, again, it might be just me not seeing the obvious, but clarifying would help.

The Russian reader is supposed to have Russian interface.

No.

Multilingual readers in multilingual wikis read pages in various languages. You would not tell them to redo their interface settings for every other page they view, would you?

If you speak Russian and have Russian interface, and you read a policy page solely written in English, and the page contains a formatted number or a formatted date, then you'll see numbers and dates formatted as Russian in the middle of English text.

Which is completely unacceptable.

E.g. reading a page in a language which you are learning, you must be able to rely on it being correct.

Multilingual readers in multilingual wikis read pages in various languages. You would not tell them to redo their interface settings for every other page they view, would you?

No, I agree: each page should work by itself.

If you speak Russian and have Russian interface, and you read a policy page solely written in English, and the page contains a formatted number or a formatted date, then you'll see numbers and dates formatted as Russian in the middle of English text.

Which is completely unacceptable.

E.g. reading a page in a language which you are learning, you must be able to rely on it being correct.

Agreed. That's why PLURAL must follow content language (of the page), ignore interface language, and not have any language parameter.

If you design an autotranslated template and you want to reuse it on the same page to show different renderings for different languages (e.g. in language-learning pages showing differences between languages), you'll still be in trouble: the page itself is clearly intended to be multilingual, with some subsections purposely NOT in the page's main content language and with explicit changes of language in the page itself (not necessarily to the user's preferred UI language).

This is a frequent case when documenting autotranslated templates with examples shown in a set of distinct specific languages. For now your solution requires creating separate documentation pages, one for each target language, and we cannot build a standalone page that shows just a pair of languages (e.g. the UI language and a specific target language, or the wiki default language or the page content language and a specific target language).

The result will always be incoherent on pages intended to be really multilingual in their own content (where the specific target language to show is a very small extract embedded into the main page content).

For now, the only supported case like this is embedding small extracts in the UI language within a page targeting a single language, or the reverse: embedding a specific target language in a page whose content language is set by the Translate extension (Pagename/langcode) according to the UI language. No other kind of pair is allowed. We have no way to track correctly which language is intended in other kinds of multilingual pages that use other pairs of specific languages, or more than two languages. Examples are Wikisource for linguistic books (where we would like to tag the generated content correctly for rendering or text layout), or Wiktionary pages containing definitions of the same orthographic word in many languages, along with other content generated by generic templates that take a parameter for the target language and are rendered in language-specific sections of the same page.

Note that tagging content for a language (in HTML or CSS) is needed for more than selecting the plain-text translation to display. It is necessary for correct rendering and font selection in web renderers and for correct bidi layout: not just "bdi", but also defining the correct left/right side of margins and paddings, and adding LTR vs. RTL CSS class names such as "mw-content-rtl". It also matters for selecting an appropriate font size or style, finding suitable substitutes for case transforms in unicameral scripts, handling character spacing in ligated scripts when rendering justified text, and knowing where line-break opportunities can be inserted in Asian scripts that don't use whitespace separation between words but do not allow line breaks everywhere like in Chinese. The same applies to choosing the filename of an image or icon, which may also need to be mirrored, transformed in non-affine ways, or given other colors or symbols, possibly through styling rather than just a different filename. See also the case of multilingual SVG files, where you need to specify the target language to render, and the case of MPEG and OGV videos with selectable multilingual audio tracks or subtitles!