TemplateData: Process language fallback and conversion server-side
Closed, Resolved, Public

Description

After I added the <templatedata> to [[pt:Template:Referências]], I opened a page in which it is used, setting "uselang=pt" and "uselang=pt-br" in the URL:

  1. https://pt.wikipedia.org/wiki/Arte?veaction=edit&uselang=pt
  2. https://pt.wikipedia.org/wiki/Arte?veaction=edit&uselang=pt-br

In the first case, when I opened the Transclusion dialog (by clicking in the references section, where the template is used), the template description was shown as expected. When I typed one of the template parameters, its label was also shown normally.

With the second link, however, the user language (pt-br) differs from the content language (pt), and the user sees no description and no labels.


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=50888

Details

Reference
bz50431

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 1:51 AM
bzimport added projects: TemplateData, I18n.
bzimport set Reference to bz50431.
He7d3r created this task. Jun 29 2013, 7:41 PM

For the record, the data returned by the API has all the information, but it is in the "pt" properties:
https://pt.wikipedia.org/w/api.php?format=jsonfm&action=templatedata&titles=Predefini%C3%A7%C3%A3o%3Arefer%C3%AAncias
E.g.:
{
    "pages": {
        "1467239": {
            "title": "Predefini\u00e7\u00e3o:Refer\u00eancias",
            "description": {
                "pt": "Produz o t\u00edtulo da se\u00e7\u00e3o de refer\u00eancias e impede que seja edit\u00e1vel"
            },
            "params": {
                "t\u00edtulo": {
                    "label": {
                        "pt": "T\u00edtulo da se\u00e7\u00e3o"
                    },
                    ...
                },
                ...
            }
        }
    }
}

The API call should ideally specify the fallback language chain and return just one set for clients (so the weight is on the server).

Falling back to (LanguageConverter)-converted labels is needed too.

It seems my LanguageFallbackChain and related classes (currently in Wikibase) are useful here again.
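
A minimal sketch of the server-side resolution proposed above, written in Python purely for illustration. It is not the TemplateData or Wikibase LanguageFallbackChain code. The names `pick_text` and `flatten_page` and the example chain are assumptions; only the input shape is taken from the API output quoted earlier.

```python
from typing import Optional


def pick_text(text_map: dict, chain: list) -> Optional[str]:
    """Return the first text found while walking the fallback chain."""
    for code in chain:
        if code in text_map:
            return text_map[code]
    return None


def flatten_page(page: dict, chain: list) -> dict:
    """Collapse every per-language map in one page entry to a single string."""
    return {
        "title": page["title"],
        "description": pick_text(page.get("description", {}), chain),
        "params": {
            name: {**param, "label": pick_text(param.get("label", {}), chain)}
            for name, param in page.get("params", {}).items()
        },
    }


page = {
    "title": "Predefinição:Referências",
    "description": {
        "pt": "Produz o título da seção de referências e impede que seja editável"
    },
    "params": {"título": {"label": {"pt": "Título da seção"}}},
}

# Hypothetical chain for a pt-br user: own language first, then pt, then en.
flat = flatten_page(page, ["pt-br", "pt", "en"])
print(flat["description"])
print(flat["params"]["título"]["label"])
```

With a chain like this, the pt-br reader from the original report would receive the pt description and label as plain strings, and the client would not need any fallback logic of its own.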

(In reply to comment #2)

API call should ideally specify the fall-back language chain and return just
one set for clients (so the weight is on the server).

Do we care about the real language info of a label?

That is, is this fine?

{
    "pages": {
        "1467239": {
            "title": "Predefini\u00e7\u00e3o:Refer\u00eancias",
            "description": {
                "pt": "Some text written in English"
            },
            "params": {
                "t\u00edtulo": {
                    "label": {
                        "pt": "Some other text written in Spanish"
                    },
                    ...
                },
                ...
            }
        }
    }
}

(In reply to comment #4)

(In reply to comment #2)

API call should ideally specify the fall-back language chain and return just
one set for clients (so the weight is on the server).

Do we care about the real language info of a label.

We don't, but users on multi-lingual wikis will if we send them 1 MiB of descriptions of a template when they only care about 2 KiB worth of the contents. :-)

That is, is this fine?
{
    "pages": {
        "1467239": {
            "title": "Predefini\u00e7\u00e3o:Refer\u00eancias",
            "description": {
                "pt": "Some text written in English"
            },
            "params": {
                "t\u00edtulo": {
                    "label": {
                        "pt": "Some other text written in Spanish"
                    },
                    ...
                },
                ...
            }
        }
    }
}

Fine technically - I think users would be upset and confused.

So I'm not too Wikidata-savvy, but it seems this patch might be useful (if it's not, ignore me and carry on):

https://gerrit.wikimedia.org/r/72867

Right now there is no way to determine the real language of a message from the CDB cache. That patch changes this.

(In reply to comment #5)

Fine technically - I think users would be upset and confused.

So you'll need to prepare for some interface design to tell users that "Some text written in English" and "Some other text written in Spanish" lines are not in pt.

And the format of JSON needs to be modified to include this info. Like:

{
    "pages": {
        "1467239": {
            "title": "Predefini\u00e7\u00e3o:Refer\u00eancias",
            "description": {
                "pt": { "value": "Some text written in English", "language": "en" }
            },
            "params": {
                "t\u00edtulo": {
                    "label": {
                        "pt": { "value": "Some other text written in Spanish", "language": "es" }
                    },
                    ...
                },
                ...
            }
        }
    }
}
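
As a rough illustration of the format proposed above, the lookup could report which language actually supplied the text. This is a Python sketch under assumptions: the helper name `pick_with_language` and the sample chain are made up for the example, and only the "value"/"language" keys come from the proposal.

```python
from typing import Optional


def pick_with_language(text_map: dict, chain: list) -> Optional[dict]:
    """Walk the chain and return the text together with its real language."""
    for code in chain:
        if code in text_map:
            return {"value": text_map[code], "language": code}
    return None


# Only an English description exists, but the reader asked for pt.
description = {"en": "Some text written in English"}
result = pick_with_language(description, ["pt", "en"])
print(result)
```

A client that cares could compare `result["language"]` with the requested code and decide whether to flag the text; a client that does not care can simply read `result["value"]`.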

(In reply to comment #6)

So I'm not too Wikidata-savvy, but it seems this patch might be useful (if it's
not, ignore me and carry on):
https://gerrit.wikimedia.org/r/72867
Right now there is no way to determine the real language of a message from the
CDB cache. That patch changes this.

Not really. Labels and descriptions in TemplateData are stored in some JSON blob in a customized format, rather than normal messages.

(In reply to comment #7)

(In reply to comment #5)

Fine technically - I think users would be upset and confused.

So you'll need to prepare for some interface design to tell users that "Some
text written in English" and "Some other text written in Spanish" lines are
not in pt.

Why couldn't the TemplateData just be written in the user's language?

And the format of JSON needs to be modified to include this info.

I don't think that's a good outcome. If there isn't a description in your language (in this case, pt), we shouldn't magically tell you that we've given you a message in a different language (we don't do this for the MW messages framework, for instance).

(In reply to comment #9)

(In reply to comment #7)

(In reply to comment #5)

Fine technically - I think users would be upset and confused.

So you'll need to prepare for some interface design to tell users that "Some
text written in English" and "Some other text written in Spanish" lines are
not in pt.

Why couldn't the TemplateData just be written in the users's language?

You can't expect all templatedata blocks to have labels in all hundreds of languages which MediaWiki supports.

And the format of JSON needs to be modified to include this info.

I don't think that's a good outcome. If there isn't a description in your
language (in this case, pt), we shouldn't magically tell you that we've given
you a message in a different language (we don't do this for the MW messages
framework, for instance).

We do so in Wikibase, if the user indicates that they can read another language -- our implementation currently checks {{#babel: }}, but some global preferences here would be nice of course.

(In reply to comment #9)

I don't think that's a good outcome. If there isn't a description in your
language (in this case, pt), we shouldn't magically tell you that we've given
you a message in a different language (we don't do this for the MW messages
framework, for instance).

BTW this comment means WONTFIXing this whole bug.

(In reply to comment #11)

(In reply to comment #9)

I don't think that's a good outcome. If there isn't a description in your
language (in this case, pt), we shouldn't magically tell you that we've given
you a message in a different language (we don't do this for the MW messages
framework, for instance).

BTW this comment means WONTFIXing this whole bug.

Why?

I'm only saying that when "Chien" doesn't exist in French, we should give users plain "Dog", not "Dog -- OMG We gave you this message in English even though you asked for it in French!", which feels like significant overkill.

(In reply to comment #12)

(In reply to comment #11)

(In reply to comment #9)

I don't think that's a good outcome. If there isn't a description in your
language (in this case, pt), we shouldn't magically tell you that we've given
you a message in a different language (we don't do this for the MW messages
framework, for instance).

BTW this comment means WONTFIXing this whole bug.

Why?
This is just me saying that I don't think that instead of "Chien", if it
doesn't exist in French we should give users "Dog -- OMG We gave you this
message in English even though you asked for it in French!", which feels
significant over-kill.

Then I guess your point is that pt-br and pt are similar enough that falling back between them is acceptable, while fr and en are not. Technically, however, pt-br and pt have the same relationship as fr and en; otherwise we would have to compose some language similarity table ourselves and resolve many edge cases (e.g. dialects).
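
To make the point concrete, here is a small Python sketch of a fallback chain built from an explicit table. The table contents, the name `EXPLICIT_FALLBACKS`, and the choice of en as the final fallback are all assumptions for illustration, not MediaWiki's actual data. Note that the table carries no notion of similarity: pt for a pt-br reader and en for a fr reader are both just later entries in a chain.

```python
# Hypothetical, hand-written table: each code lists the languages to try
# after the requested one, in order of preference.
EXPLICIT_FALLBACKS = {
    "pt-br": ["pt"],
    "fr": [],
}


def fallback_chain(code: str, final: str = "en") -> list:
    """Build the lookup order: requested code, explicit fallbacks, then final."""
    chain = [code]
    for candidate in EXPLICIT_FALLBACKS.get(code, []) + [final]:
        if candidate not in chain:
            chain.append(candidate)
    return chain


print(fallback_chain("pt-br"))
print(fallback_chain("fr"))
print(fallback_chain("en"))
```

Distinguishing "close" fallbacks from "distant" ones would need an extra marker per entry, which is the similarity table the comment above warns about.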

(In reply to comment #13)

(In reply to comment #12)

(In reply to comment #11)

(In reply to comment #9)

I don't think that's a good outcome. If there isn't a description in your
language (in this case, pt), we shouldn't magically tell you that we've given
you a message in a different language (we don't do this for the MW messages
framework, for instance).

BTW this comment means WONTFIXing this whole bug.

Why?
This is just me saying that I don't think that instead of "Chien", if it
doesn't exist in French we should give users "Dog -- OMG We gave you this
message in English even though you asked for it in French!", which feels
significant over-kill.

Then I guess your point is that pt-br and pt are more similar, so falling
back from pt to pt-br is acceptable, while fr and en are not this case.

Yes.

However technically pt-br and pt have the same relationship as fr and en,
or we'll have to compose some language similarity table ourselves, and manage
to resolve many edge cases (eg. dialects).

Oh. I assumed jQuery.i18n (or one of the other MW-independent JS tools that the Language Engineering team has built) would have this built in. Is that not the case?

(In reply to comment #8)

(In reply to comment #6)

So I'm not too Wikidata-savvy, but it seems this patch might be useful (if it's
not, ignore me and carry on):
https://gerrit.wikimedia.org/r/72867
Right now there is no way to determine the real language of a message from the
CDB cache. That patch changes this.

Not really. Labels and descriptions in TemplateData are stored in some JSON
blob in a customized format, rather than normal messages.

Gotcha. Carry on. Sorry I couldn't help.

For now this is up to the client side to handle, which realistically means it won't be handled (current language > en > nothing).

For the future I intend to have the templatedata API take a parameter for language code and resolve it on the server side. For three reasons:

  • On wikis where more than one language is commonly used (which is the whole point of this bug; on a single-language wiki the author can just specify { "description": "Text." } without language codes), more than one language will be defined. That results in a large blob of JSON being transferred to e.g. VisualEditor for each template, which is quite a lot of data.
  • Even so, the client side would still need knowledge of all of this in order to process it. That involves a lot of language data and a lot of translations being sent to the client, and then the client having to do all the computation. We can solve this the same way we solved it in ResourceLoader: we'll still cache it, but fragment it by language code based on request context.

Also, this way we can provide good values for languages that don't exactly fall back but use a language converter. That is also something that could potentially be done client side, but I don't see that happening just yet.
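
The plan above (a language parameter resolved on the server, with the cache fragmented by language code) can be sketched roughly as follows. This is a Python illustration under assumptions, not the extension's code: `STORE`, `CHAINS`, `text_in_language`, `templatedata`, and the parameter name `lang` are all hypothetical, and language conversion is not modelled.

```python
from functools import lru_cache

# Hypothetical stored blobs. A plain string means the author gave no
# language codes, as in { "description": "Text." }.
STORE = {
    "Template:Example": {
        "description": {"en": "Example text", "pt": "Texto de exemplo"},
    },
    "Template:Plain": {"description": "Text."},
}

# Assumed fallback chains; anything not listed tries itself, then en.
CHAINS = {"pt-br": ("pt-br", "pt", "en")}


def text_in_language(value, chain):
    """Resolve a plain string or a per-language map to one string."""
    if isinstance(value, str):
        return value
    for code in chain:
        if code in value:
            return value[code]
    return None


@lru_cache(maxsize=None)
def templatedata(title: str, lang: str) -> dict:
    """Resolve one template for one language; the cache key includes lang."""
    chain = CHAINS.get(lang, (lang, "en"))
    return {"description": text_in_language(STORE[title]["description"], chain)}


print(templatedata("Template:Example", "pt-br"))
print(templatedata("Template:Plain", "fr"))
```

Because the language is part of the cache key, each reader only ever receives the single resolved string, and a repeated request for the same title and language is served from the cache.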

(In reply to comment #17)

Also, this way we can provide good values for languages that don't exactly
fallback but use a language converter. Which is also something that could
potentially be done client side, but I don't see that happening just yet.

Which isn't really doable currently I guess; or it requires delivery of huge conversion tables (for Chinese).

(In reply to comment #14)

Oh. I assumed the jQuery.i18n (or one of the other JS, MW-independent tools
that the Language Engineering team have built) would have this built in. Is
that not the case?

I can't say there isn't one but I've never heard of this.

BTW I also want a similar one on server side.

Change 87724 had a related patch set uploaded by Krinkle:
Implement getInterfaceTextInLanguage and use API and Parser

https://gerrit.wikimedia.org/r/87724

Change 87724 merged by jenkins-bot:
Implement getInterfaceTextInLanguage and use API and Parser

https://gerrit.wikimedia.org/r/87724