
Structured localization framework for Scribunto modules
Open, High, Public

Description

For many more details about this project, see https://www.mediawiki.org/wiki/Translatable_modules

Scribunto modules that are useful in more than one language need a convenient and uniform localization framework for translating their messages.

This is needed for modules that are used across wikis, and even for modules that are used in multilingual wikis such as Commons or Wikidata.

At the moment there is no such framework. Like templates, modules can be translated by copying the module code to another wiki, going through its wiki syntax, and changing the strings. This is very flexible, but also extremely inefficient: it creates a forked copy, severs the connection to the original module, and prevents proper code reuse and collaboration.

There is also the TNT system, which is based on the Translate extension's page translation capability, but it is primarily made for templates, and has various disadvantages (see T238411).

Some modules try to include capability for translations using arrays indexed by language code, but there is no uniform framework for it nor a tool that allows their translation without diving into Lua code.
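For illustration, the ad hoc pattern usually looks something like the sketch below: translations sit in a Lua table inside the module, keyed by language code, with some home-grown fallback rule. The table contents, the `label` function, and the fall-back-to-English rule are assumptions, not taken from any particular module; on a wiki the language code would typically come from something like `mw.language.getContentLanguage():getCode()`.

```lua
-- Hypothetical ad hoc localization table, indexed by language code.
-- The translations live inside the module source, mixed with the code.
local labels = {
	en = { born = 'Born', died = 'Died' },
	de = { born = 'Geboren', died = 'Gestorben' },
	fr = { born = 'Naissance' }, -- incomplete: no 'died' yet
}

-- Look up a label in the requested language, falling back to English
-- when the language or the individual key is missing.
local function label(lang, key)
	local t = labels[lang]
	if t and t[key] then
		return t[key]
	end
	return labels.en[key]
end

print(label('de', 'died')) -- Gestorben
print(label('fr', 'died')) -- Died (key missing in French)
print(label('xx', 'born')) -- Born (unknown language)
```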

It would be much nicer if modules could be translated the same way extensions are:

  • Having the same underlying algorithmic and page layout code for all the languages, but separate translations.
  • Using a dedicated translation interface so that translators wouldn't have to deal with any code or wiki syntax (probably with the Translate extension).

My user-level proposal for how this will be done is described in more detail on the page Global_templates/Draft_spec/TLDR and in even more detail on the page Global_templates/Draft_spec.

Some steps to making this reality:

  • Some adaptations in the Translate extension: Showing translatable modules in the message group selector, and possibly other things.
  • Decision on which syntax to use for inserting messages into the module code. This is already possible with https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual#Message_library , but could perhaps be enhanced.
  • Decision on where to store the messages, and how to customize them per project.
  • And probably more things.

Unlike templates, modules are probably easier to make properly localizable and better equipped for it. For example, for templates it's desirable to allow the translation of titles and parameter names, because they are frequently used by end users who edit wiki content pages; but since modules are rarely transcluded directly into content pages, this feature is less essential for modules.

A truly comprehensive solution will only come about once it is possible to efficiently share templates and modules across wikis (T41610, T52329, T121470), but some steps towards designing it and making the necessary modifications in Translate can possibly be made earlier.

Some good steps towards designing and implementing such a thing were made in T122086: Use bot to share templates and modules between wikis, and a lot of the ideas from that project could be shared with this one.

This task is specifically for modules. Templates and gadgets have some similar requirements, but they are handled in other tasks. See also:

Related Objects

Status | Subtype | Assigned
Open | Feature | None
Open | | None
Open | | None
Stalled | | None
Open | | None
Open | | None
Resolved | | Amire80
Resolved | | ppelberg
Resolved | | Nikerabbit
Resolved | | Nikerabbit
Resolved | | Nikerabbit
Resolved | | abi_
Resolved | | abi_
Resolved | | abi_
Resolved | Feature | abi_
Resolved | | abi_
Resolved | | abi_
Open | BUG REPORT | None
Open | | None
Open | | None

Event Timeline

Anomie subscribed.

> At the moment there is no such framework.

There is, it's just not at all convenient (it requires individual MediaWiki-namespace pages to be created for each message) so I'd guess few if any modules actually use it. You even linked it later in the post: https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual#Message_library

What we probably want is something closer to the Banana file format for storing the messages. I wonder whether translations could then be done via translatewiki.net.
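A rough sketch of what Banana-style messages might look like once decoded into Lua: one flat table per language mapping message keys to strings with `$1`-style placeholders, plus an `@metadata` entry. The keys, the `msg` and `substitute` helpers, and the single fallback to English are assumptions for illustration; MediaWiki's real fallback chains are per-language and can be longer.

```lua
-- Hypothetical messages, shaped like decoded Banana JSON files
-- (en.json, de.json): one flat key -> string map per language.
local messages = {
	en = {
		['@metadata'] = { authors = {} },
		['chart-peak'] = 'Peak position: $1',
		['chart-weeks'] = '$1 weeks in the chart ($2)',
	},
	de = {
		['chart-peak'] = 'Höchste Position: $1',
	},
}

-- Replace $1, $2, ... with the given parameters; unknown ones stay as-is.
local function substitute(text, params)
	return (text:gsub('%$(%d+)', function(n)
		local v = params[tonumber(n)]
		if v == nil then
			return '$' .. n
		end
		return tostring(v)
	end))
end

-- Look up a message in `lang`, falling back to English.
local function msg(lang, key, ...)
	local text = (messages[lang] or {})[key] or messages.en[key]
	if type(text) ~= 'string' then
		return '⧼' .. key .. '⧽' -- mimics MediaWiki's marker for missing messages
	end
	return substitute(text, { ... })
end

print(msg('de', 'chart-peak', 3)) -- Höchste Position: 3
print(msg('de', 'chart-weeks', 12, '2019')) -- falls back to English
```

Keeping the strings in separate per-language tables is what would presumably let a tool such as Translate edit them without touching the module code.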

Also, at the code level the structure of MessageValue and related classes recently added to MediaWiki core might be a better model than the MediaWiki Message class that mw.message was based on.

It would be good to list the actual ad hoc implementations of internationalization/localization that exist.

The message library only exposes a subset of methods, and the most troublesome omission is the lack of “plural”.
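To make the plural point concrete: a message like `$1 {{PLURAL:$1|file|files}}` needs a per-language rule to choose the right form. The sketch below expands only the simple `{{PLURAL:n|...}}` syntax after parameter substitution, with two deliberately simplified, hypothetical rules (`en`: one/other, `ja`: a single form); real plural rules come from CLDR and can distinguish up to six categories. Decimals and negative numbers are not handled.

```lua
-- Hypothetical, simplified plural rules: number -> 1-based form index.
local pluralRules = {
	en = function(n) return n == 1 and 1 or 2 end,
	ja = function(_) return 1 end, -- no grammatical plural
}

-- Expand {{PLURAL:n|form1|form2|...}} once $1 etc. are already substituted.
local function expandPlural(lang, text)
	local rule = pluralRules[lang] or pluralRules.en
	return (text:gsub('{{PLURAL:(%d+)|([^}]*)}}', function(n, formList)
		local forms = {}
		for form in (formList .. '|'):gmatch('([^|]*)|') do
			forms[#forms + 1] = form
		end
		-- If a translation gives fewer forms than the rule expects, use the last one.
		local index = math.min(rule(tonumber(n)), #forms)
		return forms[index]
	end))
end

local template = '$1 {{PLURAL:$1|file|files}}'
for _, n in ipairs({ 1, 5 }) do
	local text = template:gsub('%$1', tostring(n))
	print(expandPlural('en', text)) -- "1 file", then "5 files"
end
```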

Note that using localized messages in content triggers several problems that aren't solved in the current message library.

The simplest solution to the core problem would be to pull in JSON as Lua tables, so that localized files can be read, which should be pretty simple. This is nearly identical to the stalled update of TemplateData.

> The simplest solution to the core problem would be to pull in JSON as Lua tables, so that localized files can be read, which should be pretty simple. This is nearly identical to the stalled update of TemplateData.

Do you have any info about this stalled update?

Partially related patch (5 years old):
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Scribunto/+/158323/
proposed moving Module:Arguments from enwiki to MediaWiki and adding translation capabilities to getArgs (translate)

Vutting renamed this task from Structured localization framework for Scribunto modules to vservices. (Mar 21 2020, 6:16 AM)
Vutting closed this task as Resolved.
DannyS712 renamed this task from vservices to Structured localization framework for Scribunto modules. (Mar 21 2020, 6:20 AM)
DannyS712 reopened this task as Open.

Btw, Wikia users have written a pure Lua module like this:
https://dev.fandom.com/wiki/Global_Lua_Modules/I18n

I only noticed this amazing thing now. What is truly cool is not even the i18n part, but the "global modules" part. How did Fandom do it? Did they fork Scribunto and add this? Is this something we can reuse to resolve T41610?

Just to throw in another monkey wrench: this may be an opportunity to revisit the core language underpinnings of Scribunto, to switch to a language which is *fully* localizable.

https://en.wikipedia.org/wiki/File:Wikimania_2019_-_Multilingual_JavaScript.pdf
Video: https://www.youtube.com/watch?v=SomTEzaoROQ&t=1973

Also: it's not just the messages inside the template, it's also the name of the template itself and the name of every parameter to the template. If there are keyword arguments/enumerations, they need to be localized as well; ideally in a framework which does not require hand-coding by individual Scribunto module authors. (As a member of the Parsing Team, we prefer this to be done at the wikitext level, so that it is available for all templates as well as Scribunto modules and other extensions.) This is discussed somewhat in T239294.

> Just to throw in another monkey wrench: this may be an opportunity to revisit the core language underpinnings of Scribunto, to switch to a language which is *fully* localizable.
>
> https://en.wikipedia.org/wiki/File:Wikimania_2019_-_Multilingual_JavaScript.pdf
> Video: https://www.youtube.com/watch?v=SomTEzaoROQ&t=1973

That would be nice, and can be done separately, but we already have all this Lua code on the wikis, with probably hundreds of thousands of lines, which are providing useful, essential functionality, and which aren't going to magically rewrite themselves in JavaScript. But doing just the relatively small thing of improving how they handle localization sounds quite doable.

> Also: it's not just the messages inside the template, it's also the name of the template itself and the name of every parameter to the template.

This task here is about modules, not templates. Templates are discussed in T239294, as you say. For templates, it's indeed important to localize the title and the parameter names, because they are often used by editors in literally all kinds of wiki pages, most of which are written in the language of the wiki. For modules, localizing the title and the parameters is on a spectrum of:

  • maybe nice to have, but not required - for the same reason as templates, but much less important because modules are rarely directly inserted into wiki pages. Modules are usually used in templates, and they are mostly maintained and used by experienced developers who don't mind using a piece of code with identifiers in a language that is different from the language of the wiki.
  • definitely unnecessary - given the above point, and the cost to develop it, probably no one will complain if names and parameters are not translated. I'm happy to hear different opinions, though.
  • harmful - because Lua is a programming language based on English and ASCII, like most programming languages, it may be actively harmful to encourage people to use non-ASCII and non-English identifiers. But again, I'm happy to hear different opinions.

> If there are keyword arguments/enumerations, they need to be localized as well;

I'm not sure I understand this part. What exactly do you refer to when you say "keyword arguments/enumerations"? Is this something in modules? Or in templates? Or TemplateData? (This may be an embarrassing question, so I'll just admit that I'm really not a Lua expert.)

> ideally in a framework which does not require hand-coding by individual Scribunto module authors.

Generally, the intention is indeed to move the localization into a consistent framework, which would be separate from raw Lua code in a way that is comparable to how it is done with extensions, with the necessary corrections for the Lua realities, for example that modules are stored as wiki pages and not as files in a file system or Git. If that's what you want, then we are on the same page. Exactly how it will look has not been decided yet, but hopefully it will be decided soon, and your suggestions are very welcome.

> (As a member of the Parsing Team, we prefer this to be done at the wikitext level, so that it is available for all templates as well as Scribunto modules and other extensions.) This is discussed somewhat in T239294.

What exactly do you mean by "at the wikitext level"?

About parameter translation: modules have access to the parent frame, which they often take advantage of. This means that modules directly access the parameters defined when transcluding the template, that is, the parameter name is the same in the article and the module, with no template sitting in between and translating, so these parameter names need to be translated in order to avoid English parameter names popping up in articles. However, parameter translation should work differently than output text translation: output texts’ language should depend on UI or page language (e.g. Commons file pages’ content depends on UI language, and MediaWiki.org help pages’ content depends on page language), while parameter recognition should certainly not depend on UI language, and probably not even on page language, but only on wiki language (so that e.g. pages translated using Translate’s page translation feature can embed parameter names in their non-translatable part).
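A minimal sketch of that separation, with hypothetical names throughout (`paramAliases`, `normalizeArgs`, `render`): parameter names are normalized using an alias table chosen by the wiki's content language, while output labels are chosen by a separate output language (UI or page language). In Scribunto, the arguments would come from `frame:getParent().args` and the content language from `mw.language.getContentLanguage():getCode()`; here both are passed in explicitly so the example runs as plain Lua.

```lua
-- Hypothetical parameter aliases, selected by the wiki's content language
-- (never by UI language), mapping local names to canonical names.
local paramAliases = {
	de = { titel = 'title', jahr = 'year' },
	hu = { ['cím'] = 'title', ['év'] = 'year' },
}

-- Copy `args`, replacing localized parameter names with canonical ones.
-- Canonical names are always accepted; if both forms are given,
-- the canonical one wins.
local function normalizeArgs(args, contentLang)
	local aliases = paramAliases[contentLang] or {}
	local out = {}
	for name, value in pairs(args) do
		local canonical = aliases[name] or name
		if out[canonical] == nil or canonical == name then
			out[canonical] = value
		end
	end
	return out
end

-- Hypothetical output labels, selected by the output (UI or page) language.
local labels = {
	en = { title = 'Title' },
	de = { title = 'Titel' },
}

local function render(args, contentLang, outputLang)
	local a = normalizeArgs(args, contentLang)
	local l = labels[outputLang] or labels.en
	return l.title .. ': ' .. (a.title or '')
end

print(render({ titel = 'Faust' }, 'de', 'en')) -- Title: Faust
```

Because the alias lookup never consults the output language, a page on a German-language wiki keeps working for a reader whose interface is in English.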

> About parameter translation: modules have access to the parent frame, which they often take advantage of. This means that modules directly access the parameters defined when transcluding the template, that is, the parameter name is the same in the article and the module, with no template sitting in between and translating, so these parameter names need to be translated in order to avoid English parameter names popping up in articles.

OK, yet again I have to admit that I'm not actually a big expert on modules :)

Can you give me an example of how modules can access the parameters of templates?

In any case, this particular task is only about translating strings within one wiki and not about reusing modules across wikis, although it would be nice to make it usable also for the future when modules will hopefully become sharable across wikis.

> However, parameter translation should work differently than output text translation: output texts’ language should depend on UI or page language (e.g. Commons file pages’ content depends on UI language, and MediaWiki.org help pages’ content depends on page language), while parameter recognition should certainly not depend on UI language, and probably not even on page language, but only on wiki language (so that e.g. pages translated using Translate’s page translation feature can embed parameter names in their non-translatable part).

Oh, definitely, no doubt about that. I already wrote about it in the long Global templates document in the section "Localizing parameters", as well as the one before it. I am now writing a detailed document about translatable modules, which I'll publish very soon, and it's mentioned there, too.

As I wrote above, when templates are global, translating their titles and parameter names (separately from human-readable strings) will definitely be necessary, but I'm really not sure that making the same possible for modules is necessary. I'm open to having my mind changed, however.

When I first experimented with “global modules”, I used separate data tables on Commons for the template parameters (https://commons.m.wikimedia.org/wiki/Data:Module:Music_charts/parameters.tab) and the text output (https://commons.m.wikimedia.org/wiki/Data:Module:Music_charts/labels.tab), among others; these were called by the central module, which eventually outputs the template code. This method would actually work fine for my needs, but the performance is terrible, so I never implemented it outside of testwiki …

> Can you give me an example of how modules can access the parameters of templates?

-- Get frame object
local frame = mw.getCurrentFrame()

-- Parameters passed in the {{#invoke:}} call
local args = frame.args

-- Parameters passed to the template containing the {{#invoke:}} call
local pargs = frame:getParent().args

-- ...but only one level up is available: the template's frame has no parent
local grandparent = frame:getParent():getParent() -- always nil

So if Template:Foo contains the wikicode {{#invoke:Foo|bar|x|y=z|b=a}}, and Module:Foo defines the above variables, and {{Foo|a|b=c|d=e}} is placed in an article, the variables will be (more or less):

args == {
	[1] = 'x',
	['y'] = 'z',
	['b'] = 'a'
}
pargs == {
	[1] = 'a',
	['b'] = 'c',
	['d'] = 'e'
}

Is this clear?
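Outside Scribunto, the same relationship can be imitated with a mock frame, which is also handy for testing modules offline. The `makeFrame` and `mergedArgs` helpers below are hypothetical: the mock supports only `args` and `getParent()`, whereas real frame objects have more methods and read-only, lazily expanded `args`. `mergedArgs` combines both argument sets, letting the `{{#invoke:}}` arguments win unless `parentFirst` is set.

```lua
-- Hypothetical stand-in for a Scribunto frame: only `args` and getParent().
local function makeFrame(args, parent)
	return {
		args = args,
		getParent = function(self) return parent end,
	}
end

-- Merge invoke arguments and template arguments into one plain table.
-- With parentFirst = true, values from the template call win instead.
local function mergedArgs(frame, parentFirst)
	local parent = frame:getParent()
	local first, second = frame.args, parent and parent.args or {}
	if parentFirst then
		first, second = second, first
	end
	local out = {}
	for k, v in pairs(second) do out[k] = v end
	for k, v in pairs(first) do out[k] = v end
	return out
end

-- {{Foo|a|b=c|d=e}} in an article, where Template:Foo contains
-- {{#invoke:Foo|bar|x|y=z|b=a}}
local templateFrame = makeFrame({ 'a', b = 'c', d = 'e' }, nil)
local invokeFrame = makeFrame({ 'x', y = 'z', b = 'a' }, templateFrame)

local merged = mergedArgs(invokeFrame, false)
print(merged[1], merged.b, merged.d) -- x  a  e
```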

> Can you give me an example of how modules can access the parameters of templates?

> -- Parameters passed to the template containing the {{#invoke:}} call
> local pargs = frame:getParent().args

Thanks, wasn't familiar with getParent()! But again, I'm not a true modules expert :)

As the guy maintaining most of the high-use Lua code on the multilingual Commons wiki, I spend a lot of time dealing with translations and i18n in general. On Commons we use the two systems already discussed, each with its own advantages and disadvantages:

1) Most of the translations use separate Lua modules that deal with translations, separating translation data from the actual code. For example, c:Module:I18n/complex_date.
Advantages:

  • flexible, as it allows pure text translations and a mix of string translations with more complicated data structures, or even functions when needed. See the Module:I18n/complex_date code above for examples of tables with functions. They are very useful if one of the languages requires very different treatment than most.
  • Most types of i18n data tables can be loaded efficiently using the mw.loadData function
  • code and translations are visible and readable
  • Lua has access to all the translations and knows which language was returned when fallback languages have to be used.

Disadvantages:

  • each tweak of one of the languages is a code change. If a function is used on 70M pages (as some did), then 70M pages need to be updated.
  • moving the code to a different wiki requires you to also copy the i18n tables, resulting in dozens of out-of-sync translation tables.
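A minimal sketch of the submodule style in point 1, assuming a hypothetical data table (the real Module:I18n/complex_date is structured differently): entries are either strings or functions, and the lookup walks a fallback chain and also reports which language actually supplied the text. The language code `xx` is a placeholder and the fallback table is made up; on a wiki the chain could come from `mw.language.getFallbacksFor`. Note that `mw.loadData` only accepts data (strings, numbers, booleans, tables), so a table containing functions like this one would have to be loaded with `require` instead.

```lua
-- Hypothetical i18n data: plain strings for most languages, a function
-- where a language needs extra logic.
local i18n = {
	circa = {
		en = 'circa $1',
		de = 'um $1',
		xx = function(year) -- placeholder language with a made-up rule
			if year < 1000 then return 'approx. ' .. year end
			return 'c. ' .. year
		end,
	},
}

-- Hypothetical fallback chains; English is always tried last.
local fallbacks = { ['de-at'] = { 'de' }, ['pt-br'] = { 'pt' } }

-- Return the localized text and the language that actually supplied it.
local function translate(key, lang, value)
	local entries = i18n[key]
	if not entries then return nil, nil end
	local chain = { lang }
	for _, code in ipairs(fallbacks[lang] or {}) do chain[#chain + 1] = code end
	chain[#chain + 1] = 'en'
	for _, code in ipairs(chain) do
		local entry = entries[code]
		if type(entry) == 'function' then
			return entry(value), code
		elseif type(entry) == 'string' then
			return (entry:gsub('%$1', function() return tostring(value) end)), code
		end
	end
	return nil, nil
end

print(translate('circa', 'de-at', 1900)) -- um 1900   de
```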

2) The approach using the Commons Data namespace, championed by Module:TNT. For example, c:Module:DateI18n, which is used by 70M pages on Commons and has been copied to many other wikis, uses it.
Advantages:

  • Single translation table on Commons is used by Lua modules on all the wikis. That is the big one.
  • updates to the table do not trigger page refreshes for all the pages using it.

Disadvantages:

  • People do not understand it, and there are very few requests to improve or add text in various languages
  • code and translations are hidden from view. See Data:DateI18n.tab. This makes them even harder to comprehend.
  • the data format (saved as tightly controlled JSON) only allows a simple string per language, not more complicated data structures. In the case of Data:DateI18n.tab, some languages required the string to store information about different date formats used for different days of the month, requiring some complicated encoding where simple JSON would have been much more readable.
  • all comments are stripped from the code and /doc pages are not supported, making those pages hard to document.
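For comparison, a sketch of the shape of the data behind point 2: a `.tab` page is JSON with a `schema` and `data` rows, and on a wiki JsonConfig's `mw.ext.data.get` returns it as a Lua table. The sample rows and the `buildLookup` and `lookupMessage` helpers below are hypothetical and not Module:TNT's actual code; the `name`/`message` column layout is assumed. Each cell holds exactly one string per language, which is the limitation described above.

```lua
-- A decoded .tab page, roughly as it might arrive in Lua (sample content
-- is made up). Columns: a string key and a 'localized' message.
local tab = {
	schema = {
		fields = {
			{ name = 'name', type = 'string' },
			{ name = 'message', type = 'localized' },
		},
	},
	data = {
		{ 'month-1', { en = 'January', de = 'Januar' } },
		{ 'circa', { en = 'circa $1', de = 'um $1' } },
	},
}

-- Build key -> { lang -> string }, locating columns by name
-- instead of assuming their order.
local function buildLookup(t)
	local keyCol, msgCol
	for i, field in ipairs(t.schema.fields) do
		if field.name == 'name' then keyCol = i end
		if field.name == 'message' then msgCol = i end
	end
	local lookup = {}
	for _, row in ipairs(t.data) do
		lookup[row[keyCol]] = row[msgCol]
	end
	return lookup
end

-- Each cell is a plain string: no nested structures or functions,
-- unlike a Lua i18n submodule.
local function lookupMessage(lookup, key, lang)
	local translations = lookup[key]
	if not translations then return nil end
	return translations[lang] or translations.en
end

local lookup = buildLookup(tab)
print(lookupMessage(lookup, 'month-1', 'de')) -- Januar
```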

Whatever format we end up using should combine the advantages of those two and minimize their disadvantages.

> Disadvantages:
>
>   • each tweak of one of the languages is a code change. If a function is used on 70M pages (as some did), then 70M pages need to be updated.
>   • moving the code to a different wiki requires you to also copy the i18n tables, resulting in dozens of out-of-sync translation tables.

These disadvantages are currently unique to exporting a module from a multilingual wiki to a bunch of monolingual wikis. By contrast, the Wikipedia versions of Module:Convert and Module:Citation/CS1 are quite complex modules but have relatively maintainable i18n submodules. I’ve repeatedly synchronized these templates with other wikis and never had to worry about any language other than the surrounding wiki’s content language, but only because the wiki’s content language is the same for every user regardless of their interface language.

Any system of globally localizable modules will need to have an answer for outdated translations, because someone mucking with a module’s implementation will speak only a limited number of languages. Anyone who has translated interface messages at Translatewiki.net or CentralNotice banners at Meta would be familiar with !!FUZZY!! and the suboptimal user experience of a partially translated interface (or an outdated Board of Trustees election date). The traditional approach of per-wiki modules doesn’t normally run into this problem, because someone synchronizing a translated module with the original version would typically update both the functionality and the translations at the same time. Whether this matters will greatly depend on the purpose of a given module, but I hope whatever translation management system we build will be approachable to users of client wikis.
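One possible mechanism, similar in spirit to the `!!FUZZY!!` marking in Translate, is to record which version of the source text each translation was made from and to treat the translation as outdated once the source changes. The sketch below stores a copy of the source text next to each translation; the data layout and `pickText` are hypothetical, and a real system would more likely store a revision ID or hash.

```lua
-- Hypothetical store: the current English source, plus translations
-- that remember the source text they were translated from.
local source = { ['election-date'] = 'Voting ends on $1.' }

local translations = {
	de = {
		['election-date'] = {
			text = 'Die Abstimmung endet am $1.',
			from = 'Voting ends on $1.',
		},
	},
	fr = {
		['election-date'] = {
			text = 'Le vote se termine le $1.',
			from = 'Voting closes on $1.', -- made from an older source text
		},
	},
}

-- Return the text to display and its status: 'current', 'outdated',
-- or 'missing'. Outdated and missing translations fall back to the source.
local function pickText(key, lang)
	local entry = (translations[lang] or {})[key]
	if entry == nil then
		return source[key], 'missing'
	elseif entry.from ~= source[key] then
		return source[key], 'outdated'
	end
	return entry.text, 'current'
end

print(pickText('election-date', 'fr')) -- Voting ends on $1.   outdated
```

Falling back to the source text for outdated translations is just one policy choice; showing the outdated translation with a warning would be another.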