
Allow users to code in localized programming languages
Closed, Declined, Public

Description

This is the ticket created to follow the idea of allowing users to code in localized programming languages.

  • Problem: While a way to share common libraries of templates and Scribunto modules is desirable, users should also be allowed to code in their own language.
  • Who would benefit: Any contributor desiring to code in a non-English project.
  • Proposed solution: Enable users to localize the keywords of templates and modules (if, else, while…). This could possibly go through translatewiki, and it would surely benefit the idea of T129088.
  • More comments: The few (reserved) keywords that compose a programming language aren't a big deal in themselves, but because they are (usually) in English, there is a strong tendency to name variables (including functions) and write comments in English too. The consistency of this tacit "all in English" coding guideline will depend on the English level of contributors. More importantly, it may raise the difficulty of starting to contribute to local templates and modules. As our community is trying to widen its pool of technical contributors, localized keywords may lower the barrier for first contributions. Note that this is not incompatible with or opposed to T52329, just as having Wikidata or Commons doesn't prevent local policies and specific ways of handling things.

Event Timeline

This sounds like a nightmare to maintain, especially given the general mess of non-centrally-maintained (or otherwise forked) scripts.

Reedy triaged this task as Lowest priority. Nov 10 2016, 7:59 PM
Reedy added a project: Scribunto.
Reedy subscribed.

Plus we'd need some sort of "parser"/transform on top of Scribunto/Lua to translate the various keywords, before it could even be run through the Lua-specific stuff

Which will make debugging harder, and maintenance harder

People are of course free to leave multilingual comments, and also use local variable names

Anomie subscribed.

It would be insane trying to copy modules between wikis if this sort of thing were in use, or else no one would be able to use it if they had any hope of any other-language wiki using the module.

Besides keywords, you'd presumably be wanting to replace all the built-in functions' and tables' names, the magic metatable method names, the type names returned from type(), the field names accepted by os.time() and returned by os.date( '*t' ), and so on. That makes it a much larger job than just a few keywords. You might also run into other crazy issues like how to interpret < in RTL languages.

Lingua::Romana::Perligata is a neat hack, but we're definitely not going to try to do something like that for Lua, and especially not for every language we have a wiki for.

@Krenair maintenance of local templates and modules is the responsibility of local communities. I don't think that, in this regard, this would add much to the current maintenance "nightmare", while it may make local technical contributions more attractive.

@Reedy the idea would rather be to modify the Lua interpreter. In fact, I have already done that for Lua 5.3, with the Lupa project. My original idea was to only translate it to Esperanto, but following Luiz Henrique de Figueiredo's advice and guidance, I'm now planning a more modular approach, so it may benefit other translation initiatives. I think ideally one should be able to change some kind of namespace to switch languages, as in Babylscript.

@Anomie you are completely right that the project would involve translating the built-in functions and so on. I'm aware that it's a lot more than just a few keywords, and this project is just an outcome of a broader plan to be able to code in Esperanto-based programming languages. As you can see from the error message text and standard objects of JavaScript I translated for the Babylscript project, I'm not afraid of larger material. Moreover I can capitalize on what I already translated. :)

For Perligata, it's in my plan to translate it to Esperanto as a research project on Wikiversity. But currently translation tools aren't available on beta.wikiversity, see T148947, and I want to focus more on Lupa for now.

From a more general perspective, I don't think the amount of translation needed should be a blocker here, just as the number of MediaWiki messages on translatewiki isn't a blocker to launching wikis in various languages. The idea is not to remove support for the already existing tokens, but to add the ability to translate them, and where translations are missing, the default tokens could always be used as a fallback.
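To make the fallback idea concrete, here is a minimal Python sketch. The table contents and the function names (`localized_token`, `accepted_spellings`) are hypothetical: "se" for "if" matches the eowiki parser function mentioned later in this thread, and "para" for "for" is the Spanish example given in another comment, but the other entries are illustrative assumptions and not an agreed vocabulary.

```python
# Hypothetical translation tables; entries are illustrative assumptions,
# not an agreed vocabulary for any wiki.
TRANSLATIONS = {
    "eo": {"if": "se", "else": "alie", "while": "dum"},
    "es": {"for": "para"},
}


def localized_token(token, lang):
    """Return the translated token, or the default token when no translation exists."""
    return TRANSLATIONS.get(lang, {}).get(token, token)


def accepted_spellings(token, lang):
    """The default token stays valid next to its translation, if there is one."""
    return {token, localized_token(token, lang)}
```

With these tables, `localized_token("repeat", "eo")` simply returns `"repeat"`, so a partially translated language remains usable.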

It's not the size of the translation job that's a concern for me, it's the added complexity to the code to *allow* everything to be translated.

And there's also the fact that if someone on dewiki writes a module in German, then someone on frwiki wants to copy that module, they'd need to translate it from German to either English or French. Or else every wiki would allow every language, and who knows how much trouble that might cause when someone starts writing modules in languages no one else on the wiki speaks or when someone starts writing modules in some crazy pidgin.

Note that parser functions already are localized, despite this meaning that they can't be carried across wikis. Lua could be both localized and transwikiable, given that code editing is already in a separate type of environment. Lua was supposed to be the successor to the piles of parser functions, but if it can only be edited by English speakers, then the many communities may need to fall back to parserfunctions.

Re implementation of localized Lua: I'd recommend storing the modules in "English" (standard Lua), but having the code display and editing window by default translate all keywords to the local language, with a button to bring it back to "English".

Has anyone looked into whether localized parser function names have, on the whole, been a good or bad thing? Particularly with respect to copying templates from one non-English wiki to another (copying from enwiki doesn't count)?

I don't think trying to transpile modules to all sorts of different languages is much simpler. Say someone writes a module in English and names a variable relating to paragraphs "para". Then you want to transpile it into Spanish, where "para" is a reserved word corresponding to "for" (as is apparently the case for this transpiler for C++ and Python). What happens? Or if someone is writing in Esperanto and tries to name a variable "for" which Wiktionary tells me means "away, far, gone" in that language?

And if we solve that problem, do we also have to figure out how to translate error messages such as "'<name>' expected near 'for'. " into every language especially when "for" doesn't actually appear in the transpiled text being shown to the user?

I doubt that the use of English for keywords and function names is the biggest barrier for someone who doesn't speak English learning to program in Lua. Yes, they'll have a harder time since the 21 keywords and the various standard library function names don't map to existing words in their native language. But if they've done programming in C, Java, JavaScript, Python, or many other common languages they're probably already familiar with almost all of the keywords and at least some of the library function names.

I suspect a much larger barrier is the fact that the reference manual has not been translated into very many other languages, so how do they begin to learn what any of the keywords or functions do?

It's not the size of the translation job that's a concern for me, it's the added complexity to the code to *allow* everything to be translated.

Well, the biggest issue seems to be Unicode support in the Lua interpreter. In fact LuaJIT already has this support, but I have no idea what the performance difference is or whether it would fit the WM environment requirements. Concerning the amount of modification necessary, here is a bit of the feedback I got from Luiz Henrique de Figueiredo, sent along with a file.

Here is what I had in mind for a token filter in C. This piece of C code
centralizes all needed changes. Just add <<#include "proxy.c">> just
before the definition of luaX_next in llex.c. That's the only change in
the whole Lua C code.

This filter works for this simple test.

self=9
print(self)
sia=99
print(self)
x=1
y=2
print(x==y)
y=x
print(x egalas y)

Enjoy.
--lhf
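The proxy.c file itself is not reproduced in this thread, so the following is only a rough Python sketch of the token-filter idea, not lhf's C code: a filter sits between the lexer and the parser and swaps aliased names for the standard token. The alias table holds just the two aliases exercised by the test above ("sia" and "egalas"); the tokenizer and the names `lex`, `filtered` and `translate_line` are hypothetical, and string literals and comments are deliberately not handled.

```python
import re

# Aliases exercised by the test above; any further entries would be assumptions.
ALIASES = {"sia": "self", "egalas": "=="}

# Deliberately tiny tokenizer: names, integers, "==", then any other single
# non-space character. String literals and comments are NOT handled.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|\S")


def lex(source):
    """Yield raw tokens, skipping whitespace."""
    for match in TOKEN_RE.finditer(source):
        yield match.group()


def filtered(tokens):
    """The filter itself: replace an aliased name with the standard token."""
    for token in tokens:
        yield ALIASES.get(token, token)


def translate_line(line):
    """Lex a line, run the filter, and join the tokens back with spaces."""
    return " ".join(filtered(lex(line)))
```

Here `translate_line("print(x egalas y)")` gives `"print ( x == y )"`. The appeal of this design is that the parser never sees the localized spelling, so nothing downstream of the lexer has to change.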

And there's also the fact that if someone on dewiki writes a module in German, then someone on frwiki wants to copy that module, they'd need to translate it from German to either English or French.

I don't see it from this perspective. First, security reasons apart, it would be better to have the ability to make cross-site calls, so for small borrowings one could just call the function on the other wiki: no translation needed. For modules of broader interest, we should have a common repository, and things are already on the way for that, as far as I know.

Or else every wiki would allow every language, and who knows how much trouble that might cause when someone starts writing modules in languages no one else on the wiki speaks or when someone starts writing modules in some crazy pidgin.

Well, what's the difference with people beginning to write articles in languages no one else on the wiki speaks or when someone starts writing articles in some crazy pidgin? The community will just get rid of it.

I don't see it from this perspective. First, security reasons apart, it would be better to have the ability to make cross-site calls, so for small borrowings one could just call the function on the other wiki: no translation needed. For modules of broader interest, we should have a common repository, and things are already on the way for that, as far as I know.

That's T52329

Has anyone looked into whether localized parser function names have, on the whole, been a good or bad thing? Particularly with respect to copying templates from one non-English wiki to another (copying from enwiki doesn't count)?

As said, I think this "copying templates" is not a great practice. At least for Wikimedia-scoped instances of MediaWiki, users should be able to make cross-site calls. Otherwise, just as for articles, when there is no equivalent counterpart on the wiki, translation might be a good starting point, with divergences appearing thereafter to reflect local specificities.

I don't think trying to transpile modules to all sorts of different languages is much simpler. Say someone writes a module in English and names a variable relating to paragraphs "para". Then you want to transpile it into Spanish, where "para" is a reserved word corresponding to "for" (as is apparently the case for this transpiler for C++ and Python). What happens? Or if someone is writing in Esperanto and tries to name a variable "for" which Wiktionary tells me means "away, far, gone" in that language?

In Babylscript this problem is solved using translated names. This shouldn't be too hard to implement in native Lua using metatable tricks, should it?

Now, admittedly, as keywords themselves aren't stored in a metatable, their case is more complicated, but for everything related to variable and function names, I think that the previous suggestion gives the main line of how to deal with the problem.

As an aside, in Esperanto you wouldn't name a variable "for", because nouns all end with "-o[j][n]" (with the optional letters marking plural and accusative respectively). Well actually, to be exhaustive, the nominative singular can replace the final "-o" with an apostrophe, a feature mainly used in poetry, but provided that you overcome the Unicode support issue, it would be easy to use "’" (U+2019).

I have already begun to implement "syntactic sugar" in mallupa, which compiles from Esperanto-based lexical code to Lua. Among other things, it captures <name>s matching "o[n]$" and returns the apocopated root. That makes it possible to have things like variablo egalas variablon translated to the Lua code variabl == variabl. This approach is easier to implement, since the main piece of code is a Lua script, but it comes with all the inconveniences of adding a layer between what you write and what is interpreted by the Lua VM.
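As a rough illustration of that rewrite (a Python sketch, not mallupa's actual Lua script, which is not shown in this thread): names ending in "-o" or "-on" are reduced to their root, and "egalas" becomes "==". The helper names `apocope` and `to_lua` are hypothetical, and only the single operator taken from the example above is mapped. Standard Lua keywords are left untouched so that "do" is not truncated.

```python
import re

# The 21 reserved words of Lua 5.1, protected so that e.g. "do" is not cut to "d".
LUA_KEYWORDS = set(
    "and break do else elseif end false for function if in local nil not or "
    "repeat return then true until while".split()
)

# Only the operator used in the example above; other entries would be assumptions.
OPERATORS = {"egalas": "=="}

# A name: starts with a letter or underscore, not glued to a preceding digit.
NAME_RE = re.compile(r"\b[^\W\d]\w*")


def apocope(name):
    """Drop a final -o or -on, keeping at least one character of the root."""
    return re.sub(r"(?<=.)on?$", "", name)


def to_lua(source):
    """Rewrite Esperanto-flavoured names and operators into plain Lua text."""
    def convert(match):
        word = match.group()
        if word in OPERATORS:
            return OPERATORS[word]
        if word in LUA_KEYWORDS:
            return word
        return apocope(word)
    return NAME_RE.sub(convert, source)
```

For instance, `to_lua("variablo egalas variablon")` returns `"variabl == variabl"`. The sketch ignores string literals and comments, and it would also truncate standard library names that happen to end in "-o" (such as io), which a real tool would need to protect.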

And if we solve that problem, do we also have to figure out how to translate error messages such as "'<name>' expected near 'for'. " into every language especially when "for" doesn't actually appear in the transpiled text being shown to the user?

Well, that really depends on which approach you are attempting to develop. It makes some sense in an approach like Lupa, but it just makes it harder to find your error with an approach like Mallupa.

I doubt that the use of English for keywords and function names is the biggest barrier for someone who doesn't speak English learning to program in Lua.

Maybe not the biggest, but that doesn't mean it isn't a part of the equation that can be eliminated, even if it is not the weightiest part.

Yes, they'll have a harder time since the 21 keywords and the various standard library function names don't map to existing words in their native language. But if they've done programming in C, Java, JavaScript, Python, or many other common languages they're probably already familiar with almost all of the keywords and at least some of the library function names.

But not all our contributors have already programmed in any programming language; isn't that the whole point of lowering the barrier? There are people out there who are shy even when it comes to editing plain old text in their native language(s), so one may guess that editing modules is less engaging when they can't use something related to their native language(s).

I suspect a much larger barrier is the fact that the reference manual has not been translated into very many other languages, so how do they begin to learn what any of the keywords or functions do?

I do agree, and as far as I'm concerned, translating Scribunto documentation to Esperanto is part of the plan. I wish I could do more in less time and have resources to dedicate for that, but I'm just a mere human being. :P

Note that parser functions already are localized, despite this meaning that they can't be carried across wikis. Lua could be both localized and transwikiable, given that code editing is already in a separate type of environment. Lua was supposed to be the successor to the piles of parser functions, but if it can only be edited by English speakers, then the many communities may need to fall back to parserfunctions.

Are you sure that parser functions are localized? The magic keywords are, but in the ParserFunctions extension, I see no way to use translated keywords for the parser in the repository. The only localized things I found were error messages.

I don't know where the translations are stored, but they definitely work. Try {{#se:1|2|3}} on eowiki, or {{#זמן:Y}} on hewiki, and they work exactly as expected.

I agree with most of T150417#2791161.

In general, I think tasks like this can be useful to suss out and articulate reasons for particular design decisions, such as supporting localization or not. It can be very difficult for even long-time MediaWiki users to explain if and why these are localizable:

  • #REDIRECT [[foo]]
  • {{name of template}}
  • {{template:name of template|implicitly numbered arg}}
  • {{name of template|with_named_args=yeah}}
  • {{CURRENTWEEK}}
  • {{DEFAULTSORT:key}}
  • {{#titleparts:}}
  • {{#expr:2+2}}
  • __NOINDEX__
  • {{#invoke:module name}}
  • function str.sub( frame )

Which is code? Which is wikitext syntax? Does the code v. wikitext distinction matter when determining whether we support localization?

I think tasks (or wiki pages or wherever) are useful and valuable for the purpose of fostering this kind of design documentation and discussion. In this context, though it didn't matter much in this particular case, I think we should be wary of quickly marking even unreasonable or insane tasks as declined. It can be a bit off-putting to see a silly idea shot down so quickly, in my opinion.

I asked Tim for his thoughts on this in #wikimedia-tech. Paraphrasing him, he said that within MediaWiki, we've tried to make lots of things localizable, but that the idea with Lua was to adopt an existing language. For code, the standard solution is to have keywords and other core parts of the language in English, while comments, variables, and other local identifiers are in the local language.

Anecdotally, he noted that many of the programming languages we have today have inventors that didn't speak English as a first language, but they all used fixed English words.

In T150417#2897934, @MZMcBride, paraphrasing @tstarling, wrote:

Anecdotally, he noted that many of the programming languages we have today have inventors that didn't speak English as a first language, but they all used fixed English words.

People who invent programming languages are a specialized subset of programmers generally. I think Psychoslave's argument was that we want to appeal to as many programmers as possible and providing a programming language in their own script and language would arguably be easier for them to learn. On the other hand, using a standard programming language like Lua may be better, though it's much easier for me to say that when I speak and write English natively and have all the necessary keys on my keyboard already.

To your point about older programming languages, they may not be the best predictor of languages and practices to come. The adoption and widespread use of Unicode and UTF-8, along with incrementally better input devices and operating system support for more "exotic" scripts, may have some impact on design decisions made by future language inventors.

Thank you @MZMcBride for taking the time to investigate and report on this topic. For complete reference, it seems you are talking about what's logged in 20161222.txt within #wikimedia-tech.tar.gz.

I asked Tim for his thoughts on this in #wikimedia-tech. Paraphrasing him, he said that within MediaWiki, we've tried to make lots of things localizable, but that the idea with Lua was to adopt an existing language.

Yes, and regarding Lua, my ideas are:

  • extending the official implementation through Lua-i18n: the idea here is to resolve the different issues that prevent localization in the official Lua compiler, with minimal modification of its sources.
  • creating a source-to-source compiler, to generate Lua code from localized "Lua dialects". That could possibly be used in a toollab service which would make it possible to push/pull Scribunto modules, for example.

For code, the standard solution is to have keywords and other core parts of the language in English, while comments, variables, and other local identifiers are in the local language.

Is that documented, or should we document it, in some MediaWiki development guideline?

Anecdotally, he noted that many of the programming languages we have today have inventors that didn't speak English as a first language, but they all used fixed English words.

Well, that's at least the case for Lua. :)

Now my point is more to let people contribute in localized language versions for localized problems, if they wish. A localized API might also make sense from this point of view, just as File: and so on are translated.

Related: https://mako.cc/copyrighteous/scratch-localization-and-learning (I think this is most relevant for our training/mentoring programs).

Okay, I know this was closed as declined, but the problems mentioned seem really easy to solve. Variable naming conflicts with keywords can be handled the same way any decent transpiler handles them (adding/removing underscores as necessary to avoid conflicts), and importing/exporting and cross-wiki maintenance can be handled by a "base" form in English not normally visible but easily accessible. Here's a working English-to/from-other Lua translator: https://gist.github.com/YairRand/e22aded969e6de8cbb283e62868153a1 .
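The linked gist is not reproduced here; the following is an independent Python sketch of the underscore idea, under stated assumptions. The Spanish table is hypothetical apart from "para" for "for" (the example raised earlier in the thread), and `to_local` / `to_english` are illustrative names. It assumes that no word is a keyword in both tables, and it ignores string literals and comments.

```python
import re

# Hypothetical Spanish keyword table. "para" for "for" is the example from this
# thread; the other entries are illustrative assumptions.
TO_LOCAL = {"for": "para", "if": "si", "then": "entonces", "end": "fin", "do": "hacer"}
TO_ENGLISH = {local: english for english, local in TO_LOCAL.items()}

WORD_RE = re.compile(r"\b[^\W\d]\w*")


def _convert(source, keywords, other_keywords):
    """Translate `keywords`; add or drop one underscore on clashing identifiers."""
    def convert(match):
        word = match.group()
        bare = word.rstrip("_")
        if word in keywords:
            return keywords[word]   # a real keyword: translate it
        if bare in other_keywords:
            return word + "_"       # would read as a keyword afterwards: escape it
        if bare in keywords and word != bare:
            return word[:-1]        # was escaped in the other form: unescape it
        return word
    return WORD_RE.sub(convert, source)


def to_local(source):
    """English base form to the localized view."""
    return _convert(source, TO_LOCAL, TO_ENGLISH)


def to_english(source):
    """Localized view back to the English base form."""
    return _convert(source, TO_ENGLISH, TO_LOCAL)
```

With this table, a base-form module that names a variable `para` is shown as `para_` in the Spanish view while `for` becomes `para`, and converting back restores the original text. Keywords without a translation pass through unchanged, which matches the fallback behaviour discussed above.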

Wow @Yair_rand thanks for the link to the Lua translator, that's really cool. 😄