
Lua error: too many language codes requested
Closed, Resolved, Public

Assigned To: Anomie
Authored By: Elitre, Dec 29 2014, 6:23 PM
Tokens: "Love" tokens awarded by Quiddity, Whatamidoing-WMF, and Elitre.

Description

When trying to assemble the December issue of the VE multilingual newsletter, I got this message after this edit. (I guess Lua can't actually handle so many languages. If I'm right, this needs to change: 20-something languages aren't "many" at all.)

More info:

Backtrace:
[C]: in function "isRTL"
mw.language.lua:99: in function "isRTL"
mw.language.lua:132: in function "getDir"
Module:Assemble_multilingual_message:46: in function "chunk"
mw.lua:490: ?
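
For context, a minimal sketch of how a module like Module:Assemble_multilingual_message can trip the limit (hypothetical module and arbitrary code list; as I understand it, mw.language.new() itself is cheap, and the error fires when a method such as getDir(), which calls isRTL() as in the backtrace above, first needs the underlying Language object for a code beyond the cache size):

-- Hypothetical reproduction: iterate over more than 20 distinct
-- language codes and ask each one for its directionality.
local p = {}

function p.demo()
    local codes = {
        'ar', 'cs', 'da', 'de', 'el', 'en', 'es', 'fa', 'fi', 'fr',
        'he', 'hu', 'it', 'ja', 'ko', 'nl', 'pl', 'pt', 'ru', 'sv',
        'zh', -- with a default cache size of 20, this 21st code errors out
    }
    local out = {}
    for _, code in ipairs( codes ) do
        -- getDir() is where "too many language codes requested" surfaces
        out[#out + 1] = code .. ': ' .. mw.language.new( code ):getDir()
    end
    return table.concat( out, ', ' )
end

return p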

Event Timeline

Elitre raised the priority of this task to Needs Triage.
Elitre updated the task description. (Show Details)
Elitre added subscribers: Elitre, gpaumier, Amire80.

Thanks, Guillaume. Splitting the page doesn't seem very feasible though - one would have to split recipients' lists as well, and maintaining them... doesn't really look sustainable. Has this never occurred for Tech News?

I note the restriction was added by Tim (see Gerrit change 36496, which was the original source of Gerrit change 42050, the one actually merged), so I'm adding him as a CC here.

> Has this never occurred for Tech News?

No, we've never had so many translations in Tech News.

Aklapper triaged this task as Medium priority. Dec 29 2014, 8:06 PM

Thank you, Anomie! The VE newsletter is quite popular, so I'd really like to get rid of the restriction somehow if possible (/bragging).

This limit is also why Commons' Template:Dir (https://commons.wikimedia.org/wiki/Template:Dir), the most transcluded template on Commons (with 32,721,515 pages transcluding it), does not use Lua's isRTL function.
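
For illustration, a hedged sketch of the kind of workaround that avoids the limit entirely: answer direction queries from a static lookup table instead of constructing Language objects (the real Template:Dir is implemented in wikitext; this table is illustrative and deliberately incomplete):

-- Hypothetical Lua equivalent of the workaround: no call to
-- mw.language.new( code ):isRTL(), so no Language objects are
-- constructed and the cache cap never applies.
local rtlCodes = {
    ar = true, arc = true, ckb = true, dv = true, fa = true,
    he = true, ps = true, ur = true, yi = true,
}

local function getDir( code )
    return rtlCodes[code] and 'rtl' or 'ltr'
end

The trade-off is that such a table has to be maintained by hand as languages are added, which is exactly the kind of duplication isRTL was meant to remove.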

@Johan, I gather this is a problem you may also have at some point.

This seems to go back to the old issue that constructing Language objects was slow and memory-consuming. The most likely culprit was preloading a big set of language data and some messages every time a Language object was constructed. But I was under the impression that this was improved long ago, so maybe the restriction in Lua is no longer necessary. Or, if it turns out there are still issues, Language object construction should be made lightweight.

Would it be feasible to raise the cap to, say, 30 languages (although that's a limit the VE newsletter has already exceeded at least once) and see how it goes?

IIRC my concern was that there was no eviction from the cache in LocalisationCache::$data, which continues to be the case. If recaching is required, memory usage is about 1.5MB per language, measured locally today. So a limit of 20 implies 30MB, which seems reasonable compared to the Lua memory limit of 50MB.

However, in the WMF production setup, l10n recaching is done in advance, so there's no way recaching would be done in response to a Lua request. It only needs to load the preloaded messages plus the data actually requested (isRTL). Using eval.php on a production server, I measured 145KB per language, which is not so concerning.
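
For scale (my arithmetic, extrapolating from the figures above): in the precached production setup, even a 200-language cache would cost roughly 200 × 145KB ≈ 28MB, comfortably under the 50MB Lua limit, whereas at the 1.5MB on-demand recaching cost the same cap would imply around 300MB, which is why the default has to stay conservative.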

So I think the options are:

  1. Fix LocalisationCache so that it doesn't use unlimited amounts of memory when recaching on demand. Implementation ideas can be found in LocalisationCacheBulkLoad.
  2. Make MAX_LANG_CACHE_SIZE configurable: set it to, say, 200 on WMF wikis and leave it at 20 by default. We would probably need to add ScribuntoEngineBase::getOptions() so that Scribunto_LuaLanguageLibrary::register() can access the configuration options, which are stored in a protected variable.
  3. Increase the default MAX_LANG_CACHE_SIZE very modestly, say to 30. This seems like a bad choice since in a month or two someone will reopen the same bug, having found a really important use case for loading 31 languages.

Somewhat off topic, but speaking of lightweight Language object construction: I think having $wgLangObjCacheSize = 10 by default is actually bit rot; I don't think there's any reason for it anymore. Language objects used to hold message arrays, which is presumably why the cache was set so low.

Thanks Tim. What needs to be done here? Do we need to involve someone in particular in this conversation? (I'd really love to be able to deliver the next visual editor newsletter in more than 20 languages.)

Change 343590 had a related patch set uploaded (by Tim Starling):
[mediawiki/extensions/Scribunto] Make the maximum language cache size configurable

https://gerrit.wikimedia.org/r/343590

Change 343590 merged by jenkins-bot:
[mediawiki/extensions/Scribunto@master] Make the maximum language cache size configurable

https://gerrit.wikimedia.org/r/343590

I just ran into this issue again at https://commons.wikimedia.org/wiki/Module_talk:Name/testcases. Maybe some functions do not need to be tied to language objects, like mw.language:isRTL or mw.language:lc.

One consequence of this error is that Lua code writers avoid calling mw.language functions and use frame:callParserFunction instead. For example, c:Module:Date (used on 46M pages) uses datestr = mw.getCurrentFrame():callParserFunction( "#time", { dFormat, timeStamp, lang } ) instead of the more logical datestr = mw.language.new( lang ):formatDate( dFormat, timeStamp ). The output is the same, except there is no "Lua error: too many language codes requested". From a performance point of view, is there a difference between mw.language.new(lang) and mw.getCurrentFrame():callParserFunction calls?
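
For readability, here are the two call styles from that example side by side, with the trade-off as I understand it (dFormat, timeStamp, and lang are assumed to be defined by the surrounding module; the variable names are mine):

-- Parser-function route: no Language object is constructed on the
-- Lua side, so the language cache cap is never hit, but every call
-- round-trips through the parser's #time implementation.
local viaParser = mw.getCurrentFrame():callParserFunction(
    '#time', { dFormat, timeStamp, lang } )

-- Language-object route: more direct, but each distinct lang counts
-- toward Scribunto's language cache and can raise the error above.
local viaLangObj = mw.language.new( lang ):formatDate( dFormat, timeStamp )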

Change 430068 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Raise Scribunto maxLangCacheSize to 200

https://gerrit.wikimedia.org/r/430068

Change 430068 merged by Tim Starling:
[operations/mediawiki-config@master] Raise Scribunto maxLangCacheSize to 200

https://gerrit.wikimedia.org/r/430068

Anomie claimed this task.

Wouldn't it be nice to set the value automatically, depending on whether manual recache is enabled or not? Hopefully things will be faster by the time we start hitting the 200-language limit :)