
Lua memory profiling support
Open, Needs Triage, Public

Description

The English Wiktionary is repeatedly running into memory problems, causing template expansion to fail with "Lua error: not enough memory".

We have identified potential memory hogs and have optimised the underlying Lua data structures to save space. However, many of these efforts are based on trial and error, because it does not seem to be possible to get memory allocation information at the module or template level.

The wgPageParseReport/limitReport included in the HTML only contains the used/max amount:

"limitreport-memusage": {"value":52428798, "limit":52428800}

That's not very helpful. Optimising the memory usage is difficult if it can't be measured in detail.
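As a rough illustration of how little the report says, here is a minimal Python sketch that reads the fragment quoted above and derives the only two figures it can yield: the remaining headroom and the fraction used. The function name `memory_headroom` is made up for this example, and the fragment is wrapped in braces so that it parses as JSON.

```python
import json

# The fragment from the limit report, wrapped in braces so it is valid JSON.
report = json.loads(
    '{"limitreport-memusage": {"value": 52428798, "limit": 52428800}}'
)

def memory_headroom(limit_report):
    """Return (bytes_free, fraction_used) for the Lua memory entry."""
    entry = limit_report["limitreport-memusage"]
    used, limit = entry["value"], entry["limit"]
    return limit - used, used / limit

free_bytes, fraction_used = memory_headroom(report)
limit_mib = report["limitreport-memusage"]["limit"] / 2**20

print(f"limit: {limit_mib:.0f} MiB")
print(f"free: {free_bytes} bytes")
print(f"used: {fraction_used:.4%}")
```

The limit works out to 50 MiB and the page stopped 2 bytes short of it, but nothing in the report says which module or template allocated the memory.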

Is there anything you can suggest to help with memory profiling?

Event Timeline

Removing All-and-every-Wiktionary, as this seems to be about a single Wiktionary site.

Could you provide some specific examples?

@Aklapper currently failing pages: wind, night, do

Recent Grease pit discussion: light and Lua error: not enough memory

Splitting the data module into a per-code format would, I completely agree, increase the overhead in terms of function calls, but since most pages contain very few languages, I suspect that on average it would reduce overall server resource consumption. Since it is very hard for us to profile the things we do on wiki, we will be mostly stuck guessing about these types of things.

(User:TheDaveRoss)

A relatively fast-loading, but typical, example of the resulting problem is at:

https://en.wiktionary.org/wiki/wind

A year has passed, and this bug hasn't even been triaged. The memory problems on the English Wiktionary are still there, and ugly workarounds are needed to get some pages to render without errors.

Is there anything else we could try in the meantime?

> A year has passed, and this bug hasn't even been triaged.

Many bugs get fixed without ever being triaged.

I'm not sure if this is feasible without a negative impact on performance. The way LuaEngine works, it's only possible to get the memory footprint at a given moment or the peak memory footprint of the whole execution. The only way I can think of to get the peak memory footprint for each #invoke would be to reinstantiate the LuaEngine each time #invoke is called. To achieve that, we would change this line to instantiate the engine on every call and, after calling the Lua code, store the data using the module name as a key, but the impact on performance could be more than it's worth.
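The bookkeeping described here (one peak figure per call, stored under the module name) can be sketched by analogy in Python. This is not Scribunto or LuaEngine code: Python's `tracemalloc` stands in for the engine's memory counters, `tracemalloc.reset_peak()` (Python 3.9+) stands in for starting from a fresh engine, and the module names and the helper `run_with_peak` are hypothetical.

```python
import tracemalloc

def run_with_peak(name, func, peaks):
    """Run func() and record its peak allocation above the starting baseline."""
    baseline, _ = tracemalloc.get_traced_memory()
    tracemalloc.reset_peak()  # forget peaks reached by earlier calls
    result = func()
    _, peak = tracemalloc.get_traced_memory()
    # Keep the largest peak seen so far for this module name.
    peaks[name] = max(peaks.get(name, 0), peak - baseline)
    return result

def small_module():
    # Stand-in for a cheap #invoke.
    return len([0] * 10)

def large_module():
    # Stand-in for an #invoke that builds a large table.
    return len(list(range(200_000)))

tracemalloc.start()
peaks = {}
run_with_peak("Module:small", small_module, peaks)
run_with_peak("Module:large", large_module, peaks)
tracemalloc.stop()

# Report the heaviest modules first.
for name, peak in sorted(peaks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {peak} bytes")
```

Resetting a counter is cheap in this analogy. Reinstantiating a whole Lua engine per call would not be, which is the performance concern raised above.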

> I'm not sure if this is feasible without a negative impact on performance.

Another option would be to add memory profiling right into the luasandbox code, alongside the existing CPU profiling, to minimize the impact.

However, there will always be some impact on performance, so one option could be to enable memory profiling only for specific requests, or to make it available only on developer machines. With Docker images it's now easy to run an instance with production data locally.

While researching this a bit, I've also noticed a weakness in the current implementation: when a memory allocation request cannot be satisfied, a garbage collection run is *not* automatically performed to free up resources. Lua 5.2 does this (it's called Emergency GC), but we're stuck with 5.1.

So even with full profiling support and optimized code there's still a chance to run into the memory limit, depending on the GC timing.
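To illustrate why GC timing matters, here is a toy Python model of an allocator with a hard cap. It is a sketch of the general idea only, not of how Lua or luasandbox actually account for memory; the class `ToyHeap` and its byte counts are invented for the example.

```python
class ToyHeap:
    """A toy allocator with a hard limit, loosely modelled on a sandbox cap."""

    def __init__(self, limit, emergency_gc):
        self.limit = limit
        self.emergency_gc = emergency_gc
        self.live = 0     # bytes still referenced
        self.garbage = 0  # bytes unreachable but not yet collected

    def collect(self):
        # Reclaim everything that is no longer referenced.
        self.garbage = 0

    def release(self, size):
        # The object becomes unreachable; it still counts until collect() runs.
        self.live -= size
        self.garbage += size

    def allocate(self, size):
        over_limit = self.live + self.garbage + size > self.limit
        if over_limit and self.emergency_gc:
            self.collect()  # last-chance collection before giving up
        if self.live + self.garbage + size > self.limit:
            raise MemoryError("not enough memory")
        self.live += size

def run(emergency_gc):
    heap = ToyHeap(limit=100, emergency_gc=emergency_gc)
    heap.allocate(60)
    heap.release(60)       # 60 bytes are now garbage, not yet collected
    try:
        heap.allocate(60)  # only fits if the garbage is reclaimed first
        return "ok"
    except MemoryError as err:
        return str(err)

print(run(emergency_gc=False))  # not enough memory
print(run(emergency_gc=True))   # ok
```

In both runs only 60 bytes are ever live at once, yet the run without the last-chance collection fails because the uncollected garbage still counts against the limit.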

At this point the easiest "fix" seems to be to simply increase the amount of memory granted to Lua. Wiktionary is growing fast and the memory limits from a few years ago just don't work anymore.

See also T267708, T165935, T99120 for some past reports about non-performant on-wiki Lua modules causing "Lua error: not enough memory" on wiki pages.

The example link shown above is not generating the error for me, and it does not look like the Wiktionary entry is nearly long enough to reproduce what I reported in a duplicate, T267708. I guess the original form of the error may no longer be occurring.

I find you can reproduce the error consistently if you choose a word that exists in a very large number of languages, such as "i", and look at a language near the end of the alphabet, such as Welsh or Zulu. See https://en.wiktionary.org/wiki/i#Welsh. This particular word is so common that the error occurs for me in all languages from Old Irish (end of article) and Old Occitan to Zulu, alphabetically.

@DaibhidhR: This task is only about adding profiling support. It will not magically solve any issues. Please see my comment on T267708 again - thanks.