Page MenuHomePhabricator

mw.LoadData results don't properly handle # operator
Closed, InvalidPublic

Description

When data is accessed through mw.LoadData, and that data contains nested tables, the # operator (which is approximately but not exactly table length) does not work correctly.

For example:

data = mw.LoadData( 'Module:MyData' );

Then asking for

#(data.sub_table)

Appears to be consistently wrong. I've worked up an example that seems to like to show 0 consistently:

http://test2.wikipedia.org/wiki/Module_talk:LoadDataError

Experimenting with more complicated versions of this (i.e. tables nested inside tables nested inside tables) led to even worse behavior that seemed to point to uninitialized memory being returned, i.e. random values. It also managed to trigger the Wikimedia Foundation Error Message in some cases with complaints about out of memory errors. I'm not really sure why.


Version: unspecified
Severity: normal

Details

Reference
bz47096

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 1:27 AM
bzimport added a project: Scribunto.
bzimport set Reference to bz47096.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #0)

When data is accessed through mw.LoadData, and that data contains nested
tables, the # operator (which is approximately but not exactly table length)
does not work correctly.

This is documented: [[mw:Extension:Scribunto/Lua reference manual#mw.loadData]].

Unfortunately, Lua 5.1 doesn't allow overriding the '#' operator on tables.

Experimenting with more complicated versions of this (i.e. tables nested
inside
tables nested inside tables) led to even worse behavior that seemed to point
to
uninitialized memory being returned, i.e. random values. It also managed to
trigger the Wikimedia Foundation Error Message in some cases with complaints
about out of memory errors. I'm not really sure why.

If you can reproduce that, that would be much more interesting.

(In reply to comment #1)

(In reply to comment #0)

When data is accessed through mw.LoadData, and that data contains nested
tables, the # operator (which is approximately but not exactly table length)
does not work correctly.

This is documented: [[mw:Extension:Scribunto/Lua reference
manual#mw.loadData]].

Unfortunately, Lua 5.1 doesn't allow overriding the '#' operator on tables.

http://www.lua.org/manual/5.1/manual.html#2.8

"len": the # operation.

So wouldn't it be a matter of adding __len to the metatable, or is there some reason this can't be used?

(In reply to comment #2)

(In reply to comment #1)

Unfortunately, Lua 5.1 doesn't allow overriding the '#' operator on tables.

http://www.lua.org/manual/5.1/manual.html#2.8

"len": the # operation.

So wouldn't it be a matter of adding __len to the metatable, or is there some
reason this can't be used?

__len doesn't work on strings or tables in 5.1.

Okay, so it appears __len can't be fixed. That's a shame.

I think the wild behaviors and memory errors I saw were probably an inadvertent consequence of tripping over bug 47300 and accidentally overwriting things in the parent space, so I don't think there is anything else to address here.

Hmm. We could really use a "RESOLVED CANTFIX" status here.

darklama wrote:

There are four ways this could be addressed:

  1. Use a Lua 5.2 interpretor
  2. Use a Lua interpretor that backports features like this, such as LuaJIT compiled with -DLUAJIT_ENABLE_LUA52COMPAT http://luajit.org/extensions.html
  3. Patch the Lua interpretor. https://github.com/dubiousjim/luafiveq/blob/master/patches/table-len.patch
  4. Add table.len function which uses __len when available, or relies on the # operator otherwise, and encourage people to use the table.len function instead.

(In reply to comment #6)

  1. Use a Lua 5.2 interpretor

That's probably what will happen, eventually. But there's no timetable, and in particular the changes to function environment handling mean that Scribunto's sandboxing will need rewrite and re-review.

  1. Use a Lua interpretor that backports features like this, such as LuaJIT compiled with -DLUAJIT_ENABLE_LUA52COMPAT http://luajit.org/extensions.html

The luasandbox PHP extension does not work correctly with LuaJIT at the moment.

  1. Patch the Lua interpretor.

https://github.com/dubiousjim/luafiveq/blob/master/patches/table-len.patch

Not going to happen, since we want people to be able to use the stock Lua interpreter with LuaStandalone.

  1. Add table.len function which uses __len when available, or relies on the # operator otherwise, and encourage people to use the table.len function instead.

We don't want to be extending Lua's standard libraries, in case e.g. standard Lua adds an incompatible table.len. This is why we don't add string.usub and the like as aliases for the mw.ustring functions, either.