Page MenuHomePhabricator

Clarify handling of integer array indexes in Scribunto and php-luasandbox
Closed, ResolvedPublic

Description

PHP uses 32- or 64-bit integers for array indexes, while Lua uses double-precision floating point for its number type. These types don't have the same ranges, but the existing code doesn't take this into consideration.

Further, Lua considers table[0] and table["0"] to be distinct indexes, while PHP only wants to work with $array[0]. While php-luasandbox or HHVM can create arrays that have a string-valued numeric key like '0' or '42', PHP won't be able to access it properly.

  • When passing an array from 64-bit PHP to Lua,
    • Array indexes less than -2^53 or greater than 2^53 will wind up with precision loss and might collide.
  • When passing an array from Lua to PHP,
    • LuaStandalone will return indexes greater than 2^31-1 or less than -2^31 to PHP as strings. This is fine for 32-bit PHP, and 64-bit Zend PHP converts them back to integers. But 64-bit HHVM will get an array with inaccessible keys.
    • LuaStandalone returns indexes with absolute value 1e14 or greater in exponential format, while php-luasandbox continues to return them as integers.
    • php-luasandbox will happily try to use indexes outside of PHP's integer type's range, which seems to result in PHP seeing 0. This might differ based on how the compiler handles overflow in double-to-int conversions.
    • If Lua returns a table with string keys that look like integers to PHP, in php-luasandbox and LuaStandalone on HHVM these will be inaccessible on the PHP side. In LuaStandalone on Zend PHP they'll collide with the corresponding actual integer keys.

As for fixing this, my thoughts are:

  • PHP→Lua should stringify any integer indexes outside of an appropriate range. 32-bit PHP will already have done this using the range of signed 32-bit values. Should we use the same range for 64-bit PHP, or go bigger?
  • Lua→PHP should convert strings to integers in the same cases PHP does (see ZEND_HANDLE_NUMERIC() for php-luasandbox). Collisions should raise an error, much as already happens if Lua tries to pass PHP a table with keys of types other than integer or string. And LuaStandalone should use string.format( '%d' ) instead of tostring() for numbers with absolute value less than 2^63.

Event Timeline

Although, I might decide to take advantage of Zend PHP's unserialize() doing the right thing when given stringified integers as keys and let HHVM continue to be broken in this edge case until we drop HHVM support.

Change 408700 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/php/luasandbox@master] Sanify handling of array keys

https://gerrit.wikimedia.org/r/408700

Change 408703 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Scribunto@master] Sanify handling of array keys

https://gerrit.wikimedia.org/r/408703

Change 408700 merged by jenkins-bot:
[mediawiki/php/luasandbox@master] Sanify handling of array keys

https://gerrit.wikimedia.org/r/408700

Change 408703 merged by jenkins-bot:
[mediawiki/extensions/Scribunto@master] Sanify handling of array keys

https://gerrit.wikimedia.org/r/408703

I'm looking at https://github.com/wikimedia/mediawiki-php-luasandbox/commit/122132a8a00fc373d359cf6beb831b5fdc040e72 and I don't see how the else blocks are correct.

Specifically: lines 249 and 280 in data_conversion.c are now "else" without any braces.

It's exactly equivalent to "else if". It just has a newline rather than a space between the two keywords.

@Anomie: you are totally right, sorry, I misread what the block was trying to protect.