mw.text.jsonDecode() incorrectly interprets number strings as integers
Open, Needs Triage | Public | BUG REPORT

Description

In the Lua module debug console on Wikimedia Commons, run the following:

table = mw.loadJsonData("Module:Library classification navigation/NDC10/DisplayNameTable.json")
print(mw.dumpObject(table))

If the JSON file looks like this:

{
	"00": {},
	"0": {}
}

the console shows:

table#1 {
    metatable = table#2
    [0] = table#3 {
        metatable = table#4
    },
    ["00"] = table#5 {
        metatable = table#6
    },
}

Note that the string key "0" was interpreted as the integer 0, while "00" stayed a string. Is this a bug or intended behavior?

Event Timeline

https://github.com/wikimedia/mediawiki-extensions-Scribunto/blob/master/includes/Engines/LuaCommon/TextLibrary.php#L180

} elseif ( $isEncoding && ctype_digit( $k ) ) {
  // json_decode currently doesn't return integer keys for {}
  $isSequence = $next++ === (int)$k;
} else {

I can't read PHP. ChatGPT says: In this part of the code, if $isEncoding is true and the key $k is a string containing only digits (checked using ctype_digit), it converts the string to an integer using (int)$k. This is done to handle cases where JSON decoding may result in numeric keys represented as strings. The function then checks if the converted integer key forms a sequential sequence.

Is this intended, or is it a bug? If it is intended, could an option be added to disable the conversion, something like mw.loadJsonData("file.json", digit_conversion=False)?

To reproduce the bug, run the following code in the debug console on Commons:

table with keys 0 to 9:

table = mw.loadJsonData("User:維基小霸王/sandbox.json")
print(mw.dumpObject(table))

output:

table#1 {
    metatable = table#2
    table#3 {
        metatable = table#4
    },
    table#5 {
        metatable = table#6
    },
    table#7 {
        metatable = table#8
    },
    table#9 {
        metatable = table#10
    },
    table#11 {
        metatable = table#12
    },
    table#13 {
        metatable = table#14
    },
    table#15 {
        metatable = table#16
    },
    table#17 {
        metatable = table#18
    },
    table#19 {
        metatable = table#20
    },
    table#21 {
        metatable = table#22
    },
}

table with keys 0 to 9, 5 removed:

table = mw.loadJsonData("User:維基小霸王/sandbox2.json")
print(mw.dumpObject(table))

output:

table#1 {
    metatable = table#2
    table#3 {
        metatable = table#4
    },
    table#5 {
        metatable = table#6
    },
    table#7 {
        metatable = table#8
    },
    table#9 {
        metatable = table#10
    },
    [0] = table#11 {
        metatable = table#12
    },
    [6] = table#13 {
        metatable = table#14
    },
    [7] = table#15 {
        metatable = table#16
    },
    [8] = table#17 {
        metatable = table#18
    },
    [9] = table#19 {
        metatable = table#20
    },
}

The problem is actually in mw.text.jsonDecode():

=mw.logObject(mw.text.jsonDecode('{"0": "zero", "00": "two zeroes"}'))
table#1 {
    [0] = "zero",
    ["00"] = "two zeroes",
}

I'm not sure whether the current behavior is expected. I agree it's non-ideal, but it predates mw.loadJsonData().
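The three outputs reported above are consistent with a simple rule, sketched here in plain Lua so that it can be run outside Scribunto. This is only a model of the observed behavior, not the actual Scribunto implementation: `isCanonicalInt` and `modelDecodedKeys` are hypothetical names, the keys are assumed to arrive in document order, and the `preserveKeys` parameter is assumed to behave the way the documentation describes mw.text.JSON_PRESERVE_KEYS.

```lua
-- Hypothetical model of the key handling observed in this task.
-- Not Scribunto code; it only reproduces the outputs shown above.

-- True for strings such as "0", "7", "42", "-3"; false for "00", "07", "1.5".
local function isCanonicalInt(key)
    return key == "0" or key:match("^%-?[1-9]%d*$") ~= nil
end

-- jsonKeys: array of JSON object keys (strings) in document order.
-- preserveKeys: assumed to mirror mw.text.JSON_PRESERVE_KEYS (no renumbering).
-- Returns the keys the resulting Lua table would have, in the same order.
local function modelDecodedKeys(jsonKeys, preserveKeys)
    local keys = {}
    local isSequence = true
    for i, key in ipairs(jsonKeys) do
        if isCanonicalInt(key) then
            keys[i] = tonumber(key)
            -- A zero-based sequence needs the i-th key to equal i - 1.
            if keys[i] ~= i - 1 then
                isSequence = false
            end
        else
            keys[i] = key
            isSequence = false
        end
    end
    if isSequence and not preserveKeys then
        -- Renumber 0..n-1 to 1..n, as for a zero-based JSON array.
        for i = 1, #keys do
            keys[i] = keys[i] + 1
        end
    end
    return keys
end

local mixed = modelDecodedKeys({ "0", "00" })       -- 0 and "00", no renumbering
local full = modelDecodedKeys({ "0", "1", "2" })    -- renumbered to 1, 2, 3
local gap = modelDecodedKeys({ "0", "1", "3" })     -- integers 0, 1, 3, no renumbering
```

Under this model the flag only matters when every key forms a complete run starting at 0, which would explain why passing it makes no visible difference for the {"0", "00"} example.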

Legoktm renamed this task from "mw.loadJsonData incorrectly interprets number strings as integers" to "mw.text.jsonDecode() incorrectly interprets number strings as integers". Feb 23 2024, 6:26 AM
Legoktm unsubscribed.

Is this intended, or is it a bug? If it is intended, could an option be added to disable the conversion, something like mw.loadJsonData("file.json", digit_conversion=False)?

mw.text.jsonDecode supports flags as the second argument, like mw.text.jsonDecode('{"0": "zero", "00": "two zeroes"}', mw.text.JSON_PRESERVE_KEYS).
But the output seems to be the same...

The documentation states:

Normally JSON's zero-based arrays are renumbered to Lua one-based sequence tables; to prevent this, pass mw.text.JSON_PRESERVE_KEYS.

Shouldn't the 0-to-1 renumbering be limited to integer indices? I believe that string keys, even if they look like numbers, should not be converted at all.

A key in a JSON object is always a string, so a numeric key is impossible. However, a Lua table has to be encoded as a JSON object whenever it is anything other than a one-based array.

Only 1-based arrays can be mapped directly as JSON array ↔ Lua sequence table.

A Lua table (mapping object) may use any data type as a key, even booleans and floating-point numbers. Even worse, a table can itself be a key. And all of these data types may be mixed as keys within one table.

If you have a Lua table with integer keys 0, 1, 2, those keys have to be written in JSON as "0", "1", "2". When converting back, you can choose on the Lua side whether you want string keys or number keys.
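One way to make that choice on the Lua side is to convert the keys after decoding. The helper below is a minimal sketch, not part of Scribunto: `stringifyKeys` is a hypothetical name, and the sample table is written out by hand to mimic the mw.text.jsonDecode() result shown earlier in this task. It has not been tested against the read-only tables returned by mw.loadJsonData().

```lua
-- Hypothetical helper, not part of Scribunto: returns a copy of a decoded
-- table in which every numeric key has been turned back into a string.
local function stringifyKeys(value)
    if type(value) ~= "table" then
        return value
    end
    local result = {}
    for key, item in pairs(value) do
        if type(key) == "number" then
            if key == math.floor(key) then
                -- Integer-valued key: format without a decimal point.
                key = string.format("%d", key)
            else
                key = tostring(key)
            end
        end
        -- Nested tables are converted recursively.
        result[key] = stringifyKeys(item)
    end
    return result
end

-- Hand-written stand-in for the decoded result shown earlier.
local decoded = { [0] = "zero", ["00"] = "two zeroes" }
local fixed = stringifyKeys(decoded)
```

If the decoded keys were renumbered from 0..n-1 to 1..n, this helper would produce "1".."n" rather than the original strings, so it would presumably need to be combined with mw.text.JSON_PRESERVE_KEYS in that case.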

How do I write a key in JSON as "0"? I changed the key in the JSON file to "'0'" and to "\"0\"", but neither works: the quote and escape characters show up in the key.