Commas at the end of object definitions aren't allowed, and are rejected on ingestion by the TD GUI but not on save by the TD PHP when running on HHVM without this compile flag (Zend runs it fine).
Description
Related Objects
- Mentioned In
- T232954: Inconsistent detection of JSON errors in templatedata
T214984: PHP7's stricter JSON parsing breaks some wiki content
T207523: JsonContent incorrectly reports invalid JSON is valid, only on HHVM
T184694: mw.text.jsonDecode throws Syntax Error exception on previously accepted TemplateData
T152090: Template Data allowing invalid JSON to be saved. - Mentioned Here
- T176370: Migrate to PHP 7 in WMF production
T214984: PHP7's stricter JSON parsing breaks some wiki content
T107132: Incorrect JSON decoding in HHVM 3.6.5+dfsg1-1+wm1
T74778: Unit tests in FormatJsonTest fail
T103346: phpunit hhvm failure: LuaSandbox: TextLibraryTests[87]: json decode, invalid values (trailing comma)
Event Timeline
Per https://github.com/facebook/hhvm/blob/master/hphp/runtime/ext/json/JSON_parser.cpp#L715 the flag is json_utf8_loose. I'm not sure whether the UTF8->UTF16 transform at https://github.com/facebook/hhvm/blob/master/hphp/runtime/base/utf8-decode.cpp#L150 which this also affects is a bad outcome for us, though?
My gut feeling on this is that everything in MW is already supposed to be UTF-8, and invalid UTF-8 already causes exceptions etc, so not tolerating it in json_decode() should be OK. In fact, I don't think Zend PHP tolerates invalid UTF-8 in JSON either, pretty sure we've already encountered that before.
Apparently this would also fix T103346: phpunit hhvm failure: LuaSandbox: TextLibraryTests[87]: json decode, invalid values (trailing comma).
It also appears to support single quotes: https://en.wikipedia.org/w/index.php?title=Template:Multiple_image/doc&oldid=695464749
I started looking at this upstream earlier in connection to T103346: phpunit hhvm failure: LuaSandbox: TextLibraryTests[87]: json decode, invalid values (trailing comma). One tricky thing is that there are actually two JSON stacks, one using the JSONC library (binding at jsonc_parser.cpp), and one using an in-house HHVM one (depending on how it's compiled) (JSON_parser.cpp).
Does anyone know which we're using?
Also, should we merge these two tasks?
The reason for having two JSON stacks has its roots in a now-infamous clause in the JSON license: "The Software shall be used for Good, not Evil." This makes the license non-free in the eyes of Google, GNU, Fedora, Debian, and Red Hat Legal. Douglas Crockford is aware of the amount of pain this has caused and he doesn't care. It's some kind of license griefing. There's a section about this issue on en:Douglas Crockford.
This was not an issue for Facebook, but it was an issue for Debian. Facebook did not want to switch something as fundamental as the JSON parsing code in their core application runtime, so the compromise was to use either JSON or JSON-C, based on a compile-time flag. The person who actually implemented this is none other than our very own @EBernhardson.
The bit (literally!) that seems to be causing us problem is the commented-out code on lines 225-227 of hphp/runtime/ext/json/jsonc_parser.cpp:
//if (!(options & k_JSON_FB_LOOSE)) { // json_tokener_set_flags(tok, JSON_TOKENER_STRICT); //}
What this code does (or rather: would be doing, if it were not commented-out) is set the JSON-C JSON_TOKENER_STRICT flag unless the non-standard JSON_FB_LOOSE flag was set in the $options parameter for json_decode(). From this snippet, I gather that JSON-C's default behavior is nonstrict, which is what we're seeing.
Why is this code commented out? I don't know (or don't remember). Maybe Erik can shed some light.
By the way, it's not just HHVM that suffers from this licensing nightmare; see PHP bug 63520 (watch out for guest appearances by some MediaWiki personages). The third comment from the bottom talks about some of the bugs that this has caused.
I took a look back over the code I submitted but I can't remember right now why that bit of code was commented out. It might have been to do with passing the existing test cases, but that's just a guess. In the 2 years it seems to have fallen out of my head...
My guess is that you were considering the possibility of replacing JSON with JSON-C entirely, by checking how closely JSON-C's behavior in non-strict mode matched JSON w/JSON_FB_LOOSE modifications.
At any rate, I built HHVM with that code un-commented, and saw that indeed it made json_decode() strict on trailing commas and single-quotes, matching Zend's behavior. And all the tests pass. So, unless you think there is a reason not to, we could simply submit that upstream.
As a temporary workaround, you may be able to do something like this:
<?php function isValidJson( $json ) { // Strip unicode escapes representing double quotes and commas $json = preg_replace( '/\\\u002[2c]/', '', $json ); $decoded = json_decode( $json ); if ( json_last_error() !== JSON_ERROR_NONE ) { return false; } $encoded = json_encode( $decoded ); return preg_match_all( '/[",]/', $json ) === preg_match_all( '/[",]/', $encoded ); } assert( isValidJson( '{"abc": "\u0022\u002c"}' ) ); assert( isValidJson( '{"abc": "def", "ghi": "jkl"}' ) ); assert( isValidJson( '{"a\"bc": "d,ef", "ghi": "jkl"}' ) ); assert( !isValidJson( '{"abc": "def", "ghi": "jkl",}' ), 'trailing comma' ); assert( !isValidJson( "{'abc': 'def', 'ghi': 'jkl'}" ), 'single quotes' );
Did anyone check which stack we're using in production? Jdforrester-WMF's comment is regarding JSON_parser, and Ori's is regarding jsonc_parser.
Also, I had a conversation on IRC (in hhvm on Freenode) with Orvid_, about making the JSON-C parser (and possibly the builit-in) strict. They suggested using a runtime option (INI setting) to control strictness. I haven't followed up, so I don't know if this is the broad consensus.
prod uses jsonc, i ported it from the jsonc package in pecl and upstreamed it to hhvm. Adding an ini is relatively easy and i can put that together if desired
I was able to save something with multiples of the same key, is this the same issue? https://es.wikipedia.org/w/index.php?title=Plantilla%3ACita_noticia%2Fdoc&type=revision&diff=95489314&oldid=95489307
Another json validation issue, it's very hard to see in this diff, but there was an extra closing } in the previous change and it was also allowed to be saved: https://en.wikipedia.org/w/index.php?title=Template:Citation/doc&diff=prev&oldid=753664720
Another complaint about this here: https://www.mediawiki.org/wiki/Topic:V6u2mpy11qiu9705