Page MenuHomePhabricator

PHP7's stricter JSON parsing breaks some wiki content
Open, Needs TriagePublic

Description

Summary: HHVM's JSON parsing is lax in some respects: it allows trailing commas in arrays and objects; allows single quotes around strings; ignores trailing text in some cases; and allows control characters U+0001–U+001F to appear unescaped in strings, which includes tab (U+0009, \t), carriage return (U+000D, \r), and linefeed (U+000A, \n).

Templates or modules generating JSON with these errors for things like <mapframe> and <graph> will be accepted under HHVM but will be (correctly) rejected under PHP7. Such templates/modules should be fixed to produce valid JSON.

We've had various tasks over the years pointing out this HHVM laxity, including T74778, T103346, T128029, T184694, and T207523.


Sorry if this is not the right way to go about doing this. I have recently been adding the Highway system OSM map template to many pages without highway maps. This template basically displays a map with the red line being the highway the page is about and the green line being the highways that are part of the same system.
Normally, it renders like this:


However, in PHP7 it displays this error:

(Plaintext: <mapframe> Couldn't parse JSON: Syntax error).
What confuses me is why this new version of PHP would affect any template or Lua module. Even though this issue is specific to this one template so far, it seems rather odd that the new implementation of PHP would mess with any of the template and module code.

Steps to Reproduce:

  1. Log into https://en.wikipedia.org.
  2. In the "Beta" tab, make sure PHP7 is checked.
  3. Go to a page using the Highway system OSM map template (in this example, the Doto Expressway page).
  4. Purge the page to erase the cache, or preview the page in Edit mode, then you will see this bug.

I am using Firefox 64. This issue has happened on every page with this template whenever I have PHP7 turned on.

Thank you in advance for helping.

Event Timeline

MSG17 created this task.Jan 30 2019, 7:07 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 30 2019, 7:07 PM

This is probably because HHVM accepts slightly invalid input JSON (trailing commas, maybe other things too), which I guess PHP7 doesn't.

A similar problem occurs on frwiki on page w:fr:Frontière entre le Honduras et le Mexique.

I activated the PHP7 option in beta.

The following error message appears in plaintext on article:

Erreur Lua : mw.text.jsonDecode: Erreur de syntaxe

If I purge the page with PHP7 enabled, this version of the page is cached, and this caching version is also served (with the error message) if not connected (testing in private browsing mode on Firefox).

(sorry for my bad English)

Poking at this carefully: afaict the wikidata query that gets run is https://query.wikidata.org/#SELECT%20%3Fid%20%3Flength%0A%20%20%28if%28%3Fid%20%3D%20wd%3AQ867944%2C%20%27%23C12838%27%2C%20%27%2307c63e%27%29%20as%20%3Fstroke%29%0A%20%20%28concat%28%27Line%20length%3A%20%27%2C%20str%28%3Flength%29%2C%20%27%20km%27%29%20as%20%3Fdescription%29%0A%20%20%28if%28BOUND%28%3Flink%29%2C%0A%20%20%20%20%20%20concat%28%27%5B%5B%27%2C%20substr%28str%28%3Flink%29%2C31%2C500%29%2C%20%27%7C%27%2C%20%3FidLabel%2C%20%27%5D%5D%27%29%2C%0A%20%20%20%20%20%20%3FidLabel%29%0A%20%20%20as%20%3Ftitle%29%0AWHERE%20%7B%20%20%0A%20%20%7B%3Fid%20wdt%3AP16%20wd%3AQ5228578.%7D%0A%20%20OPTIONAL%20%7B%20%3Fid%20wdt%3AP2043%20%3Flength.%20%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%0A%20%20%20%20bd%3AserviceParam%20wikibase%3Alanguage%20%27en%27%20.%0A%20%20%20%20%3Fid%20rdfs%3Alabel%20%3FidLabel%20.%0A%20%20%7D%0A%20%20OPTIONAL%20%7B%3Flink%20schema%3Aabout%20%3Fid.%0A%20%20%3Flink%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fen.wikipedia.org%2F%3E.%7D%0A%7D%20GROUP%20BY%20%3Fid%20%3Flink%20%3FidLabel%20%3Flength%0A
I have gotten this by plugging in Q5228578 for the whole if clause with highway_system_qid and {{wikidata|property|raw|P16}}, Q867944 for {{wikidata|label|raw}}, (grabbed this value by evaluating the templatthe pipe on a preview copy of the Dōtō Expressway article), and by replacing {{!}} with the pipe symbol. But the result gives a list of all the expressways in Japan, not just the one we want. Did I get the query right? Surely this is not what's desired.

MSG17 added a comment.Feb 6 2019, 8:27 PM

@Jdforrester-WMF - That makes sense, so I guess the next step is to look at how to get the JSON and modify the template code to make it output correctly.

@ArielGlenn - The template is supposed to get every expressway in Japan, since it is meant to make a map of the entire system.

TheDJ added a subscriber: TheDJ.EditedFeb 7 2019, 10:49 AM

Fixed one trailing comma.

But this exposed that the template isn't escaping the string before putting it into json. real line endings are not possible inside JSON strings and should be escaped with \n.
This should be reimplemented using lua and mw.text.jsonEncode()

This is what Special:ExpandTemplates currently produces for this template on that page (after my trailing comma fix).

<mapframe zoom="3" align="center" text="Map" height="200" width="300">{
  "type": "ExternalData",
  "service": "geoline",
  "properties": {
    "stroke-width": 2
  },
  "query": "

SELECT ?id ?length
  (if(?id = wd:Q867944, '#C12838', '#07c63e') as ?stroke)
  (concat('Line length: ', str(?length), ' km') as ?description)
  (if(BOUND(?link),
      concat('[[', substr(str(?link),31,500), '|', ?idLabel, ']]'),
      ?idLabel)
   as ?title)
WHERE {  
  {?id wdt:P16 wd:Q5228578.}
  
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language 'en' .
    ?id rdfs:label ?idLabel .
  }
  OPTIONAL {?link schema:about ?id.
  ?link schema:isPartOf <https://en.wikipedia.org/>.}
} GROUP BY ?id ?link ?idLabel ?length

"}</mapframe>
Waddie96 added a subscriber: Waddie96.EditedFeb 7 2019, 11:07 AM

The Template:OSM Location map is showing "Syntax error" as an output for all instances of its use (for example, 2019 Jolo Cathedral bombings and Alameda, California#Attractions). Reported at Village Pump here and at MediaWiki Talk:Beta Features here.

TheDJ added a comment.EditedFeb 7 2019, 11:23 AM

OSM Location map, is using the graph extension and passes a json blob to it. This json blob has similar errors, including trailing commas, // comments etc..

I also found the following related issues:

*edit* I seem to have fixed this specific case by chasing trailing comma's and an incorrect value
Interestingly enough that seems to indicate that comments are allowed or ignored for some reason...

MSG17 added a comment.Feb 7 2019, 12:33 PM

@TheDJ Thank you for fixing the trailing commas. I did figure it was related to comma placement in some way, but I mostly focused my efforts on commas with spaces before them. I will look at reimplementation, especially since there is still an error produced by the highway systems template in PHP7.


(Plaintext: <mapframe>: Couldn't parse JSON: Control character error, possibly incorrectly encoded)

Krenair added a subscriber: Krenair.Feb 7 2019, 1:18 PM
Anomie renamed this task from PHP7 migration affecting Lua scripts or templates? to PHP7's stricter JSON parsing breaks some wiki content.Feb 7 2019, 2:28 PM
Anomie updated the task description. (Show Details)
Anomie added a subscriber: Anomie.

I've updated the title of the task and added a summary to reflect the underlying issue here.

TheDJ updated the task description. (Show Details)
TheDJ added a comment.EditedFeb 7 2019, 9:51 PM

Sort of brute force fixed this using:
mw.ustring( jsonstring, "%c", " ")

https://en.wikipedia.org/w/index.php?title=Module%3AMapframe&type=revision&diff=882266119&oldid=876305861

Since control characters are useless in both json and sparql i just strip them all and replace with a space char.. Possibly this was only affecting the raw variable... but too tired to figure that out right now ;)

Seems this fixes dynamic maps, but static pre-rendered map is still broken...

TheDJ added a comment.Feb 8 2019, 7:05 AM

pre-rendered map maybe falling into T158657: Kartotherian error: GroupId not available now...

MSG17 added a comment.EditedFeb 8 2019, 12:11 PM

Yeah, even in HHVM the thumbnail doesn't render correctly (unless you are in editing mode, because there the thumbnail is actually a smaller dynmap and not an image). Still, thanks for fixing the dynmap.

TheDJ added a comment.Feb 8 2019, 12:22 PM

Yeah, even in HHVM the thumbnail doesn't render correctly (unless you are in editing mode). Still, thanks for fixing the dynmap.

That groupId ticket notes: "In most cases they aren't caused by errors in the system, rather by invalid parameters and that can be handled better."

So possibly, there is still something wrong with the values of some of the keys.. Hard to figure out this way WHAT is wrong though...

MSG17 added a comment.EditedFeb 8 2019, 3:18 PM

Weirdly enough, the pre-rendered map now works in my sandbox on English Wikipedia, but nowhere else with the highway systems template. Maybe something to do with the Wikidata ID entry? My sandbox page doesn't have a Wikidata ID, but the other pages I looked at did.

After removal of that one trailing comma, I have done the sandbox trick and stuffed the wikidata id entry in where it should be (Q867944 in place of (if(?id = wd:blah...) and I get a nice little rendered map with two stroke colors and the whole thing. This was taking a copy of the OSM template and plugging some values directly into it (lat/long, frame width/height, zoom, plain=yes). So uh? I must be overlooking something obvious.

Ah note also that if you click through on the (empty) map in the wp article, you do see a properly rendered full screen map. Huh!

MSG17 added a comment.Feb 8 2019, 8:13 PM

The small empty map on the article is the issue - something is wrong with a parameter, so that small map doesn't get rendered while the full screen one is now properly rendered. That small map basically uses an image of what the map would normally display in order to cut down loading times and avoid accidentally moving the map window to focus elsewhere.

Joe added a subscriber: Joe.Feb 12 2019, 10:24 AM
MSG17 added a comment.EditedFeb 12 2019, 12:31 PM

OK, I fixed the thumbnail issue by removing some trailing spaces:

https://en.wikipedia.org/w/index.php?title=Template%3AHighway_system_OSM_map&type=revision&diff=882961558&oldid=882213517

It does take some time for the thumbnail to load, but it works.

Strangely enough, when I tried this change at zhwiki it actually broke the template in a similar matter. I will do some more testing in HHVM and PHP7 and see what happens.

Do we know which of those spaces in particular was the culprit, or if all of them were? Asking just out of curiosity.

MSG17 added a comment.EditedFeb 12 2019, 12:50 PM

Yeah, it looks like the spaces for assigning a Wikidata ID for the highway system property (P16) to show was the issue. It must have caused a syntax issue with the stricter parser.

I changed

{?id wdt:P16 wd:{{#if: {{{highway_system_qid|}}} | {{{highway_system_qid}}} | {{wikidata|property|raw|P16}} }}.}

to

{?id wdt:P16 wd:{{#if:{{{highway_system_qid|}}}|{{{highway_system_qid}}}|{{wikidata|property|raw|P16}}}}.}

(note the spaces in front of and behind each spot for the passed parameter and the Wikidata property template).
Also, I did some testing on enwiki and it seems that the template now works in both HHVM and PHP7. I'll test zhwiki again.

EDIT: Nevermind, its back to being broken on my desktop. How strange... Maybe it might be because I am running Firefox compared to my Chromebook which is running, well... However, the manual query works, so I think that I will still have a good chance at fixing this.

MSG17 added a comment.EditedFeb 13 2019, 12:16 PM

OK, so for some reason the way the highway system parameter is being called is causing issues. If the highway system parameter is specified, it will break, but if I don't supply a parameter and let the Wikidata template find it (it looks for the ID for property P16) it will work. Interesting...

EDIT: Found out that the issue was extra whitespace in the length parameter. I guess there will have to be a lot of spacing changes for many templates to compensate for PHP7, but maybe I will be pleasantly surprised.

Let me add content from << T217103: Extension:Kartographer producing "Couldn't parse JSON" errors when SPARQL includes queryhints >> since this seems a part of the issue raised here and may provide an additional failure-mode example:

There seems to be an issue, believed to be Extension:Kartographer producing "Couldn't parse JSON" errors when SPARQL includes queryhints, described at https://www.wikidata.org/w/index.php?title=Wikidata:Request_a_query&oldid=867570503#Optimizing_a_query_for_the_mapframe_extension

I'm afraid it's all a bit over my head. Grateful for some knowledgeable input.

TheDJ added a comment.Feb 26 2019, 2:16 PM

I'm afraid it's all a bit over my head. Grateful for some knowledgeable input.

Fixed: A trailing comma, and the usage of " inside a json value, prematurely ending the json value.

That's not a fix, so much as a helpful workaround.

If we take the not unreasonable view that Blazegraph should be allowed to specify what is valid syntax for hints on their own engine - see https://wiki.blazegraph.com/wiki/index.php/QueryHints which specifies the user of " - then the parser still has a fail.

Forcing users to learn, magically, from the ether, that non-standard syntax must be offered to the parser and that the query engine makers specification has been disregarded by the parser constructors, is woefully suboptimal.

@Tagishsimon: if you want to put a value containing " insides a JSON string, you need to escape it, as for example \" or \u0022. You'd also need to escape backslashes (\) and various control characters (although en:Module:Mapframe appears to blindly replace all control characters with spaces for you).

@TheDJ just took the more readable route of replacing the " with ', since blazegraph apparently accepts either type of quote.

TheDJ added a comment.EditedFeb 26 2019, 7:21 PM

Interestingly enough \" doesnt actually seem to work, not even when directly embedding the json blob inside a <mapframe> tag. Might be an issue in the kartographer extension.

Restricted Application added a subscriber: Liuxinyu970226. · View Herald TranscriptMar 24 2019, 2:20 PM

So I see two ways to move forward:

  • Write a bot to fix bad jsons that was allowed through hhvm until we fully deprecate hhvm.
  • Make mediawiki more strict on accepting json until the migration is over.

I know how to do 1, but I have no idea how we can do 2 while 2 seems a better approach. At the end, it's just a migration issue, no permanent solution is needed.

We should stick to multiline relaxed strings, while regarding superfluous comma and apostrophe delimiters as errors.

Imagine there is some wikitext stored as JSON string:

"Page content:\n==Headline==\n*first item\n*second item\nThe story to be told."

Looks better if formatted as

"Page content:
==Headline==
*first item
*second item
The story to be told"

If I imagine an entire paragraph or page of wikitext needs to be passed as one single line only I am not really happy. The same goes for some kind of formatted data lines within one JSON string value, like assigments key1=value1 each of them written on a single line, to be interpreted later. That should be kept as multiline human readable source format.

All JSON blob will be piped through either

  • rMW/includes/json/FormatJson.php
  • rMW/includes/content/JsonContent.php →FormatJson

and final encoding U+000A→\n etc. within strings may be done immediately before passing blob to PHP7::json_decode().

On prettyprint \n \t might be presented nicely.

For those who are dealing with JSON code within wiki pages as template parameter or within Lua modules I did prepare Module:JSONutil which will show more elaborated messages like these test cases.

It can be used to detect problematic blobs within a wiki right now under HHVM conditions.