Page MenuHomePhabricator

Augment jsonlint test with a test for duplicate keys
Closed, DeclinedPublic

Description

json files may not contain duplicate keys, yet Jenkns does not detect it.

https://gerrit.wikimedia.org/r/#/c/160541/9/languages/i18n/qqq.json


Version: unspecified
Severity: enhancement

Event Timeline

bzimport raised the priority of this task from to Lowest.Nov 22 2014, 3:55 AM
bzimport set Reference to bz71284.
bzimport added a subscriber: Unknown Object (MLST).

Hmm. According to the config file, jsonlint is run as part of the jslint job, along with jshint, for mwcore…

http://git.wikimedia.org/blob/integration%2Fjenkins-job-builder-config.git/HEAD/mediawiki.yaml#L21

(In reply to Greg Grossmeier from comment #2)

I don't see output from any of those on
https://gerrit.wikimedia.org/r/#/c/160541/9

Yeah, good point.

Also, per https://github.com/zaach/jsonlint/issues/13 duplicate keys are valid (if useless) JSON and so aren't going to trigger jsonlint to error.

(In reply to James Forrester from comment #3)

(In reply to Greg Grossmeier from comment #2)

I don't see output from any of those on
https://gerrit.wikimedia.org/r/#/c/160541/9

Yeah, good point.

Found the cause; jslint and jsonlint are now done for mw-core in npm, so the failure of npm to due to a submodule checkout error.

Re-writing this bug to be a feature request.

The script is in integration/jenkins.git bin/json-lint.php It uses PHP json_decode() to parse the json.

And with PHP 5.4.24:

php > var_dump( json_decode( '{ "key": 1, "key": 2}' ) );
class stdClass#1 (1) {

public $key =>
int(2)

}
php >

The first key is overridden.

One can look at the JSON RFC http://tools.ietf.org/html/rfc7159#section-4 which states:

[..] When the names within an object are not
unique, the behavior of software that receives such an object is
unpredictable.  Many implementations report the last name/value pair
only.  Other implementations report an error or fail to parse the
object, and some implementations report all of the name/value pairs,
including duplicates.

Ideally we would propose to upstream PHP an option to json_decode() which would warn/fatal/except whenever a dupe is encountered.

Meanwhile one will have to figure out how to parse json "manually" and detect such error. Which is hmm cumbersome.

Patch welcome.

python 2.7.8 has the same behavior:

Repeated names within an object are accepted, and only the value of the
last name-value pair is used.

https://docs.python.org/2/library/json.html#standard-compliance

Tell you the truth, I would much rather have jenkins throw an error and reject it, and let user override it manually. Can't imagine any reason to allow dups IMHO, even if some langs are ok with it.

What I really meant is that neither PHP json_decode() or python json decoder let us detects duplicate key. So one needs to find an implementation that warns/dies or write one to do so.

(In reply to Antoine "hashar" Musso from comment #8)

What I really meant is that neither PHP json_decode() or python json decoder
let us detects duplicate key. So one needs to find an implementation that
warns/dies or write one to do so.

According to https://docs.python.org/2/library/json.html#repeated-names-within-an-object we can override that behavior. https://mail.python.org/pipermail/python-list/2013-May/647954.html has an example snippet on how to do so.

Setting lowest priority until someone is willing to take this and figure out a solution.

(In reply to Antoine "hashar" Musso (WMF) from comment #10)

Setting lowest priority until someone is willing to take this and figure out
a solution.

Ok, I wrote a python script that recursively searches for JSON files, and checks if they have any duplicate keys: https://github.com/legoktm/dupe-keys-json.

I don't know how to turn it into something run by jenkins though.

I like the checker since it has tox and tests :-D I am not too sure how to have it deployed on all the CI slaves though. Ideally we would add a setup.py and make it a Debian package, not sure whether it is worth the effort though.

Kunal, can you sync with Yurik to figure out whether that match is use case?

If anyone has an idea how to deploy the script on all the CI slaves I am taking inputs :-)

Seems like it does. Didn't think it required such an elaborate python code though! And thanks!!! :)

Which dependencies? Can't we add this to slave-scripts?

As for the actual implementation, how does it behave in case of syntax errors? The nodejs jsonlint implementation actually hooks into the parser and gives the user useful errors. This is the package we're already using in npm+grunt based test pipelines.

It doesn't report duplicate keys at the moment, but that could be implemented. There is an open issue for it: https://github.com/zaach/jsonlint/issues/13

https://www.npmjs.org/package/grunt-jsonlint
https://github.com/zaach/jsonlint

Moved the repo to https://github.com/legoktm/jsonchecker.

I like the checker since it has tox and tests :-D I am not too sure how to have it deployed on all the CI slaves though. Ideally we would add a setup.py and make it a Debian package, not sure whether it is worth the effort though.

Kunal, can you sync with Yurik to figure out whether that match is use case?

If anyone has an idea how to deploy the script on all the CI slaves I am taking inputs :-)

I've added a setup.py file in version 0.3 and published it on pypi (https://pypi.python.org/pypi/jsonchecker). It has no external dependencies and will run on Python 2.7+

As for the actual implementation, how does it behave in case of syntax errors? The nodejs jsonlint implementation actually hooks into the parser and gives the user useful errors. This is the package we're already using in npm/npm based test pipelines.

It displays the same message that the Python JSON library provides. For example, the error message for https://github.com/legoktm/jsonchecker/blob/aa6d7f4089739c25cea4896c762a88d573b9baac/tests/testdata/bad/invalid.json is "Expecting property name enclosed in double quotes: line 4 column 1 (char 62)".
I'm not sure how that compares to the nodejs one, in this case the error is that there is a trailing comma.

Change 192059 had a related patch set uploaded (by Legoktm):
Add jsonchecker.py

https://gerrit.wikimedia.org/r/192059

Patch-For-Review

Change 192060 had a related patch set uploaded (by Legoktm):
Use jsonchecker.py for jsonlint macro

https://gerrit.wikimedia.org/r/192060

Patch-For-Review

Krinkle claimed this task.

Closing this per our policy to no longer extend or deploy new build scripts.

Brief summary of previously expressed reasons:

  • Maintenance burden on CI instead of individual projects.
  • Can't be easily run locally by developers (manually sync the right version..).
  • Does not scale due to inability to do controlled upgrades (e.g. update the behaviour and update project's code in the same commit).

This is why JSHint is no longer upgraded and instead people use npm test jobs to run jshint from there with the correct version as specified in their packagejson. Same for PHPUnit freeze, and soon for phpcs as well.

You're free to package or embed it in any project as you see fit and add it to their npm-test or composer-test pipeline. If the desired project isn't using this yet, let's work on that.

Note that various projects currently use jsonlint and grunt-jsonlint.

It may be undesirable to use build scripts not written in PHP, JavaScript or bash. I suppose a composer or npm packages could ship a Python file and shell out to run it. Or, if the script is directly embedded in a project, it can be added to the scripts.test property and contain test: [ 'phpcs - ... ', 'phpunit', 'python ./test/jsonlist.py' ]).

Sorry, I disagree. jsonchecker is merely a syntax checker[1]. Each project should not need to configure their own JSON linter just to make sure they have proper syntax. All the examples you've provided (JSHint, PHPUnit, phpcs) do significantly more than basic syntax checking and are constantly being updated, while the JSON data format has not changed at all. It's highly unlikely that the jsonchecker project will ever add new functionality.

The desired project in this case is MediaWiki core and all skins + extensions. I do not plan on setting up composer/npm/tox/whatever for 600+ repositories, that's just not feasible.

[1] As noted earlier, the JSON RfC only says duplicate keys SHOULD not be used. However given that no major language implementation supports duplicate keys, it makes sense to consider them to be a syntax error.

Change 192060 abandoned by Krinkle:
Use jsonchecker.py for jsonlint macro

https://gerrit.wikimedia.org/r/192060

Change 192059 abandoned by Legoktm:
Add jsonchecker.py

https://gerrit.wikimedia.org/r/192059