Page MenuHomePhabricator

Implement JSON schema validation in ArrayDef
Closed, ResolvedPublic

Description

ArrayDef is a TypeDef used with ParamValidator to represent a parameter that contains an array value.

ArrayDef should optionally take a JSON spec to validate against. Validation should be optional, controlled by an entry in the $options array passed to the validate() method.

The spec should be run through JsonSchemaTrait::normalizeJsonSchema, so it supports some shorthands that are convenient for use of JSON schemas in PHP, such as "list" and "map" types.

The actual validation should be implemented in a way similar to the code in ConfigSchemaAggregator::validateValue. We should investigate how much of that code we may be able to move into JsonSchemaTrait so it can be shared between ArrayDef and ConfigSchemaAggregator.

Related Objects

Event Timeline

Reedy renamed this task from IMplement JSON schema validation in ArrayDef to Implement JSON schema validation in ArrayDef.Apr 9 2024, 8:12 AM
Reedy updated the task description. (Show Details)

For The spec should be run through JsonSChemaTrait::normalizeJsonSchema we have Calling static trait method MediaWiki\Settings\Source\JsonSchemaTrait::normalizeJsonSchema is deprecated, it should only be called on a class using the trait

For The spec should be run through JsonSChemaTrait::normalizeJsonSchema we have Calling static trait method MediaWiki\Settings\Source\JsonSchemaTrait::normalizeJsonSchema is deprecated, it should only be called on a class using the trait

Yes, ArrayDef will have to use the trait, then call self::normalizeJsonSchema.

Change #1030554 had a related patch set uploaded (by Atieno; author: Atieno):

[mediawiki/core@master] arrayDef: Implement JSON schema validation in ArrayDef

https://gerrit.wikimedia.org/r/1030554

Note that "JSON Schema" is not a single well-defined thing, there are several slightly incompatible versions of it (Draft-04, Draft-06, Draft-07, Draft 2019-09, Draft 2020-12 - with, at least in the PHP world, tooling being somewhat sluggish to catch up with changes, so when people talk about "JSON Schema" it's not at all obvious that they mean the latest version of the spec). Either that should be configurable or (more realistically) we should choose what schema version MediaWiki uses, and document it clearly.

Either that should be configurable or (more realistically) we should choose what schema version MediaWiki uses, and document it clearly.

In practice, this should use the validator that we are already using for configuration in MW core, which is https://github.com/justinrainbow/json-schema version 5.2.13. According to that page, it supports schemas of Draft-3 or Draft-4.

daniel changed the task status from Open to In Progress.May 16 2024, 2:59 PM

Change #1038919 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/core@master] DNM: Throwaway patch to try out ArrayDef schemas in REST handlers

https://gerrit.wikimedia.org/r/1038919

After trying out patch set 19 of this gerrit change locally, I'm not sure this is doing what we want. At least, it is not doing what I expected. I thought (maybe incorrectly) that by defining a parameter of type "array" with a schema, I would be saying that the endpoint accepts an array, and each element of that array must conform to the schema. That's not what is happening. Instead, the parameter itself must conform to the schema.

To confirm this behavior, I wrote this WIP/DNM patch to try out the various combinations in a structured way. This patch includes two throwaway handlers, ArrayDefTestArrayHandler and ArrayDefTestObjectHandler. Each handler defines a single required body parameter of type array named foo. Each handler also specifies a schema requiring one field named bar. The first handler specifies that parameter foo is to be validated as an array, the second that it should be validated as an object. Both handlers specify method POST, but also that they need neither read nor write access (so that we don't have to mess with tokens for this exercise). Both handlers (if validation succeeds) just echo back the validated body as the response.

To save you looking at that code, here are the relevant parts of the handlers. Maybe I'm misunderstanding how I'm supposed to write schemas for these things? But the handlers accepted and executed this, and it passed the structure tests. So if I'm doing it wrong maybe we should consider protecting against people like me misunderstanding how to use it.

from ArrayDefTestArrayHandler:

	public function getBodyParamSettings(): array {
		return [
			'foo' => [
				self::PARAM_SOURCE => 'body',
				ParamValidator::PARAM_TYPE => 'array',
				ParamValidator::PARAM_REQUIRED => true,
				ArrayDef::PARAM_SCHEMA => [
					'type' => [ 'array' ],
					'required' => [ 'bar' ]
				]
			]
		];
	}

And just the actual param definition from ArrayDefTestObjectHandler (the only difference is the type line in the schema definition):

			'foo' => [
				self::PARAM_SOURCE => 'body',
				ParamValidator::PARAM_TYPE => 'array',
				ParamValidator::PARAM_REQUIRED => true,
				ArrayDef::PARAM_SCHEMA => [
					'type' => [ 'object' ],
					'required' => [ 'bar' ]
				]
			]

Here are the results (sorry, this is long):

First, calls to the handler that defines the foo parameter as an array.

POST http://default.mediawiki.mwdd.localhost:8080/w/rest.php/coredev/v0/arraydeftest/array

Body:

{}

Response:

{
    "error": "parameter-validation-failed",
    "name": "foo",
    "value": null,
    "failureCode": "missingparam",
    "failureData": null,
    "errorKey": "rest-body-validation-error",
    "messageTranslations": {
        "en": "Invalid request body: The \"foo\" parameter must be set."
    },
    "httpCode": 400,
    "httpReason": "Bad Request"
}

My comment: that's reasonable, no problems there.

POST http://default.mediawiki.mwdd.localhost:8080/w/rest.php/coredev/v0/arraydeftest/array

Body:

{ "foo": "bar" }

Response:

{
    "error": "parameter-validation-failed",
    "name": "foo",
    "value": "bar",
    "failureCode": "notarray",
    "failureData": null,
    "errorKey": "rest-body-validation-error",
    "messageTranslations": {
        "en": "Invalid request body: Invalid value for parameter <var>foo</var>: array expected."
    },
    "httpCode": 400,
    "httpReason": "Bad Request"

My comment: that's reasonable, no problems there.

POST http://default.mediawiki.mwdd.localhost:8080/w/rest.php/coredev/v0/arraydeftest/array

Body:

{ "foo": [ "bar" ] }

Response:

{
    "foo": [
        "bar"
    ]
}

My comment: This was successful. Should it have been? "foo" is an array containing the value "bar". Is the schema intended to specify what the contents of the array are?

POST http://default.mediawiki.mwdd.localhost:8080/w/rest.php/coredev/v0/arraydeftest/array

Body:

{ "foo": [ "bar", "baz" ] }

Response:

{
    "foo": [
        "bar",
        "baz"
    ]
}

My comment: This was successful too. Should it have been? Now "foo" is an array containing values "bar" and "baz". This doesn't feel to me like we're specifying a schema.

POST http://default.mediawiki.mwdd.localhost:8080/w/rest.php/coredev/v0/arraydeftest/array

Body:

{ "foo": { "bar":  "baz" } }

Response:

{
    "error": "parameter-validation-failed",
    "name": "foo",
    "value": "{\"bar\":\"baz\"}",
    "failureCode": "schema-validation-failed",
    "failureData": {
        "schema-validation-error": {
            "property": "",
            "pointer": "",
            "message": "Array value found, but an array is required",
            "constraint": "type",
            "context": 1
        }
    },
    "errorKey": "rest-body-validation-error",
    "messageTranslations": {
        "en": "Invalid request body: ⧼schema-validation-failed⧽"
    },
    "httpCode": 400,
    "httpReason": "Bad Request"
}

My comment: This is where I decided to write this comment. "Array value found, but an array is required"? As a caller, what do I do with that? The actual problem, if I understand it, is that we passed in a json object, which was converted by json_decode to an array of incorrect structure. But the message is unhelpful.

Next, calls to the handler that defines the foo parameter as an object.

POST http://default.mediawiki.mwdd.localhost:8080/w/rest.php/coredev/v0/arraydeftest/object

Body:

{}

Response:

{
    "error": "parameter-validation-failed",
    "name": "foo",
    "value": null,
    "failureCode": "missingparam",
    "failureData": null,
    "errorKey": "rest-body-validation-error",
    "messageTranslations": {
        "en": "Invalid request body: The \"foo\" parameter must be set."
    },
    "httpCode": 400,
    "httpReason": "Bad Request"
}

My comment: that's reasonable, no problems there.

POST http://default.mediawiki.mwdd.localhost:8080/w/rest.php/coredev/v0/arraydeftest/object

Body:

{ "foo": "bar" }

Response:

{
    "error": "parameter-validation-failed",
    "name": "foo",
    "value": "bar",
    "failureCode": "notarray",
    "failureData": null,
    "errorKey": "rest-body-validation-error",
    "messageTranslations": {
        "en": "Invalid request body: Invalid value for parameter <var>foo</var>: array expected."
    },
    "httpCode": 400,
    "httpReason": "Bad Request"
}

My comment: glad that failed. But I find that error message a bit confusing given the next call.

POST http://default.mediawiki.mwdd.localhost:8080/w/rest.php/coredev/v0/arraydeftest/object

Body:

{ "foo": { "bar": "baz" } }

Response:

{
    "foo": {
        "bar": "baz"
    }
}

My comment: this succeeded. But foo is an object, not an array. If this is what we intended, then how does it make sense that an ArrayDef succeeds when no arrays were involved? Yes, at an implementation level, json_decode transforms the object from the json body into an array, causing it to pass the is_array test in ArrayDef. That doesn't feel right to me.

After trying out patch set 19 of this gerrit change locally, I'm not sure this is doing what we want. At least, it is not doing what I expected. I thought (maybe incorrectly) that by defining a parameter of type "array" with a schema, I would be saying that the endpoint accepts an array, and each element of that array must conform to the schema. That's not what is happening. Instead, the parameter itself must conform to the schema.

That's indeed the expected behavior. Its a consequence of re-using ParameterValidator for validating properties: we assume that the body is always a json object with properties, and we provide one "param spec" for validating each property (each "parameter").

My comment: This is where I decided to write this comment. "Array value found, but an array is required"? As a caller, what do I do with that? The actual problem, if I understand it, is that we passed in a json object, which was converted by json_decode to an array of incorrect structure. But the message is unhelpful.

That error message is terrible - it's a known upstream issue: https://github.com/jsonrainbow/json-schema/issues/523

My comment: this succeeded. But foo is an object, not an array. If this is what we intended, then how does it make sense that an ArrayDef succeeds when no arrays were involved? Yes, at an implementation level, json_decode transforms the object from the json body into an array, causing it to pass the is_array test in ArrayDef. That doesn't feel right to me.

Part of the confusion is that we are mixing the notions of "array" used by JSON and PHP. You define:

return [
			'foo' => [
				self::PARAM_SOURCE => 'body',
				ParamValidator::PARAM_TYPE => 'array',
				ParamValidator::PARAM_REQUIRED => true,
				ArrayDef::PARAM_SCHEMA => [
					'type' => [ 'object' ],
					'required' => [ 'bar' ]
				]
			]
	]

This says:

  • the body is a JSON object (represented as an associative array)
  • it must have a property "foo" (referred to as a "parameter).
  • the value of this field must be a JSON object (represented as an associative array)
  • That object must have a property "bar".

Basically, the array returned by getBodyParamSpec is a schema for a JSON object, but using different terminology. ParamValidator::PARAM_TYPE => 'array' requires a PHP array, which may be a JSON object or array - specifying 'type' => [ 'object' ] in the schema requires it to be an "object" (that is, an associative PHP array).

This behavior is consistent with how we handle parameters. It may be counter-intuitive from the perspective of OpenAPI specs, where we have a single schema for the body.

At the moment, there is no way to define a schema for the entiry body. I can think of ways to do that, but they are all a bit hacky. For instance, we could allow getBodyParamSettings to return different kind of things: a list of param settings, a single param setting, or a json schema. Detecting which is which is a bitm essy, but possible.

In getBodyParamSettings, we'd allow something like:

return [
		ParamValidator::PARAM_TYPE => 'array',
		ArrayDef::PARAM_SCHEMA => [
			'type' => [ 'object' ],
			'required' => [ 'foo' ],
			'parameters' => [
				'foo' => [
					'type' => [ 'object' ],
					'required' => [ 'bar' ]
				]
			]
		]
	]

Or perhaps better just:

return [
	'type' => [ 'object' ],
	'required' => [ 'foo' ],
	'parameters' => [
		'foo' => [
			'type' => [ 'object' ],
			'required' => [ 'bar' ]
		]
	]

Making this work with he way ParamValidator is currently implemented is going to be fiddly, though.

Re { "foo": [ "bar", "baz" ] }: that should fail. If it doesn't, something is clearly wrong.

Re { "foo": { "bar": "baz" } } failing - you say further down that { "foo": { "bar": "baz" } } succeeds, I'm confused now. The only difference is a space?...

Re { "foo": [ "bar", "baz" ] }: that should fail. If it doesn't, something is clearly wrong.

Re { "foo": { "bar": "baz" } } failing - you say further down that { "foo": { "bar": "baz" } } succeeds, I'm confused now. The only difference is a space?...

In the failing case, the schema is of type "array", in the successful case it is of type "object".

Chatted synchronously with @daniel . Still working through all the small details, but it turns out that the thing that I initially wanted to try this out on (parameter validation for batch creation of Reading Lists) is possible via this definition:

				'batch' => [
					self::PARAM_SOURCE => 'body',
					ParamValidator::PARAM_TYPE => 'array',
					ParamValidator::PARAM_REQUIRED => true,
					ArrayDef::PARAM_SCHEMA => [
						'type' => [ 'array' ],
						'items' => [
							'type' => 'object',
							'required' => [ 'name', 'description' ],
							'properties' => [
								'name' => [ 'type' => 'string' ],
								'description' => [ 'type' => 'string' ],
							],
						],
					]
				],

That's a happy thing, if maybe non-obvious how to write for someone new to all this.

Change #1030554 merged by jenkins-bot:

[mediawiki/core@master] arrayDef: Implement JSON schema validation in ArrayDef

https://gerrit.wikimedia.org/r/1030554

Atieno moved this task from In Progress to Done on the MW-Interfaces-Team board.