Decide on JSON validation library
Open, Medium, Public

Description

As we increasingly store content in JSON, usually via ContentHandler, there is a need for a standardized JSON validation system that meets the requirements of Wikimedia projects.

We currently have three different solutions deployed to the Wikimedia cluster:

  • the JsonSchema class bundled in the EventLogging extension (e.g. used for wiki pages in the Schema namespace on meta.wikimedia.org).
  • the json-schema library included in MediaWiki core (e.g. used for validating extension.json for extension management).
  • the JsonConfig extension's custom validation (e.g. used for wiki pages in the Data namespace on commons.wikimedia.org).
  • (more if you count non-PHP services, for example EventGate, which uses ajv: https://github.com/wikimedia/eventgate/blob/master/package.json#L36)

None of these are suitable for our needs. We should determine our requirements for a validator and then choose a validator to implement in core.

Full RFC here: https://www.mediawiki.org/wiki/Requests_for_comment/JSON_validation

Event Timeline

Do I understand it correctly, that:

for a validator and then choose a validator to implement in core.

means that we should implement a new validator in core? If this is correct, wouldn't it make more sense to select one of the currently deployed validators and implement our needs in that library, instead of creating yet another validator? :)

I actually think it was a mistake for us to support JSON in wiki markup because it is harder to read, much harder to write, and impossible to comment -- all of which are requirements for human-editable content. To rectify this, I just proposed T147158 - a fairly painless switch to YAML for the same content. We can still use JSON internally, e.g. to store the data in the page properties or to make it available to JavaScript code.

Do I understand it correctly, that:

for a validator and then choose a validator to implement in core.

means that we should implement a new validator in core? If this is correct, wouldn't it make more sense to select one of the currently deployed validators and implement our needs in that library, instead of creating yet another validator? :)

I meant that we should somehow find a way to get a validator that meets our needs into core, not necessarily that we should write a new one. Technically that is an option, but for the record I think we should make use of an existing one.

Ah, ok, thanks for the clarification @Harej :)

...I just proposed T147158 - a fairly painless switch to YAML for the same content...

Regardless of the backend storage and editing implementation, we still need a validation system.

I second the idea of using YAML instead. It's just easier for humans to read and write. An added bonus in this context is that JSON is valid YAML and can thus be validated using the same principles (though we'd need to use a YAML validator).

We'd still have to validate that YAML in the same way we need to validate JSON (luckily, the formats support the same data structures, so after parsing them the validation would be the same). Let's discuss YAML on the other task though :)

Re JSON schema validation: We are using JSON schema draft v4 in Swagger APIs & EventBus, so it would be desirable if the future PHP library for schema validation supported v4 as well. I am not aware of any uses of remote schemas (and there are security concerns), so we should be fine if the library did not support those for now.
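
If remote schemas are to be ruled out, one option is to scan a schema for references that leave the current document before handing it to a validator. The following is only a minimal sketch under stated assumptions: the helper names `iter_refs` and `remote_refs` and the example URL are made up for illustration, and it treats every `$ref` key as a reference, so a data property that happens to be named `$ref` would be flagged too.

```python
from typing import Any, Iterator, List


def iter_refs(schema: Any) -> Iterator[str]:
    """Yield every "$ref" string found anywhere inside a parsed schema."""
    if isinstance(schema, dict):
        for key, value in schema.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(schema, list):
        for item in schema:
            yield from iter_refs(item)


def remote_refs(schema: Any) -> List[str]:
    """Return the references that point outside the current document."""
    # A reference starting with "#" is a JSON Pointer into the same document.
    return [ref for ref in iter_refs(schema) if not ref.startswith("#")]


# Hypothetical schema with one local and one remote reference.
schema = {
    "type": "object",
    "properties": {
        "user": {"$ref": "#/definitions/user"},
        "avatar": {"$ref": "https://example.org/schemas/avatar.json"},
    },
    "definitions": {"user": {"type": "object"}},
}

print(remote_refs(schema))
```

A schema for which `remote_refs` returns a non-empty list could then be rejected up front, regardless of whether the chosen library is able to fetch remote schemas.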

The question of syntax seems fairly orthogonal to JSON Schema validation. JSON schema validators can validate data structures serialized as YAML or TOML as well.
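
To illustrate why syntax and validation are separate concerns: a validator works on the parsed data structure, not on the text. Below is a rough sketch of a toy validator that handles only a tiny subset of JSON Schema (`type`, `minimum`, `required`, `properties`); it is not how any of the libraries mentioned here are implemented. The `validate` function and the sample data are assumptions for illustration, and the YAML side is represented by the dictionary a YAML parser would be expected to return, so that the example needs only the standard library.

```python
import json
from typing import Any, List

# JSON Schema type names mapped to the Python types a parser produces.
TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "boolean": bool,
}


def validate(data: Any, schema: dict, path: str = "$") -> List[str]:
    """Return a list of violations; an empty list means the data is valid."""
    errors: List[str] = []
    expected = schema.get("type")
    if expected in TYPES:
        ok = isinstance(data, TYPES[expected])
        # bool is a subclass of int in Python, so exclude it explicitly.
        if expected == "integer" and isinstance(data, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    is_number = isinstance(data, (int, float)) and not isinstance(data, bool)
    if "minimum" in schema and is_number and data < schema["minimum"]:
        errors.append(f"{path}: below minimum {schema['minimum']}")
    if isinstance(data, dict):
        for name in schema.get("required", []):
            if name not in data:
                errors.append(f"{path}.{name}: required property missing")
        for name, subschema in schema.get("properties", {}).items():
            if name in data:
                errors.extend(validate(data[name], subschema, f"{path}.{name}"))
    return errors


# In JSON Schema, "properties" is an object keyed by property name.
schema = {
    "type": "object",
    "required": ["id"],
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "username": {"type": "string"},
    },
}

from_json = json.loads('{"id": 0, "username": "alice"}')
# Stand-in for what a YAML parser would return for:
#   id: 0
#   username: alice
from_yaml = {"id": 0, "username": "alice"}

print(validate(from_json, schema))
print(validate(from_yaml, schema))
```

Both calls report the same violation, because by the time validation runs the original serialization is no longer visible.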

I wanted to make sure to note that several of us have been discussing this on wiki as well: https://www.mediawiki.org/wiki/Talk:Requests_for_comment/JSON_validation

The topic threads:

Just to be clear, this discussion is not about JSON vs YAML. JSON-schema is a schema standard and does not depend on it being written specifically in JSON. For example, the following two segments are both JSON-schema compliant:

JSON
{
  "type": "object",
  "properties": [
    {
      "name": "id",
      "description": "the user's ID",
      "type": "integer",
      "minimum": 1
    }, {
      "name": "username",
      "description": "the user's name in the system",
      "type": "string"
    }
  ]
}
YAML
type: object
properties:
  - name: id
    description: the user's ID
    type: integer
    minimum: 1
  - name: username
    description: the user's name in the system
    type: string

As a side note, it took me twice as much time to write the JSON fragment as it did for YAML (no copy/pasting was involved on purpose).

JSON
{
  "type": "object",
  "properties": [
    }
      "name": "id",
      "description": "the user's ID",
      "type": "integer",
      "minimum": 1
    }, {
      "name": "username",
      "description": "the user's name in the system",
      "type": "string"
    }
  ]
}

As a side note, it took me twice as much time to write the JSON fragment as it did for YAML (no copy/pasting was involved on purpose).

And the JSON is invalid... line 3 should be:

{

instead of

}

On the other hand, I would say that I like JSON more, but that could be related to the fact that I use JSON more than YAML :)

And the JSON is invalid... line 3 should be:

... which just proves my point about manually JSONing.

On the other hand, I would say that I like JSON more, but that could be related to the fact that I use JSON more than YAML :)

It is much easier to get accustomed to YAML, though, than vice versa.

I was just looking at the FileAnnotations extension, which also depends on EventLogging solely for json schema validation.

It seems very odd for so many extensions to depend on EventLogging for something that really has very little to do with event logging.

FWIW, this rfc is about validating JSON against a schema, not parsing it. Parsing and validating are rather separate problems.
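
A small sketch of the distinction, using only Python's standard `json` module: input can fail at the parsing stage (it is not well-formed JSON) or pass parsing and still fail validation (it is well-formed but has the wrong shape). The `classify` function and its rule (an object with an integer `id` of at least 1) are hypothetical stand-ins for a real schema check.

```python
import json


def classify(text: str) -> str:
    """Report which stage, if any, rejects the input."""
    try:
        data = json.loads(text)  # Stage 1: parsing (is it well-formed JSON?)
    except json.JSONDecodeError:
        return "parse error"
    # Stage 2: validation (does the parsed value have the expected shape?)
    # Hypothetical rule for illustration: an object with an integer "id" >= 1.
    if not isinstance(data, dict):
        return "schema violation"
    user_id = data.get("id")
    if isinstance(user_id, bool) or not isinstance(user_id, int) or user_id < 1:
        return "schema violation"
    return "ok"


print(classify('{"id": 7}'))        # well-formed and valid
print(classify('{"id": "seven"}'))  # well-formed, but wrong type
print(classify('{"id": 7'))         # not well-formed JSON
```

Only the second stage is in scope for this task; the first is already handled by whatever parser produces the data structure.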

I volunteered to add i18n support to the Justin Rainbow validation library: https://github.com/justinrainbow/json-schema/issues/363

The json-schema team has centralized the error messages as of the 6.0.0 branch and possibly the 5.2.0 branch as well. This means that all exceptions in json-schema will now return consistent error codes, instead of strings.

They have not opted to implement their own i18n interface. They (understandably) want to avoid the maintenance overhead of having to deal with error message semantics in many different languages. For our purposes, however, we can take the error messages and develop our own error code library, using TranslateWiki for translation. Then it's just a matter of feeding json-schema's outputs into our i18n library and we can use this in user-facing products.
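
The idea of layering our own i18n on top of validator error codes could look roughly like the sketch below. Everything in it is hypothetical: the error codes, the message keys, the translations, and the `localize` function are invented for illustration and are not taken from json-schema or from any existing MediaWiki message file.

```python
from typing import Dict, Mapping

# Hypothetical validator error codes mapped to hypothetical message keys.
MESSAGE_KEYS: Dict[str, str] = {
    "REQUIRED": "jsonvalidation-error-required",
    "TYPE": "jsonvalidation-error-type",
}

# Hypothetical translations, standing in for messages maintained by translators.
MESSAGES: Dict[str, Dict[str, str]] = {
    "en": {
        "jsonvalidation-error-required": "The property {property} is required.",
        "jsonvalidation-error-type": "{property} must be of type {expected}.",
        "jsonvalidation-error-unknown": "Validation failed ({code}).",
    },
    "de": {
        "jsonvalidation-error-required": "Die Eigenschaft {property} ist erforderlich.",
    },
}


def localize(code: str, params: Mapping[str, str], lang: str) -> str:
    """Turn a validator error code plus parameters into a localized message."""
    key = MESSAGE_KEYS.get(code, "jsonvalidation-error-unknown")
    # Fall back to English when the language has no translation for the key.
    template = MESSAGES.get(lang, {}).get(key) or MESSAGES["en"][key]
    return template.format(**{**params, "code": code})


print(localize("REQUIRED", {"property": "id"}, "de"))
print(localize("TYPE", {"property": "id", "expected": "integer"}, "de"))
print(localize("ENUM", {}, "fr"))
```

Because the lookup is keyed on codes, not on English strings, the validation library can reword its own messages without breaking the translations.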

@Harej What's the status of this? From looking at this task, it looks like it kind of lost steam toward the end of last year?

@Catrope, on our end, we need to upgrade our json-schema dependency to use the 6.0.0 branch. From there, we need to write an i18n library to use on top of json-schema. Notably, this is currently held up because no one has time to do it.

Moving back to backlog due to inactivity.

Note that we are probably going to revisit some aspects of this soon for T224375: REST API Developer declares JSON validation parameters.

Krinkle updated the task description.
daniel renamed this task from RFC: JSON validation to Decide on JSON validation library. Jul 22 2019, 5:01 PM
daniel removed a project: TechCom-RFC.
daniel subscribed.

Removed this from the RFC board. This doesn't seem hard to undo, nor strategic, nor cross-cutting, so there is no need for an RFC.

Milimetric subscribed.

The server-side EventLogging validation will be deprecated in favor of ingestion through EventGate, which has its own validator. Putting this on Radar for us, but reach out if we want to possibly coordinate validators further.

Restricted Application edited projects, added Analytics; removed Analytics-Radar. Jun 10 2020, 6:33 AM
Restricted Application edited projects, added Analytics; removed Analytics-Radar. Jun 10 2020, 6:36 AM