
Add JSON parameter type to the action API
Open, Needs Triage, Public

Description

ReadingLists expects JSON strings for some parameters (for batch write operations), but the handling is implemented purely inside the API module, so it does not integrate very well with the rest of the API framework. JSON input will probably become more important in the future, given our increased focus on structured data (or maybe the use case will be obsoleted by having a REST API in MediaWiki... in any case, it is worth considering at least), so it would be nice to have framework-level integration for JSON parameters. That would include:

  • core support for some means of describing JSON structures (JSON Schema being the obvious choice)
  • a JSON parameter type in the API, with the ability to prescribe a schema
  • JSON-aware input normalization (e.g. dealing with input strings that contain JSON-encoded non-NFC Unicode sequences)
  • schema validation as part of the normal parameter validation of the API
  • exposing the schema in the paraminfo API
  • maybe exposing the schema to some extent in the help API?
  • appropriate controls in ApiSandbox (a text area with JSON validation? JSON schema validation?)
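
As a rough illustration of the second and fourth points (a JSON parameter type with a prescribed schema, validated during normal parameter validation), here is a minimal Python sketch. Everything in it is an assumption made for illustration: `parse_json_param`, `ParamError`, `BATCH_SCHEMA` and its `project`/`title` fields are hypothetical names, not part of MediaWiki or ReadingLists, and the validator covers only a tiny subset of JSON Schema (`type`, `required`, `properties`, `items`). A real implementation would presumably delegate to whichever library T147137 settles on.

```python
import json

# Maps a small subset of JSON Schema "type" names to Python types.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "boolean": bool,
}


class ParamError(ValueError):
    """Raised when a JSON parameter fails to parse or validate (hypothetical)."""


def validate(value, schema, path="$"):
    """Check value against a tiny schema subset: type, required, properties, items."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, so reject it explicitly for "integer".
        if expected == "integer" and isinstance(value, bool):
            ok = False
        if not ok:
            raise ParamError(f"{path}: expected {expected}")
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                raise ParamError(f"{path}: missing required key {key!r}")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                validate(value[key], sub_schema, f"{path}.{key}")
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            validate(item, schema["items"], f"{path}[{index}]")


def parse_json_param(raw, schema):
    """Decode a raw parameter string as JSON, then validate it against schema."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ParamError(f"invalid JSON: {exc.msg}") from exc
    validate(value, schema)
    return value


# Hypothetical schema for a batch of entries; the field names are made up.
BATCH_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "required": ["project", "title"],
        "properties": {
            "project": {"type": "string"},
            "title": {"type": "string"},
        },
    },
}
```

With this sketch, `parse_json_param('[{"project": "p", "title": "T"}]', BATCH_SCHEMA)` returns the decoded list, while a batch entry without `title` raises `ParamError` with a path pointing at the offending element. The same schema object could then be what paraminfo exposes.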

Related: T147137: Decide on JSON validation library

Event Timeline

I note we discussed some of this previously in T182475.

While there are some cases where a JSON blob as the value of a parameter makes sense, I worry that it'll more often be misused to try to avoid making a properly-defined API module. Requiring a schema might help with that somewhat.

given our increased focus on structured data

Your use case here isn't "structured data" in the sense that's meant when talking about "our increased focus on structured data".

When talking about "our increased focus on structured data", the API is unlikely to care whether the value of a content parameter is wikitext, structured data as JSON, structured data as XML, or something else. And operations on structured data at a sub-content level are more likely to be done well by specialized modules than by some generic API-level code.

or maybe the use case will be obsoleted by having a REST API in MediaWiki

"REST" isn't some magical dust that makes everything better, as much as some people seem to think so. It has a marginal benefit in that it typically uses pathinfo as positional parameters so clients don't have to think about ordering query parameters for cacheability. And it's the new shiny.
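
To make the cacheability point concrete: two query-string URLs that differ only in parameter order are equivalent requests but distinct cache keys unless the client (or the cache) canonicalizes them, whereas a path-style URL has only one spelling. The following Python sketch shows the kind of canonicalization a client would otherwise have to think about. The host name and parameter values are placeholders chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(url):
    """Sort query parameters so that equivalent requests share one cache key."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(pairs), parts.fragment)
    )


# Same request, different parameter order: a naive cache sees two distinct keys.
url_a = "https://wiki.example/w/api.php?action=query&titles=Foo&format=json"
url_b = "https://wiki.example/w/api.php?format=json&titles=Foo&action=query"

same_after_sorting = canonicalize(url_a) == canonicalize(url_b)
```

Here `url_a != url_b` as strings, but both canonicalize to the same URL with the parameters in sorted order.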

JSON-aware input normalization (e.g. dealing with input strings that contain JSON-encoded non-NFC Unicode sequences)

If you mean param={"foo":"a%CC%81"}, that's unlikely to be allowed since MediaWiki's input layer normalizes all text strings. If you mean param={"foo":"a\u0301"}, that seems outside the scope of a generic API layer.
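
The difference between the two cases can be sketched in Python. This is only an illustration of the Unicode mechanics, assuming a pipeline that applies NFC to the raw parameter string and decodes the JSON afterwards; it is not a description of how MediaWiki's input layer is implemented.

```python
import json
import unicodedata
from urllib.parse import unquote

# Case 1: the combining acute accent arrives as percent-encoded UTF-8 bytes,
# so after URL decoding it is a real U+0301 character in the raw string.
raw_percent = unquote('{"foo":"a%CC%81"}')

# Case 2: the combining accent is written as a JSON \u escape, so the raw
# string is pure ASCII and contains a literal backslash.
raw_escape = '{"foo":"a\\u0301"}'


def normalize_then_decode(raw):
    """Apply NFC to the raw string first, then decode the JSON."""
    return json.loads(unicodedata.normalize("NFC", raw))


percent_value = normalize_then_decode(raw_percent)["foo"]
escape_value = normalize_then_decode(raw_escape)["foo"]

# True: string-level normalization composed "a" + U+0301 into U+00E1.
percent_is_nfc = unicodedata.normalize("NFC", percent_value) == percent_value
# False: the escape only became U+0301 after normalization had already run.
escape_is_nfc = unicodedata.normalize("NFC", escape_value) == escape_value
```

In the first case the value comes out as the single precomposed character U+00E1. In the second it is still `a` followed by U+0301, because the normalizer only ever saw the six ASCII characters of the escape sequence.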

maybe exposing the schema to some extent in the help API?

A link to some endpoint that serves it would be appropriate, IMO.

And operations on structured data at a sub-content level are more likely to be done well by specialized modules than by some generic API-level code.

Sure. Each specialized module shouldn't have to reinvent the basics of JSON parameter handling, though.

or maybe the use case will be obsoleted by having a REST API in MediaWiki

"REST" isn't some magical dust that makes everything better, as much as some people seem to think so. It has a marginal benefit in that it typically uses pathinfo as positional parameters so clients don't have to think about ordering query parameters for cacheability. And it's the new shiny.

It's the old shiny; people seem to be moving to GraphQL these days :) Anyway, the point is that the REST API will almost certainly be JSON-only (that has nothing to do with it being REST; JSON has simply outcompeted the alternatives and become the de facto standard, and as a new API framework it won't be burdened with backwards compatibility), so it will have to handle all the things described here. And if the REST API supports JSON-based use cases, maybe there is no point in adding them to the old API as well. (I don't think that's a very good argument - I think the efforts to handle JSON in one or the other API largely overlap - but thought it was worth spelling out.)

If you mean param={"foo":"a\u0301"}, that seems outside the scope of a generic API layer.

I mean that, yeah. It seems weird to have an API framework that normalizes some input parameters but not others. If we are confident enough about the inappropriateness of non-normal-form strings that we have disallowed them so far in the API (and there seems little reason not to be), allowing them in new parameter formats will only create confusion and unexpected errors.

A link to some endpoint that serves it would be appropriate, IMO.

It would be a start but still a far cry from human-readable documentation. I'm sure there are libraries which can present a JSON schema in some easy-to-understand way. (docson looks like an option for example.)

And operations on structured data at a sub-content level are more likely to be done well by specialized modules than by some generic API-level code.

Sure. Each specialized module shouldn't have to reinvent the basics of JSON parameter handling, though.

Ideally the specialized module won't need JSON parameter handling at all.

If you mean param={"foo":"a\u0301"}, that seems outside the scope of a generic API layer.

I mean that, yeah. It seems weird to have an API framework that normalizes some input parameters but not others. If we are confident enough about the inappropriateness of non-normal-form strings that we have disallowed them so far in the API (and there seems little reason not to be), allowing them in new parameter formats will only create confusion and unexpected errors.

The level at which the normalization is done doesn't know anything about JSON string escapes, so not so weird. And the API generically can't know whether the strings in your JSON data are actually UTF-8 text or something else. Ideally, IMO, your schema language would specify that a field is UTF-8 text versus "binary", and either reject or normalize at that level.
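
One way to picture schema-level handling is a pass over the already-decoded JSON value that normalizes only the strings the schema marks as text. The Python sketch below assumes a made-up `x-binary` marker to flag non-text strings; that keyword is not part of JSON Schema, and the function name, field names, and schema shape are likewise hypothetical.

```python
import json
import unicodedata


def normalize_text_fields(value, schema):
    """Return a copy of value with NFC applied to strings the schema treats as text."""
    kind = schema.get("type")
    if kind == "string" and isinstance(value, str):
        # "x-binary" is an invented marker for this sketch, not a JSON Schema keyword.
        if schema.get("x-binary", False):
            return value
        return unicodedata.normalize("NFC", value)
    if kind == "object" and isinstance(value, dict):
        props = schema.get("properties", {})
        # Keys without a sub-schema get an empty schema and pass through untouched.
        return {k: normalize_text_fields(v, props.get(k, {})) for k, v in value.items()}
    if kind == "array" and isinstance(value, list):
        item_schema = schema.get("items", {})
        return [normalize_text_fields(v, item_schema) for v in value]
    return value


# Hypothetical schema: "title" is text, "blob" is flagged as binary-ish data.
EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "blob": {"type": "string", "x-binary": True},
    },
}

decoded = json.loads('{"title": "a\\u0301", "blob": "a\\u0301"}')
normalized = normalize_text_fields(decoded, EXAMPLE_SCHEMA)
```

After this pass, `normalized["title"]` is the precomposed U+00E1 while `normalized["blob"]` keeps its original two code points. Rejecting instead of normalizing would be the same walk with a comparison and an error in place of the `normalize` call.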

A link to some endpoint that serves it would be appropriate, IMO.

It would be a start but still a far cry from human-readable documentation. I'm sure there are libraries which can present a JSON schema in some easy-to-understand way. (docson looks like an option for example.)

True.