Page MenuHomePhabricator

JsonContent is not safe to extend
Open, LowPublic

Description

I started looking into custom extension content models, and I found that JsonContent core class is not safe to extend. Is that intended? Am I supposed to extend TextContent and re-implement the pure JSON part? Or is this just an oversight?

Event Timeline

Change 667363 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[mediawiki/core@master] Mark JsonContent as safe to extend

https://gerrit.wikimedia.org/r/667363

The advice I got from Platform was to not use JsonContent ever, as it had a lot of old-style architectural decisions that they didn't want continue. We're in the middle of switching WikiLambda to its own direct implementation of AbstractContent.

I am in the same situation with https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/638528. If JsonContent (+Handler?) is to be avoided, I would like to know more details what I should avoid in my code in addition to not extending JsonContent.

Copying my comment from the patch, for reference:

JsonContent is somewhat ill conceived / misleading. I think it should have a big fat warning at the top telling people that they probably do NOT want to extend it. The thing is:

JsonContent is a subclass of TextContent, which means "user editable text". So it should be used ONLY if the idea is that users will manually edit JSON, directly in a text box. Which is generally a terrible idea.

If that is not the intent, then JsonContent should not be used. It should not be misunderstood as a base class for things that get serialized as JSON.

Personally, I think we should deprecate JsonContent, rather than encourage extensions to build on it.

Change 673500 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Extract JsonStructureRenderer from JsonContent

https://gerrit.wikimedia.org/r/673500

I made a patch that marks JsonContent and JsonContentHandler as stable to extend. They are useful for things like config files (though I'd like to see YAML for that).

I pulled out the JSON structure rendering, and add a bug notice explaining when JsonContent should and should not be used. After pulling out the rendering, JsonContent is barely a hundred lines, so building on top of it will hopefully be less tempting.

JsonContent is somewhat ill conceived / misleading. I think it should have a big fat warning at the top telling people that they probably do NOT want to extend it. The thing is:

JsonContent is a subclass of TextContent, which means "user editable text". So it should be used ONLY if the idea is that users will manually edit JSON, directly in a text box. Which is generally a terrible idea.

Well, no. It's possible to add a nice editing interface in front (MassMessage does this rather well IMO) but still let advanced users who want to edit it as raw JSON. Yes, exposing users to only JSON in a text box is a terrible idea, but that has nothing to do with the content handler representation.

I put JsonContent in core because there were a ton of extensions copying it around, including: EventLogging, UploadWizard, BookManagerv2, JsonConfig, MassMessage, none of which I would describe as "ill conceived / misleading".

If that is not the intent, then JsonContent should not be used. It should not be misunderstood as a base class for things that get serialized as JSON.

Personally, I think we should deprecate JsonContent, rather than encourage extensions to build on it.

JsonConfig provides a way to store JSON in wiki pages, with validation, PST, and a default structured representation. It also enables CodeEditor's JSON editor. It does nothing more and never promised to do so.

Well, no. It's possible to add a nice editing interface in front (MassMessage does this rather well IMO) but still let advanced users who want to edit it as raw JSON. Yes, exposing users to only JSON in a text box is a terrible idea, but that has nothing to do with the content handler representation.

Ok, I'll take back "ill conceived" - when the intent is to maintain complex configuration or similar structures that benefit from the flexibility of JSON, and you want to expose that complexity to users, JsonContent is useful. And of course, it's easier to just let users edit JSON than to write a dedicated API and UI for editing.

My point is: JsonContent should only be used if the intent is to allow users to edit JSON directly (even if another way to edit). JsonContent is a TextContent, and TextContent is, by definition, user editable text. Anything that is not user editable text should not be TextContent. There are several places in MediaWiki that do typechecks for TextContent to see if the content should be treated as text, or as serialized data.

I think JsonContent is misleading, because people who want to maintain "structured data" on a page may think building on top of JsonContent is the obvious and correct thing to do, if they want to serialize to JSON. This leads to code that is based on untyped nested arrays rather than a proper domain model, and additional work do prevent direct editing and replace the default rendering.

Would you agree that for structured data that should not be editable directly and needs custom rendering, JsonContent should not be used? If yes, how would you document that on the class? If no, how do you think such a user cases benefits from JsonContent?

JsonConfig provides a way to store JSON in wiki pages, with validation, PST, and a default structured representation. It also enables CodeEditor's JSON editor. It does nothing more and never promised to do so.

Yes - it's the right thing to use if you want users to edit JSON! But I think it's the wrong thing to use if you only want to serialize to JSON, but do not want users to edit JSON. That's what I want to make clear in the class documentation.

My point is: JsonContent should only be used if the intent is to allow users to edit JSON directly (even if another way to edit).

Yes, but I'm not sure why you're against allowing this as a possibility. I think in the vast majority of cases there's nothing wrong with allowing such arbitrary editing.

JsonContent is a TextContent, and TextContent is, by definition, user editable text. Anything that is not user editable text should not be TextContent. There are several places in MediaWiki that do typechecks for TextContent to see if the content should be treated as text, or as serialized data.

Surely direct editing should be controlled by... ContentHandler::supportsDirectEditing() and not instanceof TextContent? Wasn't this the whole point of introducing that function?

In any case, in my experience extending TextContent means that a lot of things just work and you just override what you need. Extending AbstractContent is painful and you have to reimplement a bunch of stuff (it's been years since I tried fwiw).

I think JsonContent is misleading, because people who want to maintain "structured data" on a page may think building on top of JsonContent is the obvious and correct thing to do, if they want to serialize to JSON. This leads to code that is based on untyped nested arrays rather than a proper domain model, and additional work do prevent direct editing and replace the default rendering.

Would you agree that for structured data that should not be editable directly and needs custom rendering, JsonContent should not be used? If yes, how would you document that on the class? If no, how do you think such a user cases benefits from JsonContent?

What exactly is your proposed alternative? Are all the extensions currently utilizing JsonContent doing something wrong that needs calling out in the documentation? JsonContent is not the answer if you're trying to implement Wikibase, but for just about every other structured wiki page thing, it seems to work pretty well.

Yes, but I'm not sure why you're against allowing this as a possibility. I think in the vast majority of cases there's nothing wrong with allowing such arbitrary editing.

If what you are storing is basically configuration, then free-form editing (plus some validation) is fine. If it's actually data, then not so much.

There are basically two kinds of cases: "store JSON on a wiki page" and "store data on a wiki page". For the latter, the data can be encoded as JSON, sure, but it should not be using JsonContent.

Surely direct editing should be controlled by... ContentHandler::supportsDirectEditing() and not instanceof TextContent? Wasn't this the whole point of introducing that function?

Yes, you can control it via ContentHandler::supportsDirectEditing(). But if you don't want direct editing, why do you extend TextContent? What do you gain from it?

If you take out the code for rendering the JSON structure, all that is left in JsonContent is code for prettifying the JSON itself. Which you don't need if you don't want to edit it. JsonContentHandler is even more trivial.

In any case, in my experience extending TextContent means that a lot of things just work and you just override what you need. Extending AbstractContent is painful and you have to reimplement a bunch of stuff (it's been years since I tried fwiw).

AbstractContent has no abstract methods, you just add your desired data to it. ContentHandler has three abstract methods: serializeContent(), unserializeContent(), and makeEmptyContent(). All of them are trivial.

If you build on top of TextContent, you get these things "for free":

  1. Diffs, line based. This can work OK if the JSON is nicely formatted, and is fairly human readable. For complex data structures, you probably want a custom diff representation, like Wikibase does. Line based diffs are going to be useless.
  2. Template-transclusion (as text). You probably don't want that for data structures.
  3. Line-based 3-way merged of edit conflicts. Which for a JSON data structure likely to explode, you are better off without. If you want conflict resolution, you'll have to build one that understands the semantics of the data and doesn't operate on text.
  4. Full text search, based on the JSON data. Which will includes field names, which is probably not what you'd want.
  5. Conversion to and from other text-based content formats. You can re-interpret your data structure as wikitext. Probably not desirable for complex data structures.

This is why I think that JsonContent makes it easy to do the wrong thing for extensions that want to store "data".

What exactly is your proposed alternative? Are all the extensions currently utilizing JsonContent doing something wrong that needs calling out in the documentation? JsonContent is not the answer if you're trying to implement Wikibase, but for just about every other structured wiki page thing, it seems to work pretty well.

It's perfectly fine for something like EntitySchema or EventLogging. It's dubious for something like JADE, UploadWizard or MediaUploader (make users edit JSON because building a UI is too much work). It's the wrong thing for something like WikiLambda.

Perhaps there should be a DataContent sub-class of AbstractContent which we encourage things to use, and provide a JsonDataContent type without the direct-edit features?

Perhaps there should be a DataContent sub-class of AbstractContent which we encourage things to use, and provide a JsonDataContent type without the direct-edit features?

We can have a JsonDataContentHandler and JsonDataContent make sense to me if they build on top of JsonUnserializable data objects. They would be trivial, but would provide guidance.

By the way - it should be possible to switch a given content model's handler from JsonContent to JsonDataContent (or vice versa) without breaking existing content in the database, as long as the serialiozation is compatible and the model name is the same.

What I don't want to encourage is nested arrays as the internal representation of structured data.

Change 674572 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] DEMO: JsonDataContentHandler

https://gerrit.wikimedia.org/r/674572