
Decide how to model different types of ZObject inside MediaWiki
Closed, Resolved (Public)

Description

We're going to have several different kinds of "ZObject"s in Wikilambda:

  • Z1, the conceptual top-level object
  • Z8, the function object (defining the arguments, return type, and test suites)
  • Z20, the function test suite object (defining the …)
  • Z14, the function implementation (defining what it implements, and a code block with language type)

The original model is that all of these (and more, like inter-object references, lists/arrays, keys, and even function calls) are implemented in typed, tokenised JSON. They would be stored inside MediaWiki directly as JSON blobs, treating MW as a dumb file/content store. Each object would have a 'type' key/value pair. New types could be added trivially (potentially on-wiki).

This is hugely flexible for future expansion of the ZObject model and how it works, but there are consequent costs from this design which echo those we ran into with Wikibase, so I think it's worth thinking about.

Option 1: ZObjects are undifferentiated JSON blobs

  • Each ZObject takes a MW "page".
  • There is one namespace for ZObjects with one content handler, sub-classed from JSON.
  • Each blob has a 'type' key/value pair.
  • On reading the blob, the ZObject handler inspects the 'type' value and hands off the object to the code handler (renderer, execution, whatever) as needed.
Pros
  • Very flexible to manage.
  • Very flexible to use; a 'function' type can be re-tasked to be an 'implementation' or a 'tester' or whatever.
  • Very simple PHP code (at least, at first).
Cons
  • No DB querying is possible, now or ever. We can't use a DB query to show a list of implementations, ask what test suites have been updated in the last week, highlight documentation changes, or ….
  • Hard to provide standard tools like re-using the CodeEditor for functions (or JSON blobs, for that matter), so end-user experience is going to be harder.
  • Might be /too/ flexible; when would we want users to change the type of something? Would we need a whole set of rules/workflows for such changes?
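
The read path for Option 1 can be sketched roughly as follows. This is only an illustration in Python rather than MediaWiki PHP; the `HANDLERS` registry, the `handler` decorator, and the `arguments` and `language` keys are all hypothetical names, and only the 'type' key comes from the description above.

```python
import json

# Hypothetical registry mapping a 'type' value to the code that handles it.
HANDLERS = {}

def handler(type_name):
    """Register a function as the handler for one 'type' value."""
    def register(func):
        HANDLERS[type_name] = func
        return func
    return register

@handler("Z8")
def render_function(zobject):
    # 'arguments' is an assumed key, used only for illustration.
    return f"function with {len(zobject.get('arguments', []))} argument(s)"

@handler("Z14")
def render_implementation(zobject):
    # 'language' is an assumed key, used only for illustration.
    return f"implementation in {zobject.get('language', 'unknown')}"

def dispatch(blob):
    """Parse a stored JSON blob and hand it to the handler for its type."""
    zobject = json.loads(blob)
    type_name = zobject.get("type")
    if type_name not in HANDLERS:
        raise ValueError(f"no handler for type {type_name!r}")
    return HANDLERS[type_name](zobject)
```

Adding a new type here only means adding an entry to the registry, which is where the flexibility comes from. Nothing outside the blob records the type, which is why the storage layer cannot be queried by it.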

Option 2: ZObjects are blobs differentiated at the MW level

  • Each ZObject takes a MW "page".
  • There is one namespace for each kind of ZObject with a bespoke (slim) content handler each, sub-classed from an abstract one for ZObjects; some are JSON-like, but others are wikitext (documentation), code (implementation), etc.
  • The content handler is responsible for knowing its type and providing render, execution, etc. features as needed.
Pros
  • DB queries can become possible, so lots of obvious editing community-facing features become simple and fast to implement.
  • MW namespacing is well understood and provides for advanced user group permissions modelling (e.g. only sysops can change a test suite, newbies can create an implementation but not change an existing function definition, logged out users can write documentation objects, etc.).
  • Probably-broken changes like making a documentation block be a test suite are now stopped automatically (with the ability to provide a workflow to convert if such things are needed).
  • Very easy to re-use existing MW features like CodeEditor or WhatLinksHere or whatever.
Cons
  • Lots more up-front PHP code (but it'll be pretty slim).
  • Flexibility in the system is more constrained.
  • Implicit system design decisions in how these things work that may push us in directions we're not explicitly intending.
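
For comparison, the shape of Option 2 might look something like this sketch. It is written in Python rather than MediaWiki PHP, and the class names, namespace names, and `handler_for` helper are invented for illustration; they are not existing MediaWiki APIs.

```python
from abc import ABC, abstractmethod

class ZObjectHandler(ABC):
    """Abstract base; each kind of ZObject gets its own slim subclass."""
    namespace: str      # hypothetical namespace name
    content_model: str  # e.g. "json", "wikitext" or "code"

    @abstractmethod
    def render(self, text: str) -> str:
        """Return a display form of the stored content."""

class ImplementationHandler(ZObjectHandler):
    namespace = "Implementation"
    content_model = "code"

    def render(self, text: str) -> str:
        return f"<code>{text}</code>"

class DocumentationHandler(ZObjectHandler):
    namespace = "Documentation"
    content_model = "wikitext"

    def render(self, text: str) -> str:
        return f"<p>{text}</p>"

# One handler per namespace; the page title alone determines the kind.
HANDLERS_BY_NAMESPACE = {
    h.namespace: h for h in (ImplementationHandler(), DocumentationHandler())
}

def handler_for(title: str) -> ZObjectHandler:
    """Pick the handler from the namespace prefix of a page title."""
    namespace, _, _ = title.partition(":")
    return HANDLERS_BY_NAMESPACE[namespace]
```

Because the kind is fixed by the namespace, a page cannot silently change from one kind to another, and listing all pages of a kind becomes an ordinary namespace query.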

Option 2b: ZObjects are blobs differentiated at the MW level, but also represented in a single JSON blob model?

As with option 2, plus:

  • Content handlers are also responsible for providing a consistent typed, tokenised JSON representation as per the paper.
Pros
  • Easy expansion of above.
Cons
  • I'm not clear on why this model is needed, so I may be missing a subtlety of what objective the paper's model is pushing us towards.

Event Timeline

Thanks for writing this up!

Regarding Option 2b, I would prefer if we could avoid providing two different formats (although that also has certain advantages): a JSON representation for the external world and a different representation that we use for storage.

My preference would be for Option 1, and trying to resolve the Cons that you mention. I think that the option for the community to create new types and make the system as self-hosting as possible is a strong reason for Option 1, as the creation of new types won't need new code but can be done completely on-wiki.

The other advantage is that by everything being ZObjects, we can create ZObjects on the fly, which could be of any type. So a function call could return a type, or a function definition, etc. If we have these being separate content objects, I am worried that we would lose that flexibility - so if we wanted to store the type List of Strings as a persistent ZObject, we couldn't write it as MakeTypedList(String) and store that, but would need to materialize the result of that - because a function call looks different than a type.
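
A rough sketch of that difference, in Python: the dictionary shapes, the "function call" type name, and the `resolve` helper are assumptions made up for this example, not the actual ZObject format. Only MakeTypedList(String) comes from the comment above.

```python
def make_typed_list(element_type):
    """Hypothetical function that returns a list type for one element type."""
    return {"type": "type", "kind": "list", "element": element_type}

# Hypothetical table of callable functions, keyed by name.
FUNCTIONS = {"MakeTypedList": make_typed_list}

def resolve(zobject):
    """Evaluate a stored function call; return any other ZObject unchanged."""
    if zobject.get("type") == "function call":
        return FUNCTIONS[zobject["function"]](*zobject["arguments"])
    return zobject

# With a single ZObject model, the call itself can be what is stored.
stored_call = {
    "type": "function call",
    "function": "MakeTypedList",
    "arguments": ["String"],
}

# With separate content objects per kind, only the result could be stored.
materialized = {"type": "type", "kind": "list", "element": "String"}
```

Both forms resolve to the same type, but only the first keeps the information that the type was produced by a call.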

I think that one issue is that, if I parse this right, the assumption is that if we go for Option 2, we would have "pure-typed" objects. I.e. it would only be a bit of source code for an implementation, and thus we could use code-editor. Or it would only be documentation, and thus we could use the current system for wiki text, etc. But I was thinking of them still as complex objects, with fields, etc. The UX for the editing of individual fields would be something where reusing an existing UX like the CodeEditor would be helpful - but not for the top-level of the whole object. Because we still need to add metadata on the object.

So for me the question would be: can we, even with Option 1, work around the cons, and how much work would that be?

My understanding is that we can: in a post-save step, we can index conditionally on the type and send the appropriate updates to several different tables.
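
As a minimal sketch of such a post-save step, using Python and SQLite purely for illustration: the table names, column names, and the `implements` and `language` keys are hypothetical, and a real version would hook into MediaWiki's save pipeline instead.

```python
import json
import sqlite3

def create_index(conn):
    """Create the hypothetical secondary tables."""
    conn.execute(
        "CREATE TABLE zobject_index ("
        "page_id INTEGER PRIMARY KEY, ztype TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE implementation_index ("
        "page_id INTEGER PRIMARY KEY, implements TEXT, language TEXT)"
    )

def on_save(conn, page_id, blob):
    """Post-save step: update the index tables according to the blob's type."""
    zobject = json.loads(blob)
    ztype = zobject.get("type")
    conn.execute(
        "INSERT OR REPLACE INTO zobject_index VALUES (?, ?)", (page_id, ztype)
    )
    # Drop any stale type-specific row, in case the type changed on this save.
    conn.execute("DELETE FROM implementation_index WHERE page_id = ?", (page_id,))
    if ztype == "Z14":
        conn.execute(
            "INSERT INTO implementation_index VALUES (?, ?, ?)",
            (page_id, zobject.get("implements"), zobject.get("language")),
        )
    conn.commit()

def list_implementations(conn):
    """The kind of query that the raw blobs alone cannot answer."""
    rows = conn.execute("SELECT page_id FROM implementation_index ORDER BY page_id")
    return [row[0] for row in rows]
```

The blob stays the single source of truth; the tables are derived data that can be rebuilt from the stored pages at any time.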

The other con, regarding the standard tools, I think that's true for Option 2 as well, as I argued above.

In option 1, we may introduce a secondary table (like wb_items_per_site) to store queryable data. Alternatively, it may be stored as page properties.

Another thought that I forgot to formulate earlier:

ZObjects are composed of primitives and other ZObjects. And most of these ZObjects could also be individual persistent ZObjects, i.e. their own page in the wiki.

This seems to suggest to me that if we build the UI, validation, and storage to be equally composable, we'll have the easiest time accommodating that composability. This way we make the fewest assumptions, and allow contributors to learn the fewest primitives. If, on the other hand, as in Option 2, certain objects require certain privileges in the code, this may break composability and, eventually, block certain use cases from being implemented.

Yes, you should not assume that a built-in type (such as string) has a constant ZID (for example, it may differ between the main Wikilambda and the test instance), but you should assume that it has the same JSON representation, like {"type": "built-in type", "name": "string"}.
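
In code, that means looking a built-in type up by its representation instead of hard-coding a ZID. A small Python sketch, where the `find_builtin_zid` helper and the ZIDs Z100 and Z200 are invented for the example:

```python
def find_builtin_zid(objects, name):
    """Return the ZID whose object is the built-in type called `name`."""
    wanted = {"type": "built-in type", "name": name}
    for zid, zobject in objects.items():
        if zobject == wanted:
            return zid
    return None

# Two hypothetical instances that store the same built-in under different ZIDs.
main_instance = {"Z100": {"type": "built-in type", "name": "string"}}
test_instance = {"Z200": {"type": "built-in type", "name": "string"}}
```

The same lookup works on both instances even though the ZIDs differ.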

Jdforrester-WMF claimed this task.

In my view, this has been Resolved (we're going with option 1).