Page MenuHomePhabricator

Support required properties in schemas
Closed, ResolvedPublic

Description

The outcome of T359193 is to support required in JSON schemas but loosening the validation step to not take it in account for the top level properties of an schema while reading a configuration. But using it regularly on API submissions and writes in general. This is to circumvent the indirection of a configuration not being available at a given time and making the system unrecoverable from that situation.

Acceptance criteria:

  • In the schema, we can define any property as required. including properties at the top-level
  • The server will not write a schema where required properties are missing
    • However, the server will write a required property if it is the empty string (""), null, 0, or false, if these values are compatible with its type
  • The server will load a schema where a required property is missing
  • In the form generated by CommunityConfiguration, required properties must have a meaningful value for the form to be submittable, that is, the empty string is not valid, whereas 0 or false might be, depending on the control
  • In the form generated by CommunityConfiguration, the validation for required properties actually having a value is performed not only before submitting the form but also as the user interacts with form elements (purposefully somewhat vague: for some elements it might be better to do on each keystroke, for others only when the user moves to another element, and there might be other circumstances)

Related Objects

Event Timeline

Sgs raised the priority of this task from High to Needs Triage.Mar 25 2024, 5:19 PM
Sgs added a project: Growth-Team.
Sgs moved this task from Inbox to Up Next on the Growth-Team board.

Un-assigning since I am not actively working on the task. Probably needs refinement of acceptance criteria.

Sgs removed Sgs as the assignee of this task.May 16 2024, 2:08 PM

Change #1034451 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/CommunityConfiguration@master] [WIP] Editor: support for required in nested objects

https://gerrit.wikimedia.org/r/1034451

Change #1057219 had a related patch set uploaded (by Michael GroรŸe; author: Michael GroรŸe):

[mediawiki/extensions/CommunityConfiguration@master] feat(editor): support required for String and PageTitle controls

https://gerrit.wikimedia.org/r/1057219

Change #1034451 abandoned by Sergio Gimeno:

[mediawiki/extensions/CommunityConfiguration@master] [WIP] Editor: support for required in nested objects

Reason:

Not valid

https://gerrit.wikimedia.org/r/1034451

In this task we need to fix two things:

  1. the backend problem of how to set required for top-level properties in a schema (it works already straight-forward for nested ones)
  2. the client behavior when a property is required.

Focusing on the required for top-level properties:
The challenge is that currently all array-constants in the schema class are interpreted as a property. So I see two options:

  1. add special handling for a specific constant holding the list of required top-level properties
    1. which constant name? 'required', '__REQUIRED_TOP_LEVEL', ...?
    2. the constant name needs to be decided now and needs to be stable, the specific nature of the special handling is internal and can be refactored over time
  2. add SELF::REQUIRED => true to the required properties and then reassemble those back into an array in JsonSchemaBuilder::getRootSchema

Both approaches should be reasonably simple to implement and I don't see performance issues for either.

Option (1) implies special casing required in ReflectionSchemaSource afaiu, which prevents to have an schema with a property named required, not sure how bad this is. Option (2) seems simpler as in not modifying ReflectionSchemaSource::loadAsComponents which is what MainConfig uses. But I'm doubting where this logic should belong to. On one side ReflectionSchemaSource::loadAsSchema seems the simpler option, but maybe we're pushing schema source to be too json-ish already. @daniel maybe have useful feedback here.

Change #1059149 had a related patch set uploaded (by Michael GroรŸe; author: Michael GroรŸe):

[mediawiki/extensions/CommunityConfiguration@master] feature: allow designating top-level properties as required

https://gerrit.wikimedia.org/r/1059149

Thank you for your thoughts!

We can also make the special handling work in CommunityConfiguration alone, see for example feature: allow designating top-level properties as required. But upstreaming this to ReflectionSchemaSource would indeed be a reasonable option too. Also, if we upstream it, then aside from the option to special-case required, we could also special-case "every constant starting with __".
OTOH, upstreaming it to ReflectionSchemaSource would imply an extra if-condition for every constant on every schema. While individually these are negligible, over all schemas, they add up. The GE only approach does not have those issues.
I, too, would be curious if @daniel has any thoughts.

I think Option 2, reassembling from required-attributes on the top-level fields, is also perfectly fine, with the tradeoff that we need to document + communicate to other reusers that they
need to add 'required' => true for the top-level properties, but a 'required' => [...] for nested objects and arrays. But that should be doable.

Change #1057903 had a related patch set uploaded (by Michael GroรŸe; author: Michael GroรŸe):

[mediawiki/extensions/CommunityConfiguration@master] refactor(editor): extract validation errors into composable

https://gerrit.wikimedia.org/r/1057903

Change #1057903 merged by jenkins-bot:

[mediawiki/extensions/CommunityConfiguration@master] refactor(editor): extract validation errors into composable

https://gerrit.wikimedia.org/r/1057903

Change #1057219 merged by jenkins-bot:

[mediawiki/extensions/CommunityConfiguration@master] feat(editor): support native validation for String and Number controls

https://gerrit.wikimedia.org/r/1057219

First of all, thanks @Michael for driving this forward and writing a patch for both approaches. I'm going to split my comment in two parts โ€“ in the first one, I am going to recapitulate the problem and the decision to be made. In the second one, I'll analyse both options and make a recommendation on what approach we should take here.

Recap

As a reminder, whatever decision we take here only affects the top level โ€“ for required under that level, this already works out of the box. The two approaches @Michael suggested are:

  1. Array-based approach (patch): Each PHP schema will have a certain constant reserved to include the list of required properties. The schema builder would look the constant up and include it in the resulting JSON schema.
  2. Boolean-based approach (patch): In each property, it would be possible to declare self::REQUIRED => true to make that property required. The JSON schema builder would look up that declaration as walking across the properties, and it would build the resulting required array in the JSON schema.

At this point, we need to decide which of those options we should take.

Analysis & Recommendation

Implementation-wise, both options are similarly complex. Both involve adding a little bit of computation into the JSON schema builder. At the end of the day, both have the same result: JSON schema has a clear definition of how required is defined, and we can only follow that definition. The only room for changes is in our own PHP schemas. Even though the boolean approach is slightly more expensive than the array-based approach (boolean approach requires a loop, while array approach does not), both are similarly good with regards to performance, reliability and other aspects. For that reason, I am going to look at other aspects here.

Visually, the boolean-based approach seems like a slightly better one. Besides required, all other options that affect a property are defined under that particular property. For example, if I want to define that a property foo is an integer, I'd do that by declaring self::TYPE => self::TYPE_INTEGER directly under that property, rather than by including the property in a __TYPE_INTEGER array (similar to what the array-based approach proposes for required). However, if we take a look at how required already works for properties outside of the top level, they already follow the array-based approach. I do not see any arguments to do the same thing differently in the two places it is needed.

Hence, I recommend we adopt the array based approach and merge patch https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CommunityConfiguration/+/1063158.

@Michael @Sgs If you have any concerns about this, or if I am missing something in the comment above, please do feel free to speak up. For now, I marked both patches with a -2, to signalise the decision is yet to be finalised.

Thank you, @Urbanecm_WMF, for the writeup! While I slight favor the array based approach as well, I feel I need to point out that at least the underlying library justinrainbow/json-schema fully supports both ways of specifying required, both on the top level and for nested declarations.
The historic background, as I understand it, is that the "boolean"-way of specifying "required": true was the way to do it in json-schema Draft 3, and the array-based way is what is defined in json-schema Draft 4.

We already have support for the array-based way in our frontend, but that would not be hard to change to the boolean-based way.

I think I favor the array-based way because it might be a bit more future-proof and more internally consistent, but I'm not fully sure about that myself. I do not know what guarantees justinrainbow/json-schema gives for supporting the "Draft 3"-way of doing things.

The array based approach looks to me like the best fit based on the arguments already expressed. Specially consistency with non top level definitions and newer json specifications.

I think I favor the array-based way because it might be a bit more future-proof and more internally consistent, but I'm not fully sure about that myself. I do not know what guarantees justinrainbow/json-schema gives for supporting the "Draft 3"-way of doing things.

fwiw, the lock to use justinrainbow/json-schema-Draft-04 was intentional at some point to avoid conflicting PHP library versions in MW servers but the general sentiment is that we would like to move to Opis and newer json schema specifications (the reason we did not is Opis is locked on 1.X due to the Wikilambda usage, and related T319054) . Using the Draft-3 way could be confusing as the rest of the SPI relies on Draft-4 spec, and the schema transform seems less expected.

As a reminder, whatever decision we take here only affects the top level โ€“ for required under that level, this already works out of the box. The two approaches @Michael suggested are:

I'm confused on the different expectation for top-level properties and non top level. this already works out of the box is ambiguous on "what" works, and also we don't have any documentation around JsonSchema::REQUIRED in JSON-schema_vocabulary. I'd suggest to not create different "APIs" for top level and nested properties as much as possible. I can't find usages of JsonSchema::REQUIRED in any schema from known consumers (lookup 1, lookup 2 ) so we're still in time of changing the type and intent for it.

My suggestion would be to turn JsonSchema::REQUIRED into an array and drop any boolean usage of it. Turning the array of property names into individual required: true properties should be easy at the time of building the UI schema in the client.

I'm confused on the different expectation for top-level properties and non top level. this already works out of the box is ambiguous on "what" works

Sorry for the confusion. I meant that there is a way to trigger required for non-top level fields (namely, by following the array based system), and that the scope of this task is not to rewrite how that part of the system works (we might agree on doing that, but it's not a part of the decision we're discussing about).

I can't find usages of JsonSchema::REQUIRED in any schema from known consumers (lookup 1, lookup 2 ) so we're still in time of changing the type and intent for it.

FWIW, there is an in-flight patch to change that, see https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/1057222. Granted, that one is not a top-level one.

As far as I can see, for various reasons, we all favor the array-based approach. Then I think that this can move forward once the minor discussion on the change itself is resolved. In addition to that, I can start drafting an ADR to document that decision.

Change #1071642 had a related patch set uploaded (by Michael GroรŸe; author: Michael GroรŸe):

[mediawiki/extensions/CommunityConfiguration@master] docs: Add ADR for how to define top-level `required` in schemas

https://gerrit.wikimedia.org/r/1071642

Change #1059149 merged by jenkins-bot:

[mediawiki/extensions/CommunityConfiguration@master] feat(SchemaBuilder): array-based `required` for top-level properties

https://gerrit.wikimedia.org/r/1059149

Change #1071642 merged by jenkins-bot:

[mediawiki/extensions/CommunityConfiguration@master] docs: Add ADR for how to define top-level `required` in schemas

https://gerrit.wikimedia.org/r/1071642