Page MenuHomePhabricator

Community configuration: Decide whether the schema PHP classes need support for `required`
Closed, ResolvedPublic3 Estimated Story Points

Description

@Sgs and @Urbanecm_WMF discussed the T358799: CommunityConfiguration: Switch to generating schemas from PHP classes rather than committing them as JSON files directly patches during the synchronous code review meeting. Within that task, we're shifting from storing JSON schemas as JSON files on the disk to representing JSON schemas as PHP classes. This means we'll be supporting a certain subset of the JSON Schema standard, as defined us (this is to make the schemas more predictable, and done as a response to feedback from MediaWiki team's Principal Engineers).

It is questionable whether we need to mark top-level keys in the JSON schema as required. On one hand, it might make sense to ensure a given feature is actually configured within CommunityConfiguration. On the other hand, there are a couple of arguments against declaring fields as required:

  • We're unable to guarantee configuration will be always available (the store might be temporarily unavailable for an outage, or it might be corrupted and present invalid value, etc.), so client extensions will need to handle missing configuration, even for fields marked as required (in other words, we always provide either valid configuration or nothing at all, but we're unable to guarantee we'll always return a valid configuration). This decreases the usefulness of declaring something as required.
  • Upon deployment, configuration pages will not exist. Those non-existing pages will be likely unable to provide any values marked as required. Same applies for schema changes, although that could be handleable by a migration logic.

@Sgs also raised the question of declaring fields as required for objects that are not at the top-level. While the arguments in the list above are partially valid as well, their validity is somewhat limited (we might still find ourself unable to provide any community configuration, but that would also make the top-level key unavailable as well). For that reason, we might want to behave differently for top-level and non-top level keys.

Within this task, we should decide on the right approach towards the required keys and if needed, fill in new tasks to implement any changes of behaviour.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
KStoller-WMF set the point value for this task to 3.Mar 5 2024, 5:12 PM

I find "required" a nice JSON schema feature we should try to support in the most predictable way. Even if we decided we cannot ensure required properties at the top level because of the technical problems exposed, we probably would want to support them at deeper levels (one level of support so far).

Looking closer how to deal with the top-level issue, schema validity is checked in the following scenarios:

  1. In the onJsonValidateSave hook when directly using the config page editor interface
  2. In the ApiEdit when handling updates (eg. from the editor client)
  3. In the DataProvider when requesting to load a valid configuration, used by WikiPageConfigProvider

For all cases top level keys being present is relevant if there's no default value we can use. But particularly for cases (1) and (2) I find it more relevant since a missing property leads to removing it or overwriting it with a default which results in some kind of information loss. In case (3) missing required properties could lead to an invalid config (maybe recoverable if merged with defaults) and that can cause some application errors but not as bad as for prior cases.

If we agree in these observations, an approach we could take is to allow required properties at the top-level but only use them for cases (1) and (2). For (3) we could check which top level config options we were able to retrieve and subtract the missing ones from the required top-level ones. @Urbanecm_WMF what do you think? I'm happy to make the changes to support "required" if this sounds reasonable.

Hello @Sgs, thanks for taking a look at this. Generally speaking, I agree with your observations. If we only check required on save (and not in the before-reading check in DataProvider), I have no concerns about it being introduced.

This approach is similar to the concept of validation warnings (blocking saving, but not reading) and validation errors (blocking everything), which we introduced in GrowthExperiments. Back then, it was done as part of T312576: Structured mentor list should validate the maximum length of a mentor message (see patch for additional context). This makes me wonder: How should we implement the validation warning system in CommunityConfiguration, now that we have fully-blown JSON Schemas? Should we have a generic concept to mark certain validation rules as "soft" (ignorable on reads, but enforced on writes)? If so, is there a reasonable JSONSchema extension we could use/create to record this sort of decisions within the schema itself? How can we make ReflectionSchemaSource read declarations that are CommunityConfiguration-specific? Is that even a good idea?

Considering the required question is now resolved, we can probably close this task as resolved (and fill another one for implementing the changes themselves). I'm not doing that right now, as I see you moved it back to Doing and to give the opportunity to discuss the questions I raised above.

Considering the required question is now resolved, we can probably close this task as resolved (and fill another one for implementing the changes themselves). I'm not doing that right now, as I see you moved it back to Doing and to give the opportunity to discuss the questions I raised above.

Closing this based on comments. Filed T360921 to discuss implementation.