
Create a system to support developers while upgrading configuration schemas
Open, High · Public · 5 Estimated Story Points

Description

Community configuration 2.0 makes it possible for developers to specify configuration schemas, which include certain configuration variables and the expected format for each. Occasionally, the expected format needs to change. In the Community configuration 1.0 implementation, this requires a developer to manually ensure each configuration file continues to be valid. However, this is not really sustainable.

Within this task, we should create a system that makes it easy for developers to migrate between schema versions. This system would (somehow) find PHP callables that can migrate configuration values from one format to another. It might help to first complete T351232: Community configuration 2.0: Consider generating JSONSchema from PHP classes rather than committing them directly, because callables are easier to register in PHP classes than in raw JSON.

Implementation Plan
  • T362042: Community configuration: Introduce validation warnings
  • Version the schemas internally, so that it is possible to access older versions of the schema within PHP
    • This can be a Migrations subfolder to the Schemas folder, with files such as MentorshipSchema_1_0_0.php, which would have the migration. The goal here is to be able to validate config files against a specific version while running the migration (so that invalid migration commands are detected, rather than saved through).
    • There would need to be a script to be executed every time someone modifies a schema, which would update the Migrations subfolder. There also would likely need to be a CI structure test that'd ensure the command is being executed.
    • This includes storing the schema version within the configuration store.
  • Only allow changes that do not break backwards or forwards compatibility (in terms of schema validity; it would be the client extension's responsibility not to break on train deployments/rollbacks in how the config is used), so that config files don't fully break even when the train moves forward or rolls back. With the validation warnings mentioned above, this would restrict us to adding a new config variable, removing a config variable, and similar changes.
  • Store migration callbacks within the schema PHP class. Each callback would accept a stdClass (with the config data conforming to a specific schema version), migrate it to a newer (or older) version of the schema, and return it.
  • Create a maintenance script which would convert a given provider to a specified schema version (or the newest schema, if instructed).
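
A minimal sketch of the "migration callbacks within the schema PHP class" item, for illustration only. The class name ExampleSchema_2_0_0, the VERSION constant, the method names, and the config keys (Enabled, Mode, Limit) are all hypothetical and not taken from the extension; the sketch only assumes that a callback takes and returns a stdClass.

```php
<?php
// Hypothetical schema class: every name here is an illustrative assumption.
class ExampleSchema_2_0_0 {
	public const VERSION = '2.0.0';

	// Upgrade data valid under 1.0.0 (boolean "Enabled") to 2.0.0 (string "Mode").
	public static function upgradeFromPrevious( stdClass $data ): stdClass {
		$new = clone $data; // shallow copy, so the caller's object stays untouched
		$new->Mode = !empty( $data->Enabled ) ? 'on' : 'off';
		unset( $new->Enabled );
		return $new;
	}

	// Downgrade data valid under 2.0.0 back to 1.0.0 (needed for rollbacks).
	public static function downgradeToPrevious( stdClass $data ): stdClass {
		$old = clone $data;
		$old->Enabled = ( $data->Mode ?? 'off' ) === 'on';
		unset( $old->Mode );
		return $old;
	}
}

$v1 = (object)[ 'Enabled' => true, 'Limit' => 5 ];
$v2 = ExampleSchema_2_0_0::upgradeFromPrevious( $v1 );
$back = ExampleSchema_2_0_0::downgradeToPrevious( $v2 );
echo json_encode( $v2 ), "\n";
echo json_encode( $back ), "\n";
```

In a real migration, the result of each callback would presumably be validated against the target schema version before being saved, as described in the plan.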
Acceptance Criteria
  • Maintenance script to trigger the migration system
  • Technical documentation
  • Exploration for how we should document previous schemas

Event Timeline

Restricted Application added a subscriber: Aklapper.
KStoller-WMF subscribed.

Looks like this isn't ready to estimate yet, so moving to Needs Discussion until Acceptance Criteria are added.

Some thoughts on the topic of schema migration (without having looked at the code structure):

  • The JSON should contain a version marker in a meta-field - we could include a full schema URI in "$schema", or use something like "$config-version".
  • The provider interface gets a getSchemaConverter() method that takes a source version and target version.
  • The provider interface gets a loadSchema() method that takes an optional version parameter.
  • The schema class defines a constant that specifies the version (and/or schema URI).
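
As a rough illustration of the version-marker idea, the sketch below reads a "$config-version" meta-field from a stored JSON config. The helper readConfigVersion() is hypothetical, and falling back to '1.0.0' for files without a marker is an assumption, not something decided in this task.

```php
<?php
// Hypothetical helper, not part of the extension's API.
function readConfigVersion( string $json, string $default = '1.0.0' ): string {
	$data = json_decode( $json, false, 512, JSON_THROW_ON_ERROR );
	if ( !$data instanceof stdClass ) {
		throw new InvalidArgumentException( 'Config must be a JSON object' );
	}
	// Assumption: files written before versioning existed carry no marker.
	return $data->{'$config-version'} ?? $default;
}

$stored = '{ "$config-version": "1.1.0", "Limit": 5 }';
$legacy = '{ "Limit": 5 }';
echo readConfigVersion( $stored ), "\n";
echo readConfigVersion( $legacy ), "\n";
```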

Does this sound like a reasonable approach?

> Does this sound like a reasonable approach?

In general, yes. However...

> • The provider interface gets a getSchemaConverter() method that takes a source version and target version.

...my main question is how would the schema converter do the conversion between schemas. I was thinking about having an upgrade/downgrade callable somewhere and call it from the converter, but I'm unsure on what that somewhere is. Do you have any thoughts on that, please?

> ...my main question is how would the schema converter do the conversion between schemas. I was thinking about having an upgrade/downgrade callable somewhere and call it from the converter, but I'm unsure on what that somewhere is. Do you have any thoughts on that, please?

Why a callable? I was thinking we can have a SchemaConverter interface that defines a method like convert( array $data ): array. For each conversion, we'd have a separate implementation of this interface. The provider would have getSchemaConverter( string $source, string $target ): SchemaConverter, which would return an appropriate implementation as needed.

I think this is more straightforward than using callbacks and static methods with a generic SchemaConverter. Schema conversion may sometimes involve data conversion, which may depend on configuration. So it can't always be static.
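
A minimal sketch of this proposal, for illustration. Only the SchemaConverter interface and the getSchemaConverter() signature come from the comment above; AddModeConverter, ExampleProvider, the "Mode" key, and the injected default are hypothetical, chosen to show one conceivable conversion that depends on an injected setting and therefore is not static.

```php
<?php
// Interface shape as proposed in the comment above.
interface SchemaConverter {
	public function convert( array $data ): array;
}

// Hypothetical 1.0.0 -> 2.0.0 conversion that adds a "Mode" key. The default
// comes from an injected setting, so this cannot be a plain static method.
class AddModeConverter implements SchemaConverter {
	private string $defaultMode;

	public function __construct( string $defaultMode ) {
		$this->defaultMode = $defaultMode;
	}

	public function convert( array $data ): array {
		$data['Mode'] ??= $this->defaultMode;
		return $data;
	}
}

// Hypothetical provider; a real one would know many more version pairs.
class ExampleProvider {
	private string $siteDefaultMode;

	public function __construct( string $siteDefaultMode ) {
		$this->siteDefaultMode = $siteDefaultMode;
	}

	public function getSchemaConverter( string $source, string $target ): SchemaConverter {
		if ( $source === '1.0.0' && $target === '2.0.0' ) {
			return new AddModeConverter( $this->siteDefaultMode );
		}
		throw new InvalidArgumentException( "No converter from $source to $target" );
	}
}

$provider = new ExampleProvider( 'manual' );
$converted = $provider->getSchemaConverter( '1.0.0', '2.0.0' )->convert( [ 'Limit' => 5 ] );
echo json_encode( $converted ), "\n";
```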

Does this seem too cumbersome to you?

> Why a callable? I was thinking we can have a SchemaConverter interface that defines a method like convert( array $data ): array. For each conversion, we'd have a separate implementation of this interface. The provider would have getSchemaConverter( string $source, string $target ): SchemaConverter, which would return an appropriate implementation as needed.

It looks like we're thinking about a very similar thing, "just" putting it in a different place. I was thinking of somehow getting toPreviousSchemaVersion( stdClass $data ) [1] and toNewerSchemaVersion( stdClass $data ) somewhere, possibly in the schema files themselves. It looks like you're thinking about a similar thing, except as a separate class. In the remainder of my post, I'll refer to this code as the converter (regardless of where it ends up living).

I'm fine with either solution for the converter itself; what I'm unsure about is how the right converter(s) would be invoked. In other words, how would getSchemaConverter( string $source, string $target ): SchemaConverter find the right implementation to return (or, with the originally suggested approach, how would a generic SchemaConverter find and call the right callables in the right order)?

Possibly, we can do that on the schema level via well-known constants, and do public const CONVERTER_{UP/DOWN} = SchemaConverterImplementation::class to define the schema converter implementation (or with the originally suggested approach, an abstract method that the schema would implement). That way, we would only need to order the schema versions by their version ID, and call the right converters that way.
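
To make the well-known-constants idea concrete, here is a rough sketch that orders schema classes by version and applies each CONVERTER_UP in turn. All class names and config keys are hypothetical. The sketch also assumes that CONVERTER_UP on a schema names the converter that upgrades data from the previous version to that schema; the thread does not settle this. Downgrades via CONVERTER_DOWN would mirror the loop in reverse order and are omitted.

```php
<?php
interface Converter {
	public function convert( stdClass $data ): stdClass;
}

// Hypothetical converters for two invented schema changes.
class AddFlag implements Converter {
	public function convert( stdClass $data ): stdClass {
		$data = clone $data;
		$data->Flag = false;
		return $data;
	}
}

class RenameLimit implements Converter {
	public function convert( stdClass $data ): stdClass {
		$data = clone $data;
		$data->MaxItems = $data->Limit;
		unset( $data->Limit );
		return $data;
	}
}

// Hypothetical schema classes exposing the well-known constants.
class Schema_1_0_0 {
	public const VERSION = '1.0.0';
	public const CONVERTER_UP = null; // first version, nothing to upgrade from
}
class Schema_1_1_0 {
	public const VERSION = '1.1.0';
	public const CONVERTER_UP = AddFlag::class;
}
class Schema_2_0_0 {
	public const VERSION = '2.0.0';
	public const CONVERTER_UP = RenameLimit::class;
}

// Apply every upgrade step in the range ( $from, $to ], in version order.
function upgradeConfig( stdClass $data, string $from, string $to, array $schemaClasses ): stdClass {
	usort(
		$schemaClasses,
		static fn ( string $a, string $b ): int => version_compare( $a::VERSION, $b::VERSION )
	);
	foreach ( $schemaClasses as $schema ) {
		$inRange = version_compare( $schema::VERSION, $from, '>' )
			&& version_compare( $schema::VERSION, $to, '<=' );
		$converterClass = $schema::CONVERTER_UP;
		if ( $inRange && $converterClass !== null ) {
			$data = ( new $converterClass() )->convert( $data );
		}
	}
	return $data;
}

$schemas = [ Schema_2_0_0::class, Schema_1_0_0::class, Schema_1_1_0::class ];
$result = upgradeConfig( (object)[ 'Limit' => 5 ], '1.0.0', '2.0.0', $schemas );
echo json_encode( $result ), "\n";
```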

Alternatively, we can merely provide a solution to register converters, and then it'd be something to be resolved on the provider's side. I don't really like that approach (it's going to be a common problem to resolve), but maybe that's what we need to do?

[1] stdClass because, in PHP, [] can be either an empty list (represented as [] in JSON) or an empty dictionary (an empty object in JSON, represented as {}), and there is no way to tell the two apart. For that reason, we use stdClass to represent JSON objects and [...] to represent JSON lists.
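
The ambiguity described in the footnote can be reproduced with PHP's built-in json_encode() and json_decode(); the variable names below are only for illustration.

```php
<?php
// An empty PHP array always encodes as a JSON list...
$emptyArrayJson = json_encode( [] );
// ...while an empty stdClass encodes as a JSON object.
$emptyObjectJson = json_encode( new stdClass() );
echo $emptyArrayJson, ' ', $emptyObjectJson, "\n";

// Decoding to associative arrays loses the distinction: both become [].
$lossy = json_decode( '{}', true ) === json_decode( '[]', true );

// Decoding to objects (the default) keeps it.
$objectKept = json_decode( '{}' ) instanceof stdClass;
$listKept = is_array( json_decode( '[]' ) );

var_dump( $lossy, $objectKept, $listKept );
```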

> Schema conversion may sometimes involve data conversion, which may depend on configuration. So it can't always be static.

Would you mind clarifying this part, please? In my opinion, schema conversion will definitely depend on the community configuration data (which will be available as a parameter) and on the source/target schema (which would be known in advance), but is there a case where it might depend on the global/LocalSettings-provided configuration?

Change #1018773 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/CommunityConfiguration@master] WIP: Introduce JsonSchemaReader

https://gerrit.wikimedia.org/r/1018773

KStoller-WMF moved this task from Needs Discussion to Up Next on the Growth-Team board.

Change #1018736 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/CommunityConfiguration@master] [refactor] Use descriptive property names in JsonSchemaForTesting

https://gerrit.wikimedia.org/r/1018736

Change #1019046 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/CommunityConfiguration@master] Add IValidator::areSchemasSupported()

https://gerrit.wikimedia.org/r/1019046

Change #1018774 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/CommunityConfiguration@master] PoC: Expose older schema versions

https://gerrit.wikimedia.org/r/1018774

Change #1019056 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/CommunityConfiguration@master] PoC: Add SchemaMigrator

https://gerrit.wikimedia.org/r/1019056

Thank you again, @daniel, for the advice here. I spent the last couple of days prototyping schema migrations. If you have a moment to take a look at https://gerrit.wikimedia.org/r/1019056, it would be greatly appreciated.

Change #1018736 merged by jenkins-bot:

[mediawiki/extensions/CommunityConfiguration@master] [refactor] Use descriptive property names in JsonSchemaForTesting

https://gerrit.wikimedia.org/r/1018736

KStoller-WMF set the point value for this task to 5.

Change #1021411 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/CommunityConfiguration@master] Introduce AbstractProvider::storeConfiguration

https://gerrit.wikimedia.org/r/1021411

Change #1018773 merged by jenkins-bot:

[mediawiki/extensions/CommunityConfiguration@master] [refactor] Introduce JsonSchemaReader

https://gerrit.wikimedia.org/r/1018773

Change #1019046 merged by jenkins-bot:

[mediawiki/extensions/CommunityConfiguration@master] Add IValidator::areSchemasSupported()

https://gerrit.wikimedia.org/r/1019046

Change #1021411 merged by jenkins-bot:

[mediawiki/extensions/CommunityConfiguration@master] Introduce AbstractProvider::storeConfiguration

https://gerrit.wikimedia.org/r/1021411