|Open||Ottomata||T185233 Modern Event Platform (TEC2)|
|Open||Ottomata||T201063 Modern Event Platform: Schema Registry|
|Open||Ottomata||T206789 Modern Event Platform: Schema Registry: Implementation|
|Open||Pchelolo||T206814 CI Support for Schema Registry|
|Resolved||Pchelolo||T206889 Develop a library for JSON schema backwards incompatibility detection|
|Resolved||Ottomata||T232144 Migrate all event-schemas schemas to current.yaml and materialize with jsonschema-tools and remove old schemas|
|Open||None||T216567 mediawiki/recentchange event should not use fields with polymorphic types|
We actually have tests for all of these in the current event-schemas repo, but they're not perfect. What would I like to change:
CI for schema linting, e.g. no camelCase, only snake_case, etc.
This should become a special ESLint config or, more realistically, an ESLint plugin. There's no good plugin for ESLint YAML linting now, so we could either develop a solid general-purpose plugin and a set of rules for linting, or a specific schema linting plugin. There is though a JSON linting plugin, we might be able to convert YAML to JSON and have a set of rules for the JSON plugin.
CI for ensuring that schemas all have consistent meta field
We actually right now have a more powerful feature when we validate common properties by level. Every event must have a proper meta field. Every revision-related event must have revision-related fields etc. Right now it's achieved by copy-paste, but I would like to get rid of it. Given the full schema files will be generated by a pre-commit hook, we might want to utilize JSON-schema references and inheritance in the latest schema and expand them in the hook. It although raises questions on what to do when, for example, meta schema changes - do we create a new version of every schema? But backward compatibility and inclusion checks probably will require us to do so anyway.
CI for ensuring schema backwards compatibility
I have written a very naive schema backward compatibility checking code which I believe doesn't cover even half of the cases. There's no third-party library for checking JSON-schemas for compatibility (at least I couldn't find one) so I propose to actually write a proper module for that, publish it separately and use it here.
There is though a JSON linting plugin, we might be able to convert YAML to JSON and have a set of rules for the JSON plugin.
Perhaps the git hooks that generate the versioned files could render them as JSON instead of YAML anyway?
There's no good plugin for ESLint YAML linting now
There is yamllint, but it is Python, not ESLint. I don't see rules in there that would allow us to enforce snake_case though.
There is though a JSON linting plugin
I found https://www.npmjs.com/package/eslint-plugin-json, but it doesn't sound that great. Is there a better one?
Petr and I met today to discuss. We decided that the idea of using a meta schema to validate that event schemas conform to our conventions would be overly complicated and confusing to implement. Perhaps draft-8 of jsonschema will give us a better way of doing this.
Instead, we will write a wikimedia npm test library that can be used in any of our event schema repositories. We'll eventually separate out the wikimedia specific bits (the test cases) from the test runner (e.g. finding schemas to test). There will be 2 main types of tests.
This includes things like snake_case vs camelCase, that every schema has an $id and $schema field, validation of custom field annotations etc. This type of test will be run on discovered schema file, and operates on a single schema version at a time.
Schema Backwards Compatibility
All discovered schemas in a repository will be grouped into sets of schema title, schema major version. Each schema in each group will be checked for backwards compatibility. This will allow for occasional backwards incompatible changes by incrementing the major schema version.
@Pchelolo we probably should use the title and $id fields to do the grouping, not their filenames. Perhaps we can do some convention validation that each schema's file path corresponds to the title/$id version. For purposes of backwards compatibilty checking though, we should let the schemas themselves declare what the version is, and not rely on the filepath.
@Pchelolo will implement a schema test runner that given a schema base path, will find all schema files, group them by title,version, and then run a set of tests on each group. Once we have a test runner, we will both work on implementing specific test cases.
Likely the test runner and the test cases (and the backwards compatibility checker) should each be in separate npm libraries, so that we can easily set up new schema repositories to all use the same set of tests. Jenkins can then be configured to just run npm test as usual to do proper CI.
Just pushed https://github.com/wikimedia/jsonschema-tools/pull/9 for exportable and reusable schema repository tests. Tests are based on Petr's work in https://gerrit.wikimedia.org/r/534702.
- Is it possible to automate setting up the eslint-plugin-json stuff from jsonschema-tools? Should we?
- I think we need a way to disable testing of certain schemas and/or of certain tests for schemas. I don't think we can do it in the schema files, so it would have to be configuration provided to the exported tests. Perhaps we could just regex match against each schema's $id? E.g. schemasToSkip = [/mediawiki\/recentchange/] ? That would disable all tests for a specific schema / version, but wouldn't allow us to disable specific tests, e.g. disable-monomorphic-type-test...unless we built some structure for naming and disabling test cases like eslint has. Sounds like a lot.
I think we need a way to disable testing of certain schemas and/or of certain tests for schemas. I don't think we can do it in the schema files, so it would have to be configuration provided to the exported tests. Perhaps we could just regex match against each schema's $id? E.g. schemasToSkip = [/mediawiki\/recentchange/] ? That would disable all tests for a specific schema / version, but wouldn't allow us to disable specific tests, e.g. disable-monomorphic-type-test...unless we built some structure for naming and disabling test cases like eslint has. Sounds like a lot.
If we do make such a config, I believe we need to do the following
- Each test name should be prefixed with some constant, like 'structure-current-must-exist'. We can even export those constants as variables to make sure typos can not happen.
- The config probably should be called ignoredTests and have a format schema_id => config with a config being a map(array?) of constants from point 1...
YAML allows comments, but no all the tests are based on the YAML files, so putting comments into files themselves like eslint won't work. Alternatively, we can put an rc file into a schema directory with something like
disable-structure-current-must-exist: true disable-backwards-compatibility: [ 1.0.1, 2.0.3 ]
However, I'm not sure if we need to do all this right now.. what specific tests to disable do you have in mind?
I modified the task description of T216567: mediawiki/recentchange event should not use fields with polymorphic types. I think we should just fix recentchange and not worry about implementing test skipping (for now at least).