Page MenuHomePhabricator

CI Support for Schema Registry
Open, NormalPublic21 Story Points

Event Timeline

Nuria triaged this task as Normal priority.Oct 11 2018, 10:03 PM
Nuria created this task.

CI for ensuring that schemas all have consistent meta field
CI for ensuring schema backwards compatibility
CI for schema linting, e.g. no camelCase, only snake_case, etc.

We actually have tests for all of these in the current event-schemas repo, but they're not perfect. What would I like to change:

CI for schema linting, e.g. no camelCase, only snake_case, etc.

This should become a special ESLint config or, more realistically, an ESLint plugin. There's no good plugin for ESLint YAML linting now, so we could either develop a solid general-purpose plugin and a set of rules for linting, or a specific schema linting plugin. There is though a JSON linting plugin, we might be able to convert YAML to JSON and have a set of rules for the JSON plugin.

CI for ensuring that schemas all have consistent meta field

We actually right now have a more powerful feature when we validate common properties by level. Every event must have a proper meta field. Every revision-related event must have revision-related fields etc. Right now it's achieved by copy-paste, but I would like to get rid of it. Given the full schema files will be generated by a pre-commit hook, we might want to utilize JSON-schema references and inheritance in the latest schema and expand them in the hook. It although raises questions on what to do when, for example, meta schema changes - do we create a new version of every schema? But backward compatibility and inclusion checks probably will require us to do so anyway.

CI for ensuring schema backwards compatibility

I have written a very naive schema backward compatibility checking code which I believe doesn't cover even half of the cases. There's no third-party library for checking JSON-schemas for compatibility (at least I couldn't find one) so I propose to actually write a proper module for that, publish it separately and use it here.

There is though a JSON linting plugin, we might be able to convert YAML to JSON and have a set of rules for the JSON plugin.

Perhaps the git hooks that generate the versioned files could render them as JSON instead of YAML anyway?

Ottomata moved this task from Backlog to Next Up on the EventBus board.Dec 5 2018, 10:04 PM
Ottomata removed Ottomata as the assignee of this task.Dec 5 2018, 10:09 PM
Nuria removed the point value for this task.Dec 5 2018, 10:16 PM
Ottomata assigned this task to Pchelolo.Jan 17 2019, 9:29 PM

There's no good plugin for ESLint YAML linting now

There is yamllint, but it is Python, not ESLint. I don't see rules in there that would allow us to enforce snake_case though.

There is though a JSON linting plugin

I found https://www.npmjs.com/package/eslint-plugin-json, but it doesn't sound that great. Is there a better one?

Petr and I met today to discuss. We decided that the idea of using a meta schema to validate that event schemas conform to our conventions would be overly complicated and confusing to implement. Perhaps draft-8 of jsonschema will give us a better way of doing this.

Instead, we will write a wikimedia npm test library that can be used in any of our event schema repositories. We'll eventually separate out the wikimedia specific bits (the test cases) from the test runner (e.g. finding schemas to test). There will be 2 main types of tests.


Schema Convention

This includes things like snake_case vs camelCase, that every schema has an $id and $schema field, validation of custom field annotations etc. This type of test will be run on discovered schema file, and operates on a single schema version at a time.

Schema Backwards Compatibility

All discovered schemas in a repository will be grouped into sets of schema title, schema major version. Each schema in each group will be checked for backwards compatibility. This will allow for occasional backwards incompatible changes by incrementing the major schema version.
@Pchelolo we probably should use the title and $id fields to do the grouping, not their filenames. Perhaps we can do some convention validation that each schema's file path corresponds to the title/$id version. For purposes of backwards compatibilty checking though, we should let the schemas themselves declare what the version is, and not rely on the filepath.


@Pchelolo will implement a schema test runner that given a schema base path, will find all schema files, group them by title,version, and then run a set of tests on each group. Once we have a test runner, we will both work on implementing specific test cases.

Likely the test runner and the test cases (and the backwards compatibility checker) should each be in separate npm libraries, so that we can easily set up new schema repositories to all use the same set of tests. Jenkins can then be configured to just run npm test as usual to do proper CI.

Ottomata moved this task from Next Up to In Progress on the EventBus board.Thu, Sep 5, 5:03 PM

Change 534702 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/event-schemas@master] Add tests to ensure proper repo structure.

https://gerrit.wikimedia.org/r/534702

Change 534702 merged by Ottomata:
[mediawiki/event-schemas@master] Add tests to ensure proper repo structure, compatibility and conventions.

https://gerrit.wikimedia.org/r/534702

Just pushed https://github.com/wikimedia/jsonschema-tools/pull/9 for exportable and reusable schema repository tests. Tests are based on Petr's work in https://gerrit.wikimedia.org/r/534702.

Thoughts:

  • Is it possible to automate setting up the eslint-plugin-json stuff from jsonschema-tools? Should we?
  • I think we need a way to disable testing of certain schemas and/or of certain tests for schemas. I don't think we can do it in the schema files, so it would have to be configuration provided to the exported tests. Perhaps we could just regex match against each schema's $id? E.g. schemasToSkip = [/mediawiki\/recentchange/] ? That would disable all tests for a specific schema / version, but wouldn't allow us to disable specific tests, e.g. disable-monomorphic-type-test...unless we built some structure for naming and disabling test cases like eslint has. Sounds like a lot.

I think we need a way to disable testing of certain schemas and/or of certain tests for schemas. I don't think we can do it in the schema files, so it would have to be configuration provided to the exported tests. Perhaps we could just regex match against each schema's $id? E.g. schemasToSkip = [/mediawiki\/recentchange/] ? That would disable all tests for a specific schema / version, but wouldn't allow us to disable specific tests, e.g. disable-monomorphic-type-test...unless we built some structure for naming and disabling test cases like eslint has. Sounds like a lot.

If we do make such a config, I believe we need to do the following

  1. Each test name should be prefixed with some constant, like 'structure-current-must-exist'. We can even export those constants as variables to make sure typos can not happen.
  2. The config probably should be called ignoredTests and have a format schema_id => config with a config being a map(array?) of constants from point 1...

YAML allows comments, but no all the tests are based on the YAML files, so putting comments into files themselves like eslint won't work. Alternatively, we can put an rc file into a schema directory with something like

disable-structure-current-must-exist: true
disable-backwards-compatibility: [ 1.0.1, 2.0.3 ]

However, I'm not sure if we need to do all this right now.. what specific tests to disable do you have in mind?

The only one we need to disable right now is the monomorphic type test for recentchange, right?

The only one we need to disable right now is the monomorphic type test for recentchange, right?

Oh right indeed I forgot about that one..

Maybe the right answer is to fix recentchange...investigating.

I modified the task description of T216567: mediawiki/recentchange event should not use fields with polymorphic types. I think we should just fix recentchange and not worry about implementing test skipping (for now at least).

Change 537549 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[mediawiki/event-schemas@master] Use schema repository tests from jsonschema-tools 0.4.0

https://gerrit.wikimedia.org/r/537549

Ottomata moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 537549 merged by Ppchelko:
[mediawiki/event-schemas@master] Use schema repository tests from jsonschema-tools

https://gerrit.wikimedia.org/r/537549