== Ticket proliferation disambiguation!
- {T185233} is the overall Modern Event Platform parent ticket.
- {T201063} is the parent Event Schema Registry task; it describes the high-level requirements and user stories for this component.
- {T201643} is the RFC ticket; it will hopefully be closed once the RFC process finishes.
This ticket will be used to track and task implementation work for the Schema Registry.
== Description
Since we are moving forward with git as the canonical storage for schemas, we can base the implementation work for Q2 2018-2019 on the existing [[ https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/event-schemas | event-schemas repository ]]. This repository currently contains Draft 4 JSON schemas and a few minimal CI jobs that check schema consistency. Implementation work for this task will mostly involve git commit/merge hooks and CI improvements.
We may also want to build an HTTP service to serve schemas. If so, this service could be as simple as an HTTP file server that exposes the hierarchy of the git repository (or repositories) and the schemas within it.
In either case, schemas will always be addressable via URIs, whether they are checked out on the local filesystem (`file://`) or served over HTTP (`http://`).
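As a minimal sketch of what URI addressability buys us, the same loader can resolve a relative schema URI against either a `file://` checkout or an `http://` service, because Python's `urllib.request.urlopen` dispatches on the URI scheme. The function name `load_schema` and the base URI below are hypothetical; the base must end with `/` so that `urljoin` appends to it. This sketch only parses JSON; YAML-formatted schemas would additionally need a YAML parser (e.g. PyYAML's `yaml.safe_load`), which is not shown.

```python
import json
from typing import Any
from urllib.parse import urljoin
from urllib.request import urlopen


def load_schema(base_uri: str, schema_uri: str) -> Any:
    """Resolve a relative schema URI (e.g. "revision/create/3") against a
    base URI and fetch it. The base URI is hypothetical and must end in "/",
    e.g. "file:///srv/event-schemas/jsonschema/" or an http:// equivalent."""
    uri = urljoin(base_uri, schema_uri)
    # urlopen handles file:// and http(s):// alike, so callers don't care
    # whether the schema comes from a local checkout or a schema service.
    with urlopen(uri) as response:
        # Assumption: the file holds JSON (YAML would need a YAML parser).
        return json.loads(response.read().decode("utf-8"))
```

For example, `load_schema("file:///srv/event-schemas/jsonschema/", "revision/create/3")` (hypothetical path) would read the extensionless file `revision/create/3`; switching the base to an `http://` URL would fetch the same relative path from a schema service instead.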
== Technical Requirements
- Up-to-date JSON Schema support (Draft 7?)
- All schema versions maintained in the HEAD commit (we won't use git history to version schemas)
- CI for ensuring schema backwards compatibility
- CI for schema linting, e.g. snake_case field names only (no camelCase), etc.
- CI for schema field annotations (`dimension` vs `measure`, PII, etc.)
- The 'latest' schema version is editable, and changes to it are reviewable using the usual git review tools - T206812
- Post-commit or post-merge git hooks to create new versioned copies of schema files - T206812
- Schemas can be in YAML or JSON format, but files should not have file extensions, so that relative schema_uris don't need to include (or append) a file extension - T206812
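The snake_case linting requirement could be checked in CI with something like the following minimal sketch, which walks an already-parsed schema and reports non-snake_case field names. It assumes the rule applies only to the keys of `properties` objects (i.e. field names), and that `enum`, `const`, `default` and `examples` hold data rather than subschemas, so they are skipped. The function name, regex and error format are hypothetical, not part of the existing event-schemas CI.

```python
import re
from typing import Any, List

# Hypothetical rule: lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")
# Keywords whose values are data, not subschemas, so we don't descend into them.
DATA_KEYWORDS = {"enum", "const", "default", "examples"}


def lint_field_names(schema: Any, path: str = "#") -> List[str]:
    """Return one error message per field name that is not snake_case."""
    errors: List[str] = []
    if isinstance(schema, list):
        for i, item in enumerate(schema):
            errors += lint_field_names(item, f"{path}/{i}")
        return errors
    if not isinstance(schema, dict):
        return errors
    for key, value in schema.items():
        if key in DATA_KEYWORDS:
            continue
        if key == "properties" and isinstance(value, dict):
            # Keys of `properties` are field names; values are subschemas.
            for name, subschema in value.items():
                field_path = f"{path}/properties/{name}"
                if not SNAKE_CASE.fullmatch(name):
                    errors.append(f"{field_path}: field name is not snake_case")
                errors += lint_field_names(subschema, field_path)
        else:
            errors += lint_field_names(value, f"{path}/{key}")
    return errors
```

A CI job would fail the build when the returned list is non-empty; nested objects (such as a `meta` sub-object) are checked too, and each error carries a JSON-pointer-like path to the offending field.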
== Other ideas
On 2018-10-12, @Pchelolo and @Ottomata brainstormed implementation ideas. Much of the remaining implementation work concerns CI and development workflows. Some of this is already in place for mediawiki/event-schemas, but we need to do more. I'll try to collect the things we need to implement here.
- schema edits should be made to the `current` schema version.
- JSON `$ref` pointers may only be used in the `current` schema version.
- `$ref` pointers to other schemas must be strongly versioned. E.g. if we factor out the `meta` schema, every event schema that uses it will point to a specific version of `meta`, e.g. `meta/3` or `meta/4`.
-- versioned `$ref` pointers in schemas must be upgraded manually, by editing the schema and creating a new schema version.
-- This ensures that changes to referenced schemas will not affect user schemas until their authors manually update the referenced version. (This is how dependencies normally work anyway.)
- git hooks will dereference the `$ref` pointers in `current` to generate standalone, explicitly committed, versioned schema files.
- the next schema version number can be computed from the upstream branch
-- e.g. if upstream origin/master has revision/create/3 as the latest version, a change to revision/create/current will generate revision/create/4 for review. If the local checkout of master already has revision/create/4 but upstream origin/master still only has revision/create/3, a change to revision/create/current will regenerate revision/create/4.
- if a change to the `current` schema only touches a code comment or a `description` field, don't generate a new schema version.
- use the backwards compatibility library (T206889) to ensure changes are backwards compatible, both in the git hook and in CI.
- Should we use smarter versioning than incrementing integers? Semantic versioning (semver) might be nicer and more flexible, especially for the times when we need to force a backwards-incompatible change.
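The versioning ideas in the list above (strongly versioned `$ref` pointers, dereferencing `current` into standalone files, computing the next version from upstream, skipping description-only changes) can be sketched as a few pure functions that a git hook or CI job might call. This is a minimal sketch under several assumptions: schemas are already parsed into Python objects (YAML comments are discarded by the parser, so they never count as changes), referenced schemas live in a hypothetical in-memory `store` keyed by relative URIs such as `meta/3`, only whole-schema refs are supported (not `#/...` fragment pointers), versions are plain integers starting at 1, and the list of upstream versions is obtained elsewhere (e.g. by listing files on `origin/master`). It does not detect `$ref` cycles.

```python
import re
from typing import Any, Dict, Iterable

# Hypothetical convention: a strongly versioned ref ends in an integer version,
# e.g. "meta/3" or "revision/create/4" (never ".../current").
VERSIONED_REF = re.compile(r"[a-z0-9_/]+/[0-9]+")
# Keywords whose values map field (or definition) names to subschemas.
NAME_MAPS = {"properties", "patternProperties", "definitions"}


def dereference(schema: Any, store: Dict[str, Any]) -> Any:
    """Return a copy of `schema` with every {"$ref": ...} object replaced by
    the (recursively dereferenced) schema it points to in `store`."""
    if isinstance(schema, list):
        return [dereference(item, store) for item in schema]
    if not isinstance(schema, dict):
        return schema
    ref = schema.get("$ref")
    if isinstance(ref, str):
        if not VERSIONED_REF.fullmatch(ref):
            raise ValueError(f"$ref must be strongly versioned: {ref!r}")
        # Draft 7 ignores keywords next to $ref, so the whole object is replaced.
        return dereference(store[ref], store)
    return {key: dereference(value, store) for key, value in schema.items()}


def next_version(upstream_versions: Iterable[int]) -> int:
    """Next version number, computed only from versions on the upstream
    branch, so re-running the hook locally regenerates the same number."""
    return max(upstream_versions, default=0) + 1


def strip_descriptions(node: Any, in_name_map: bool = False) -> Any:
    """Drop `description` annotations, but keep fields that happen to be
    named "description" inside `properties` and similar name maps."""
    if isinstance(node, list):
        return [strip_descriptions(item) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key == "description" and not in_name_map:
            continue  # annotation only: not a reason to bump the version
        child_is_name_map = key in NAME_MAPS and not in_name_map
        result[key] = strip_descriptions(value, child_is_name_map)
    return result


def needs_new_version(previous: Any, current: Any) -> bool:
    """True if `current` differs from `previous` beyond `description` text."""
    return strip_descriptions(previous) != strip_descriptions(current)
```

With revision/create/3 as the latest upstream version, `next_version([1, 2, 3])` returns 4 whether or not a local revision/create/4 already exists, which gives the "regenerate revision/create/4" behaviour described in the list. Whether to switch these integers to semver is left open by the question above.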