
Git Commit hook that adds a whole new file when a new version of schema is committed
Closed, Resolved · Public · 13 Estimated Story Points

Description


Use Case:

It is hard to see diffs between schema versions when each version is a whole new file (as they are now in the event bus schemas). Idea: have a commit hook that, after a change to the "main" schema file, creates a new version file if needed. Diffs are then always contextual to the whole schema rather than showing a wholly new file.

Event Timeline

Nuria triaged this task as Medium priority.Oct 11 2018, 9:43 PM
Nuria created this task.
Nuria renamed this task from Commit hook that adds a whole new file when a new version of schema is comitted to Git Commit hook that adds a whole new file when a new version of schema is committed.Oct 11 2018, 9:46 PM
Nuria updated the task description.
Nuria removed the point value for this task.Dec 5 2018, 10:16 PM

Ping @Ottomata @Pchelolo: is this work we are aiming to do next quarter (January onwards), given that we are also working on dockerization?

This work can happen at anytime, I think the sooner the better. So, yes!

We need this stuff settled before we can ask analysts/product folks to use git schema repos.

Been experimenting a bit. I think this can be done with .gitattributes file filters. Idea:

We commit a .gitattributes file to the schema repo(s) like:

current.yaml filter=generate_versioned_schema

We also commit a .gitconfig file like:

[filter "generate_versioned_schema"]
    clean = ./generate-schema.js %f

Then any user of the repository needs to run

git config --local include.path ../.gitconfig

This will configure a generate-schema.js (or whatever) script to run after every git add ... of any file named 'current.yaml'. The script will be given the content of current.yaml on stdin, as well as the full path to current.yaml as the first argument (%f).
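As a rough illustration of what such a clean filter script might look like, here is a minimal sketch. It is not the actual generate-schema.js; the names cleanFilter and onSchemaStaged are hypothetical. The one firm constraint it reflects is that git stages whatever a clean filter writes to stdout, so the script has to echo the file content back and keep any logging on stderr.

```javascript
'use strict';
const fs = require('fs');

// Hypothetical hook: the place where versioned files would be generated.
// Diagnostics go to stderr, because stdout is reserved for the filtered content.
function onSchemaStaged(filePath, content) {
    process.stderr.write(`staging ${filePath} (length ${content.length})\n`);
}

// A clean filter returns the content that git should store in the index.
// Here the content is returned unchanged, so current.yaml is staged as written.
function cleanFilter(filePath, content, hook = onSchemaStaged) {
    hook(filePath, content);
    return content;
}

// Run only when invoked with a path argument, as git does via %f.
if (process.argv[2]) {
    const input = fs.readFileSync(0, 'utf8'); // file descriptor 0 is stdin
    process.stdout.write(cleanFilter(process.argv[2], input));
}
```

Passing the hook as a parameter keeps the pass-through behaviour separate from whatever side effects the real script ends up needing.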

The script should do the following:

  • Get the latest committed version number and the latest schema content of the schema from git HEAD (not from local working checkout). This can be done with some git magic.
  • Dereference the current.yaml schema (this can be done later as part of T206824: Make it possible to use $ref in JSONSchemas)
  • Do any schema validation and/or convention enforcement we might want.
  • Compare dereferenced current.yaml schema with latest schema.
  • If different, use semver to get newly incremented minor version from latest version number.
  • Write dereferenced current.yaml schema content to new version .yaml.
  • Optional: write new version .json too?
  • Create extensionless symlink to new version, e.g. 1.2.0 => 1.2.0.yaml.
  • git add any newly generated files

This should allow us to automatically generate new versioned and dereferenced files, while only editing current.yaml. Edits that modify current.yaml will result in a new minor version. Heckay! We could get fancy and check for backwards compatibility (only adding new optional fields), and if the change does something different, we could generate a new major version.


A completely different idea would be to go in the opposite direction. We could have a post-checkout hook and script that generates versioned and dereferenced files from current.yaml's git history. This would have to iterate over every git commit, check out and read all modified current.yaml files, read the $id field from the schema, and output a file named for the $id (which should have the schema version) with the dereferenced content of current.yaml at that commit.

This would rely on git for versioning of schemas, which is kind of nice, but would still give us static files to use in a schema registry service or when loading schemas in EventGate. This would give us a little more control over the versioning, as we'd manually update the $id field when we update the current.yaml schema file.

Ottomata set the point value for this task to 13.
Ottomata moved this task from Next Up to In Progress on the Analytics-Kanban board.

Hm, I had another idea last night.

What if, instead of relying on the previously existing version file to generate the next version number, we just get it from the $id field of the schema? So:

schemaVersionField = '$id';
schemaVersionRegex = /.*\/(\d+\.\d+\.\d+)$/;
  • Get version from new candidate schema (current.yaml)
  • Dereference candidate schema
  • Validate against our WMF convention CI schema
  • Generate schema version file.

We'd still git add all of the versioned files, but this is way simpler, and doesn't rely on git history to figure out the next version number. It does require that the schema version is extractible from the schema itself.
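A minimal sketch of the extraction step, using the field name and regex from above. The helper names extractVersion and versionedFilePath, and the example $id value /test/event/1.2.0, are assumptions for illustration rather than anything settled in this task.

```javascript
'use strict';
const path = require('path');

const schemaVersionField = '$id';
const schemaVersionRegex = /.*\/(\d+\.\d+\.\d+)$/;

// Pull the x.y.z version out of the schema's $id, e.g. '/test/event/1.2.0'.
function extractVersion(schema) {
    const id = schema[schemaVersionField];
    const match = typeof id === 'string' ? schemaVersionRegex.exec(id) : null;
    if (!match) {
        throw new Error(`Cannot extract a version from ${schemaVersionField}: ${id}`);
    }
    return match[1];
}

// Place the versioned file next to current.yaml, named after the version.
function versionedFilePath(currentYamlPath, schema) {
    const dir = path.posix.dirname(currentYamlPath);
    return path.posix.join(dir, `${extractVersion(schema)}.yaml`);
}
```

With these helpers, a schema whose $id is /test/event/1.2.0 and which lives at schemas/test/event/current.yaml would map to schemas/test/event/1.2.0.yaml. Throwing on a missing or malformed $id makes the requirement that the version be extractable from the schema an explicit failure rather than a silent skip.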

I think we can still trigger the script after a git add or git commit.

@Pchelolo, @Milimetric thoughts?

Moved work to https://github.com/ottomata/jsonschema-tools

This works! I'd like to implement dereferencing next, but that will be part of T206824: Make it possible to use $ref in JSONSchemas

I'll move this to gerrit soon, get some code review, and then we can start using it in mediawiki/event-schemas.

Change 523745 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[mediawiki/event-schemas@master] Use jsonschema-tools, add common schema, $ref it from test/event

https://gerrit.wikimedia.org/r/523745

Change 523745 merged by Ppchelko:
[mediawiki/event-schemas@master] Use jsonschema-tools, add common schema, $ref it from test/event

https://gerrit.wikimedia.org/r/523745

Since we merged use of this into mediawiki/event-schemas, I'm going to call this task done! We might still have some TODO work in the jsonschema-tools repo, but that can be tracked in other tasks.