Page MenuHomePhabricator

Add CI step to event schema repositories to test to fail if a schema is deleted
Closed, ResolvedPublic

Description

In https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-secondary/-/merge_requests/13, a schema was deleted before its stream config was removed. This caused canary events and Hive ingestion jobs to fail, as they tried to look up the schema for the stream.

Our documentation states

A schema should never be deleted, but all of the stream related code and configuration can be removed at anytime to stop producing an event stream

We should add a test to jsonschema-tools, or just to gitlab CI, that fails if schemas are deleted.

This might be difficult to do in jsonschema-tools, as it would need to inspect git history.

Details

Related Changes in GitLab:
TitleReferenceAuthorSource BranchDest Branch
CI/CD Check to prevent schema deletionsrepos/data-engineering/schemas-event-primary!27javiermontonfeature/prevent-schema-deletionsmaster
[T377023] - Running prevent_schema_deletions on merge requests.repos/data-engineering/jsonschema-tools!63javiermontonfeature/schema-deletion-on-mrmaster
Use schema-deletion CI Componentrepos/data-engineering/schemas-event-secondary!96javiermontonfeature/prevent-schema-deletions-from-componentmaster
[T377023] - CI/CD Component to prevent schema deletionsrepos/data-engineering/jsonschema-tools!62javiermontonfeature/prevent-schema-deletion-componentmaster
[WIP][T377023] - New CI step to prevent schema deletionsrepos/data-engineering/schemas-event-secondary!91javiermontonfeature/ci-prevent-schema-deletionsmaster
Customize query in GitLab

Event Timeline

+1 to do it at Gitlab CI time, as it would be a proactive, rather than reactive, measure.

@JMonton-WMF thanks for the patch!

Q: did you explore the possibility of adding this test to the regular jsonschema-tools test suite? If possible, I guess we'd want to add it to the 'compatibility' tests?

https://gitlab.wikimedia.org/repos/data-engineering/jsonschema-tools/-/tree/master/lib/tests?ref_type=heads

I didn't know how that worked. Let me review it and I'll try to move the test there! I'm guessing this is a way of having the same tests in both primary and secondary repositories, right?

Ya exactly.

E.g.

https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-primary/-/blob/master/test/jsonschema/repository.test.js?ref_type=heads

and

https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-secondary/-/blob/master/test/jsonschema/repository.test.js?ref_type=heads

If we can do it this way, then we should also add the ability to skip your new test via jsonschema-tools skipSchemaTestCases config, like here. There are probably cases when we DO want to delete schemas, we just want CI to yell about it unless we are really sure.

Some days ago I created a new CI check in a MR to check for deletions, but @Ottomata suggested that it could be moved to the jsonschema-tools repository.
The issue with this MR is that it should be created on every repository using json-schemas.

I started implementing this in the compatibility checks of jsonschema-tools, but I’m not completely convinced about the approach. At first glance, it makes sense to keep all checks within jsonschema-tools, but the “deletion check” feels different. Unlike other checks, it can’t be validated against the current code alone.

The only way I can find to to detect a deleted schema is by relying on Git, and I’m not completely comfortable adding Git-dependent logic inside the library, since that goes beyond its scope. In practice, the check would be based on Git status rather than code status, which doesn’t seem like the library’s responsibility.

For example, testing against HEAD won't work, it still passes the check if a deleted file is committed on a branch. To truly confirm deletion, we’d need to compare against the default branch (origin/master). In my opinion, that kind of check seems better suited for the CI, where Git context and branch changes are already managed, rather than inside the code itself.

In other words, you could use the library in a place without Git, and it will behave differently.

I'd like to get some opinions on this, as I'm not sure if similar checks are done in other places. Any suggestions @tchin @xcollazo ? Do you think the check should be in jsonschema-tools, in the CI only, or any other approach?

Thanks!

I agree this particular check should not be a concern of jsonschema-tools.

+1 to Gitlab CI failing the pipeline if this is violated. We should also be able to override it if, for some reason, we do want to delete.

In aims to move this check to the CI, but keeping it standard from the jsonschema-tools repository, I'd like to go for this proposal:

  • A GitLab CI/CD reusable Component that checks if schemas are deleted in a particular folder against a branch (default branch by default): MR here.
  • The use of this Component on other repositories, like schemas-event-primary and schemas-event-secondary. One MR here

There is no explicit way of skipping this CI step in case we want to delete schemas at some point. I'm not sure if we can simply merge an MR even if this check fails, knowing that we want to do it, or if we should create a way of skipping the CI, like if the MR title includes a text or something like that? I'm not sure if we have similar cases in other repositories.

Any thoughts @xcollazo, @tchin?

Ditto what @xcollazo said above. In order to have the desired behavior for this pipeline job, I think you need:

  1. In .gitlab-ci.yaml:
when: manual
allow_failure: false
  1. In your shell script, you need to check for the existence of environment variable CI_BREAK_GLASS_REASON (or whatever you name it, should be documented in README.md) just like airflow-dags does it in the Python script:
if circumvent_reason := os.getenv("CI_BREAK_GLASS_REASON"):
    print(f"Circumventing approval check.\nReason: {circumvent_reason}")
    sys.exit(0)

Thanks both! I've added the changes. Now it requires a manual execution and it can be bypassed by adding the CI_BREAK_GLASS_REASON.

I tried to make it run automatically unless it is manually forced, but it looks like GitLab only allow this kind of behavior on jobs that require a manual trigger, you cannot run it by default, and allow a retry with arguments.

Now it should behave like the Airflow job.

The other MR using this component doesn't need changes, it will pass the CI once the component is merged.

Nice, great solution! Thank you Javier!