This task will be the parent for work to build a schema repository component for the Modern Event Platform program. The working name of this component will be Event Schema Repositories.
In T198256: RFC: Modern Event Platform - Choose Schema Tech, it was decided to continue using JSONSchema as we do now. As such, we need to either write new schema registry software, or find one to adapt to our needs. We've collected a lot of requirements and wishes from analysts, engineers and product managers for this component. I'll summarize those as user stories here. We can then discuss how to satisfy those stories in a particular implementation and design.
- As an engineer, I want to develop new code that uses schemas without committing changes to the production schema registry so that I don't endanger production during development.
- As an engineer, I want a queryable (read-only) service API so that I can discover schemas.
- As an engineer, I want each schema (and each schema revision) to have a unique ID in the form of a publicly accessible URI.
- As a data analyst or product manager, I want a canonical place where I can easily draft schema definitions and implementation details in collaboration with product engineers during implementation (example), document and access them once a schema is live, and correct and amend them later as needed.
- As an engineer, I want strict and clear schema policies enforced so that I don't create event data that is difficult for consumers to integrate.
- As an engineer, I want backwards compatibility of schema changes to be enforced so that I don't break downstream consumers of events.
- As an analyst, I want clear analytics schema guidelines and conventions for schema design so that schemas are more consistent, maintainable and easier to collaborate on.
- As an analyst/engineer, I want clear analytics schema guidelines and conventions so that integration into analytics datastores and dashboards is easy.
- As an engineer, I want to be able to share schemas in development so that others can run and test my code.
- As an engineer, I want other critical Event Platform components to function if the Schema service is offline (via cached schemas, local copies, etc.) so that event systems are reliable and highly available.
- As an engineer, I want to be able to reuse and reference schemas from one another using the aforementioned URI ID in order to avoid copy-pasting.
- As an analyst/product manager, I want to be able to search through existing schemas to find which data is being collected and how the data is defined in the event system.
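
To make the backwards-compatibility story above more concrete, here is a minimal sketch of what an automated check between two schema revisions could look like. It is not a proposed implementation. The function name `compatibility_errors` and the policy it encodes are assumptions for illustration only: a new revision may add optional fields, but may not remove fields, change a field's `type`, or add entries to `required`. The actual policy is still to be decided in this task.

```python
import copy


def compatibility_errors(old, new, path=""):
    """List the reasons `new` is not a backwards-compatible revision of `old`.

    Assumed policy (illustrative only): fields may be added, but existing
    fields may not be removed or change type, and no field may become
    newly required.
    """
    errors = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})

    for name, old_spec in old_props.items():
        field = f"{path}{name}"
        if name not in new_props:
            errors.append(f"{field}: field removed")
            continue
        new_spec = new_props[name]
        old_type, new_type = old_spec.get("type"), new_spec.get("type")
        if old_type != new_type:
            errors.append(f"{field}: type changed from {old_type!r} to {new_type!r}")
        elif old_type == "object":
            # Recurse into nested objects so nested fields are checked too.
            errors.extend(compatibility_errors(old_spec, new_spec, f"{field}."))

    # Any field that is required now but was not before is flagged.
    added_required = set(new.get("required", [])) - set(old.get("required", []))
    for name in sorted(added_required):
        errors.append(f"{path}{name}: newly required")

    return errors


# Hypothetical schema revisions, invented for this example.
v1 = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "meta": {"type": "object", "properties": {"dt": {"type": "string"}}},
    },
    "required": ["id"],
}

# Compatible change: only adds an optional field.
v2_ok = copy.deepcopy(v1)
v2_ok["properties"]["wiki"] = {"type": "string"}

# Incompatible change: changes a type and removes a nested field.
v2_bad = copy.deepcopy(v1)
v2_bad["properties"]["id"] = {"type": "integer"}
del v2_bad["properties"]["meta"]["properties"]["dt"]

print(compatibility_errors(v1, v2_ok))
print(compatibility_errors(v1, v2_bad))
```

A check along these lines could run when a schema change is proposed, so that incompatible revisions are rejected before they reach the production repository. It deliberately ignores `$ref`, `enum`, and other JSON Schema keywords that a real check would need to handle.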