How should we deal with major version changes in Event Platform streams?
Can we do what is usually done for API versioning?
Problem
In Event Platform, we version event schemas, and require that a stream only contain events of the same event schema lineage. We validate that event schema versions are backwards compatible, but only within a major schema version. It is technically 'allowed' (by the jsonschema-tools test suite) to make new major versions of event schemas that are incompatible with older major versions.
However, if this is done, and incompatible versions of events are produced to the same stream, all downstream consumers must adapt and go through a manual migration process.
Example:
Schema 1.0.0:
  user:
    type: string
Schema 2.0.0:
  user:
    type: object
    properties:
      name:
        type: string
      email:
        type: string
Automatically ingesting a stream that contains events of both schemas into Hive in the analytics cluster (as we do for all streams) is not possible. A migration would look something like:
- stop producing events of schema version 1.
- delete (or rename) corresponding Hive table.
- start producing events of schema version 2.
This series of migration steps would be necessary for all downstream consumers. The more consumers there are, the harder this will be.
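To make the Hive problem concrete, here is a minimal sketch (not actual ingestion code) of why mixed major versions break automatic table creation: an ingestion job must settle on one column type per field, but v1 and v2 events disagree on the type of user. All names here are illustrative.

```python
def infer_type(value):
    """Map a JSON value to a simplified Hive-like column type."""
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "struct"
    return "unknown"

def check_ingestable(events):
    """Return the inferred column type per field, or raise on a conflict."""
    column_types = {}
    for event in events:
        for field, value in event.items():
            inferred = infer_type(value)
            previous = column_types.setdefault(field, inferred)
            if previous != inferred:
                raise ValueError(
                    f"field '{field}' seen as both {previous} and {inferred}; "
                    "cannot ingest into a single Hive table"
                )
    return column_types

v1_event = {"user": "Alice"}                                  # schema 1.0.0
v2_event = {"user": {"name": "Alice", "email": "a@b.org"}}    # schema 2.0.0
```

Ingesting only v1 events (or only v2 events) succeeds; a stream containing both raises the conflict, which is what forces the manual migration above.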
This problem is not new. This is the same problem faced by any API. An API must be versioned in order to make breaking changes without immediately breaking all users of the API.
Semantic versioning dictates that incompatible API changes can only be made over major version changes. HTTP APIs often use version specific URIs (/api/v1/user), query parameters (/api/user?version=1), or accept-version (Accept-version: v1) headers to accomplish this. Programming library APIs use major versioning to accomplish the same thing.
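The three HTTP conventions above can be sketched as a single version-resolution step. This is a hypothetical illustration, not any real WMF API; the function name, parameters, and default are assumptions.

```python
import re

def resolve_api_version(path, query=None, headers=None, default=1):
    """Find the requested major version from the URI path, a query
    parameter, or an Accept-version header, in that order."""
    m = re.match(r"/api/v(\d+)/", path)
    if m:
        return int(m.group(1))
    if query and "version" in query:
        return int(query["version"])
    if headers and "Accept-version" in headers:
        return int(headers["Accept-version"].lstrip("v"))
    return default
```

Whichever mechanism is used, the point is the same: the client names a major version explicitly, so the server can serve incompatible versions side by side.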
We'd like streams to be a stable and reliable way to transfer application state between different systems. We should consider versioning for Event Platform streams, and possibly any 'data product', just as we would any API.
Potential solutions
S1. Do nothing
Pros:
- no active work to do or conventions to adopt
Cons:
- Incompatible schema changes are only possible by making new streams.
- New stream names must be created (and bikeshedded) whenever an incompatible change is needed. E.g. if we ever need to make a new mediawiki.page_change, we'll have to think of something like mediawiki.page_change_new.
S2. Versioning convention
Adopt a naming convention for versioning streams. Using this naming convention would be optional. Ideas:
- Prefix major version: v1.mediawiki.page_change
- Suffix major version: mediawiki.page_change.v1
(Or some other variation.)
Upgrading major versions is effectively the same as creating a new stream, except that the useful names are not squatted and the choice of new stream name is obvious.
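The two conventions above are mechanical enough to sketch as helpers. These are purely illustrative functions, not part of any Event Platform library.

```python
def versioned_stream_name(stream, major, style="suffix"):
    """Build a versioned stream name from a base name and major version,
    e.g. mediawiki.page_change.v1 or v1.mediawiki.page_change."""
    if style == "prefix":
        return f"v{major}.{stream}"
    return f"{stream}.v{major}"

def bump_major(versioned_name):
    """Given a suffix-style versioned name, return the next major
    version's name, making the 'obvious new stream name' explicit."""
    base, _, version = versioned_name.rpartition(".v")
    return f"{base}.v{int(version) + 1}"
```

With a convention like this, the name of the next incompatible stream is never up for debate: mediawiki.page_change.v1 is followed by mediawiki.page_change.v2.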
Pros:
- No work to do other than deciding on and documenting convention.
- Incompatible changes are possible via documented convention and process
- Opt in: existing streams would continue to work.
- Streams that won't ever have major changes or name squatting problems, or that aren't expected to have many consumers, don't need to do this.
- Allows for gradual upgrade period. An old version of a stream can be deprecated before it is decommissioned.
Cons:
- Not all streams would use versioning convention.
- Major versioning is not abstracted away from users.
- Owners / producers of a stream have to manually manage major versions by following the naming convention.
- Consumers would still have to 'upgrade' to the new version, though the old version could be deprecated and maintained for a period of time to allow for a rolling / graceful upgrade.
S3. Automatic versioning
There are many variations on this solution but all of them are complex.
We could automatically produce events of different major versions to different topics, even if they are nominally in the same stream. E.g. mediawiki.page_change is the stream name, but producer libraries (EventGate, wikimedia-event-utilities, etc.) would examine each event before producing it to Kafka, and choose the topic name based on the event's major version. So a v1 event might go to the Kafka topic eqiad.v1.mediawiki.page_change and v2 event to eqiad.v2.mediawiki.page_change.
Consumers would have to be aware of the possibility of multiple topic versions, and choose what they want. They could consume only one version, or consume all versions.
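The producer-side routing described above could key off the version already present in each event's $schema URI (e.g. /mediawiki/page/change/2.0.0). A minimal sketch, assuming that URI shape; the datacenter prefix and function name are illustrative, not EventGate or wikimedia-event-utilities code.

```python
import re

def topic_for_event(event, stream, datacenter="eqiad"):
    """Choose a versioned Kafka topic name based on the major version
    embedded in the event's $schema URI."""
    m = re.search(r"/(\d+)\.\d+\.\d+$", event["$schema"])
    if not m:
        raise ValueError(f"no version found in schema URI: {event['$schema']}")
    return f"{datacenter}.v{m.group(1)}.{stream}"
```

On the consumer side, a client pinned to one version would subscribe to eqiad.v1.mediawiki.page_change only, while a consume-everything client could subscribe to a topic pattern matching all majors.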
Pros:
- Producers are not aware of stream versioning
Cons:
- Consumers are very aware of stream versioning. This might be tractable if we had a unified language and framework at WMF and could mandate use of a consumer library, and/or if we didn't make streams public.
- Ingestion code is very aware of stream versioning. Each ingestion job would have to decide what to do: should it ingest each version into its own versioned table?
See also
- T120242: Eventually-Consistent MediaWiki state change events | MediaWiki events as source of truth
- T331399: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page
- Data Lifecycle Management Process (draft)
- Microsoft API versioning guidelines
- Roy Fielding on API versioning