How should we deal with major version changes in Event Platform streams?
Can we do what is usually done for API versioning?
== Problem
In Event Platform, we version event schemas and require that a stream contain only events of a single event schema lineage. We validate that event schema versions are backwards compatible, but only within a major schema version. It is technically 'allowed' (by the [[ https://github.com/wikimedia/jsonschema-tools#compatibility | jsonschema-tools test suite ]]) to make new major versions of event schemas that are incompatible with older major versions.
However, if this is done, and incompatible versions of events are produced to the same stream, all downstream consumers must adapt and go through a manual migration process.
Example:
Schema 1.0.0:
```lang=yaml
user:
  type: string
```
Schema 2.0.0:
```lang=yaml
user:
  type: object
  properties:
    name:
      type: string
    email:
      type: string
```
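To make the incompatibility concrete, here is a minimal sketch of a backwards-compatibility check under a deliberately simplified rule: every property of the old version must still exist in the new version with the same `type`. `is_backwards_compatible` is a hypothetical helper, not the jsonschema-tools algorithm, and the dicts are only the `user` property fragments from the example schemas.

```lang=python
from typing import Any, Dict


# Hypothetical, deliberately simplified compatibility rule (NOT the actual
# jsonschema-tools algorithm): every property in the old schema must still
# exist in the new schema with the same "type", recursing into objects.
def is_backwards_compatible(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
    for name, old_prop in old.items():
        new_prop = new.get(name)
        if new_prop is None or new_prop.get("type") != old_prop.get("type"):
            return False
        if old_prop.get("type") == "object" and not is_backwards_compatible(
            old_prop.get("properties", {}), new_prop.get("properties", {})
        ):
            return False
    return True


# The `user` property fragments from the example schemas, as Python dicts.
schema_1_0_0 = {"user": {"type": "string"}}
schema_2_0_0 = {
    "user": {
        "type": "object",
        "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
    }
}
# A hypothetical minor bump that only adds a new field.
schema_1_1_0 = {"user": {"type": "string"}, "user_id": {"type": "integer"}}

print(is_backwards_compatible(schema_1_0_0, schema_1_1_0))  # True
print(is_backwards_compatible(schema_1_0_0, schema_2_0_0))  # False: string -> object
```

Adding a field passes this check, while changing the type of `user` does not, which is exactly the kind of change that forces a new major version.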
Automatically ingesting a stream that contains events of both schemas into Hive in the analytics cluster (which we do for all streams) is not possible, because the same `user` column would have to be both a string and a struct. A migration would look something like:
1. stop producing events of schema version 1.
2. delete (or rename) the corresponding Hive table.
3. start producing events of schema version 2.
This series of migration steps would be necessary for all downstream consumers. The more consumers there are, the harder this will be.
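As a toy illustration of why one Hive table cannot hold both versions: the sketch below assumes, hypothetically, that a table's columns are inferred from incoming events and that a column's type cannot change once set. `hive_type` and `ingest` are made-up helpers; real Hive ingestion and schema evolution are more involved.

```lang=python
from typing import Any, Dict


def hive_type(value: Any) -> str:
    """Hypothetical mapping from a JSON value to a Hive column type."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        fields = ",".join(f"{k}:{hive_type(v)}" for k, v in value.items())
        return f"struct<{fields}>"
    raise TypeError(f"unsupported JSON value: {value!r}")


def ingest(columns: Dict[str, str], event: Dict[str, Any]) -> None:
    """Add columns for new fields; reject events whose field types differ."""
    for name, value in event.items():
        new_type = hive_type(value)
        existing = columns.setdefault(name, new_type)
        if existing != new_type:
            raise ValueError(f"column {name!r} is {existing}, event has {new_type}")


table: Dict[str, str] = {}
ingest(table, {"user": "Alice"})  # a 1.0.0 event creates `user` as a string column
try:
    ingest(table, {"user": {"name": "Alice", "email": "alice@example.org"}})  # 2.0.0
except ValueError as err:
    print(err)  # column 'user' is string, event has struct<name:string,email:string>
```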
This problem is not new. This is the same problem faced by any API. An API must be versioned in order to make breaking changes without immediately breaking all users of the API.
Semantic versioning dictates that incompatible API changes can only be made in major version changes. HTTP APIs often use version-specific URIs (`/api/v1/user`), query parameters (`/api/user?version=1`), or `Accept-version` headers (`Accept-version: v1`) to accomplish this. Programming libraries use major versions to accomplish the same thing.
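For comparison, a minimal sketch of how an HTTP server might resolve the requested major version from the three mechanisms mentioned above. The precedence order (URI, then query parameter, then header) and `DEFAULT_VERSION` are arbitrary assumptions; real APIs usually pick just one mechanism.

```lang=python
import re
from typing import Mapping
from urllib.parse import parse_qs, urlsplit

DEFAULT_VERSION = 1  # hypothetical fallback when the client asks for nothing


def requested_version(url: str, headers: Mapping[str, str]) -> int:
    """Resolve the requested API major version (precedence is an arbitrary choice)."""
    parts = urlsplit(url)
    # 1. Version-specific URI, e.g. /api/v1/user
    match = re.match(r"^/api/v(\d+)/", parts.path)
    if match:
        return int(match.group(1))
    # 2. Query parameter, e.g. /api/user?version=1
    values = parse_qs(parts.query).get("version")
    if values:
        return int(values[0])
    # 3. Header, e.g. Accept-version: v1 (HTTP header names are case-insensitive)
    for name, value in headers.items():
        if name.lower() == "accept-version":
            return int(value.strip().lstrip("vV"))
    return DEFAULT_VERSION


print(requested_version("/api/v2/user", {}))  # 2
print(requested_version("/api/user?version=3", {}))  # 3
print(requested_version("/api/user", {"Accept-Version": "v1"}))  # 1
```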
We'd like streams to be a stable and reliable way to transfer application state between different systems. We should consider versioning for Event Platform streams, and possibly any 'data product', just as we would any API.
== Potential solutions
=== S1. Do nothing
**Pros**:
- no active work to do or conventions to adopt
**Cons**:
- Incompatible schema changes are only possible by making new streams.
- New stream names must be chosen (and bikeshedded) whenever an incompatible change is needed. E.g. if we ever need to make an incompatible change to `mediawiki.page_change`, we'll have to come up with something like `mediawiki.page_change_new`.
=== S2. Versioning convention
Adopt a naming convention for versioning streams. Using this naming convention would be optional. Ideas:
- Prefix major version: `v1.mediawiki.page_change`
- Suffix major version: `mediawiki.page_change.v1`
(Or some other variation.)
Upgrading major versions is effectively the same as creating a new stream, except that the useful names are not squatted and the choice of new stream name is obvious.
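A minimal sketch of the suffix variant, assuming stream names of the form `<stream>.v<major>`. `parse_stream_name` and `next_major_stream_name` are hypothetical helpers, and treating an unversioned stream as an implicit v1 is an assumption, not part of the proposal.

```lang=python
import re
from typing import Optional, Tuple

# Hypothetical suffix convention: `<stream>.v<major>`, e.g. mediawiki.page_change.v1
_VERSIONED = re.compile(r"^(?P<base>.+)\.v(?P<major>\d+)$")


def parse_stream_name(stream: str) -> Tuple[str, Optional[int]]:
    """Split a stream name into (base name, major version or None if unversioned)."""
    match = _VERSIONED.match(stream)
    if match is None:
        return stream, None  # streams that did not opt in keep working unchanged
    return match.group("base"), int(match.group("major"))


def next_major_stream_name(stream: str) -> str:
    """Name of the stream to create for the next incompatible schema version."""
    base, major = parse_stream_name(stream)
    current = 1 if major is None else major  # assumption: unversioned == implicit v1
    return f"{base}.v{current + 1}"


print(parse_stream_name("mediawiki.page_change.v1"))  # ('mediawiki.page_change', 1)
print(next_major_stream_name("mediawiki.page_change.v1"))  # mediawiki.page_change.v2
print(next_major_stream_name("mediawiki.page_change"))  # mediawiki.page_change.v2
```

The point of the convention is visible here: the next name is mechanical, so nobody has to bikeshed it.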
**Pros**:
- No work to do other than deciding on and documenting convention.
- Incompatible changes are possible via a documented convention and process.
- Opt in: existing streams would continue to work.
-- Streams that will never need major changes, won't have name squatting problems, or aren't expected to have many consumers don't need to adopt the convention.
- Allows for gradual upgrade period. An old version of a stream can be deprecated before it is decommissioned.
**Cons**:
- Not all streams would use the versioning convention.
- Major versioning is not abstracted away from users.
-- Stream owners (producers) have to manually manage major versions by following the naming convention.
-- Consumers would have to notice when a new major version of a stream is created and manually migrate to it.
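Under this convention, each consumer has to detect new major versions itself. A hypothetical sketch, assuming the consumer already has a list of declared stream names from somewhere (how it obtains them is out of scope) and that names follow the `<stream>.v<major>` suffix convention; `newer_major_versions` is a made-up helper.

```lang=python
import re
from typing import Iterable, List, Tuple

# Hypothetical suffix convention: `<stream>.v<major>`
_VERSIONED = re.compile(r"^(?P<base>.+)\.v(?P<major>\d+)$")


def newer_major_versions(current: str, available: Iterable[str]) -> List[str]:
    """Streams with the same base name as `current` but a higher major version."""
    match = _VERSIONED.match(current)
    if match is None:
        raise ValueError(f"{current!r} does not follow the <stream>.v<major> convention")
    base, major = match.group("base"), int(match.group("major"))
    same_base = re.compile(rf"^{re.escape(base)}\.v(\d+)$")
    newer: List[Tuple[int, str]] = []
    for name in available:
        other = same_base.match(name)
        if other and int(other.group(1)) > major:
            newer.append((int(other.group(1)), name))
    return [name for _, name in sorted(newer)]  # ordered by numeric major version


declared = [
    "mediawiki.page_change.v1",
    "mediawiki.page_change.v3",
    "mediawiki.page_change.v2",
    "mediawiki.page_change_new",
]
print(newer_major_versions("mediawiki.page_change.v1", declared))
# ['mediawiki.page_change.v2', 'mediawiki.page_change.v3']
```

A consumer pinned to `.v1` would still keep reading `.v1` until it chooses to migrate; the check only tells it that a migration is pending.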
=== S3. Automatic versioning
There are many variations on this solution, but all of them are complex.
We could automatically produce events of different major versions to different topics, even if they are nominally in the same stream. E.g. `mediawiki.page_change` is the stream name, but producer libraries (EventGate, wikimedia-event-utilities, etc.) would examine each event before producing it to Kafka, and choose the topic name based on the event's major version. So a v1 event might go to the Kafka topic `eqiad.v1.mediawiki.page_change` and v2 event to `eqiad.v2.mediawiki.page_change`.
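A sketch of the producer-side routing described above, assuming each event carries a `$schema` URI that ends in a semantic version (the URIs below are illustrative). `major_version` and `topic_for` are hypothetical; neither EventGate nor wikimedia-event-utilities is claimed to expose such functions.

```lang=python
import re
from typing import Any, Mapping

# Assumes `$schema` ends in a semantic version, e.g. /mediawiki/page/change/2.0.0
_SCHEMA_VERSION = re.compile(r"/(\d+)\.\d+\.\d+$")


def major_version(event: Mapping[str, Any]) -> int:
    """Major version of the schema an event claims to conform to."""
    match = _SCHEMA_VERSION.search(event["$schema"])
    if match is None:
        raise ValueError(f"no semantic version in $schema {event['$schema']!r}")
    return int(match.group(1))


def topic_for(event: Mapping[str, Any], stream: str, datacenter: str) -> str:
    """Hypothetical routing: <datacenter>.v<major>.<stream>."""
    return f"{datacenter}.v{major_version(event)}.{stream}"


v1_event = {"$schema": "/mediawiki/page/change/1.1.0"}
v2_event = {"$schema": "/mediawiki/page/change/2.0.0"}
print(topic_for(v1_event, "mediawiki.page_change", "eqiad"))  # eqiad.v1.mediawiki.page_change
print(topic_for(v2_event, "mediawiki.page_change", "eqiad"))  # eqiad.v2.mediawiki.page_change
```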
Consumers would have to be aware of the possibility of multiple topic versions and choose which they want. They could choose to consume only one version, or they could choose to consume all versions.
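On the consumer side, choosing one version or all of them could amount to matching topic names against a pattern. A hypothetical sketch, assuming topics follow the `<datacenter>.v<major>.<stream>` layout from the example above; `select_topics` is a made-up helper.

```lang=python
import re
from typing import Iterable, List, Optional


def select_topics(
    stream: str, topics: Iterable[str], datacenter: str, major: Optional[int] = None
) -> List[str]:
    """Topics of `stream` in `datacenter`: one major version, or every version if major is None."""
    version = r"\d+" if major is None else str(major)
    pattern = re.compile(
        rf"^{re.escape(datacenter)}\.v{version}\.{re.escape(stream)}$"
    )
    return sorted(t for t in topics if pattern.match(t))  # sorted for stable output


topics = [
    "eqiad.v1.mediawiki.page_change",
    "eqiad.v2.mediawiki.page_change",
    "codfw.v1.mediawiki.page_change",
    "eqiad.mediawiki.revision-create",
]
print(select_topics("mediawiki.page_change", topics, "eqiad", major=1))
# ['eqiad.v1.mediawiki.page_change']
print(select_topics("mediawiki.page_change", topics, "eqiad"))
# ['eqiad.v1.mediawiki.page_change', 'eqiad.v2.mediawiki.page_change']
```

Many Kafka clients also accept a regex topic subscription, so a pattern like this could be handed to the client directly instead of filtering a topic list. Either way, every consumer has to know about the versioned layout.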
**Pros**:
- Producers are not aware of stream versioning
**Cons**:
- Consumers are very aware of stream versioning. This might be tractable if we had a unified language and framework at WMF and could mandate use of a consumer library, and/or if we didn't [[ https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams | make streams public ]].
- Ingestion code is very aware of stream versioning. Each ingestion job would have to make a choice for what to do. Should they ingest into versioned tables?
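One possible answer for ingestion is one Hive table per stream and major version, with the topics of all datacenters for that version feeding the same table. A sketch under that assumption; the `event` database name, the `_v<major>` table suffix and `hive_table_for` are all hypothetical.

```lang=python
import re

# Hypothetical topic layout from the example: <datacenter>.v<major>.<stream>
_TOPIC = re.compile(r"^(?P<dc>[^.]+)\.v(?P<major>\d+)\.(?P<stream>.+)$")


def hive_table_for(topic: str, database: str = "event") -> str:
    """One table per stream and major version; datacenters share the table."""
    match = _TOPIC.match(topic)
    if match is None:
        raise ValueError(f"{topic!r} is not a versioned topic")
    # Hive table names are typically alphanumeric plus '_', so replace the rest.
    table = re.sub(r"[^0-9A-Za-z_]", "_", match.group("stream"))
    return f"{database}.{table}_v{match.group('major')}"


print(hive_table_for("eqiad.v2.mediawiki.page_change"))  # event.mediawiki_page_change_v2
print(hive_table_for("codfw.v2.mediawiki.page_change"))  # event.mediawiki_page_change_v2
```

The trade-off is that analysts querying across versions would have to union or reconcile several tables themselves.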
== See also
- {T120242}
- {T331399}
- [[ https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#12-versioning | Microsoft API versioning guidelines ]]
- [[ https://www.infoq.com/articles/roy-fielding-on-versioning/ | Roy Fielding on API versioning ]]