
Major (API) versioning of Event Platform streams
Closed, Resolved (Public)

Description

How should we deal with major version changes in Event Platform streams?
Can we do what is usually done for API versioning?

Problem

In Event Platform, we version event schemas, and require that a stream only contain events of the same event schema lineage. We validate that event schema versions are backwards compatible, but only within a major schema version. It is technically 'allowed' (by the jsonschema-tools test suite) to make new major versions of event schemas that are incompatible with older major versions.

However, if this is done, and incompatible versions of events are produced to the same stream, all downstream consumers must adapt and go through a manual migration process.

Example:

Schema 1.0.0:

user:
  type: string

Schema 2.0.0:

user:
  type: object
  properties:
    name:
      type: string
    email:
      type: string

Automatically ingesting a stream that contains events of both schemas into Hive on the analytics cluster (we do this for all streams) is not possible. A migration would look something like:

  1. stop producing events of schema version 1.
  2. delete (or rename) corresponding Hive table.
  3. start producing events of schema version 2.

This series of migration steps would be necessary for all downstream consumers. The more consumers there are, the harder this will be.
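To make the conflict concrete, here is a minimal sketch of why the two example schemas cannot share one Hive table. It is not how ingestion actually derives column types; `hive_type` and the sample events are hypothetical and only illustrate that the same `user` field maps to two incompatible column types.

```python
# Hypothetical helper: map a JSON value to a Hive-like column type.
# This is an illustration only, not the real ingestion logic.
def hive_type(value):
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, str):
        return "string"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        fields = ",".join(f"{k}:{hive_type(v)}" for k, v in sorted(value.items()))
        return f"struct<{fields}>"
    raise TypeError(f"unsupported value: {value!r}")


# Sample events shaped like Schema 1.0.0 and Schema 2.0.0 above.
event_v1 = {"user": "Alice"}
event_v2 = {"user": {"name": "Alice", "email": "alice@example.org"}}

type_v1 = hive_type(event_v1["user"])
type_v2 = hive_type(event_v2["user"])

# A single table column cannot be both a string and a struct.
conflict = type_v1 != type_v2
```

With both versions in one stream, the `user` column would have to be a string and a struct at the same time, which is why the table has to be dropped or renamed between steps 1 and 3.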

This problem is not new. This is the same problem faced by any API. An API must be versioned in order to make breaking changes without immediately breaking all users of the API.

Semantic versioning dictates that incompatible API changes can only be made over major version changes. HTTP APIs often use version specific URIs (/api/v1/user), query parameters (/api/user?version=1), or accept-version (Accept-version: v1) headers to accomplish this. Programming library APIs use major versioning to accomplish the same thing.

We'd like streams to be a stable and reliable way to transfer application state between different systems. We should consider versioning for Event Platform streams, and possibly any 'data product', just as we would any API.

Potential solutions

S1. Do nothing

Pros:

  • no active work to do or conventions to adopt

Cons:

  • Incompatible schema changes are only possible by making new streams.
  • New stream names must be created (and bikeshed) whenever an incompatible change is needed. E.g. if we ever need to make a new mediawiki.page_change, we'll have to think of something like mediawiki.page_change_new.

S2. Versioning convention

Adopt a naming convention for versioning streams. Using this naming convention would be optional. Ideas:

  • Prefix major version: v1.mediawiki.page_change
  • Suffix major version: mediawiki.page_change.v1

(Or some other variation.)

Upgrading major versions is effectively the same as creating a new stream, except that the useful names are not squatted and the choice of new stream name is obvious.
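As a rough sketch of what such a convention could look like in code, the helpers below build and parse names using the suffix variant. The function names and the rule that unversioned names parse to `None` are assumptions made for illustration; nothing here is an existing Event Platform API.

```python
import re

# Matches "<base>.v<major>" where major is a positive integer.
_VERSIONED = re.compile(r"^(?P<base>.+)\.v(?P<major>[1-9]\d*)$")


def versioned_stream_name(base, major):
    """Build a stream name with a major version suffix."""
    if major < 1:
        raise ValueError("major version must be >= 1")
    return f"{base}.v{major}"


def parse_stream_name(name):
    """Split a stream name into (base, major).

    Streams that do not follow the convention (it is opt in)
    are returned unchanged with a major version of None.
    """
    match = _VERSIONED.match(name)
    if match is None:
        return name, None
    return match.group("base"), int(match.group("major"))
```

For example, `parse_stream_name("mediawiki.page_change.v1")` yields the base name and the integer 1, while an unversioned name passes through untouched.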

Pros:

  • No work to do other than deciding on and documenting convention.
  • Incompatible changes are possible via documented convention and process
  • Opt in: existing streams would continue to work.
    • Streams that won't ever have major changes or name squatting problems, or that aren't expected to have many consumers, don't need to do this.
  • Allows for gradual upgrade period. An old version of a stream can be deprecated before it is decommissioned.

Cons:

  • Not all streams would use versioning convention.
  • Major versioning is not abstracted away from users.
    • Owners ('producers') of a stream have to manually manage major versions by following the naming convention.
    • Consumers would still have to 'upgrade' to using the new version, but the old version could be deprecated and maintained for a period of time to allow for a rolling / graceful upgrade.

S3. Automatic versioning

There are many variations on this solution but all of them are complex.

We could automatically produce events of different major versions to different topics, even if they are nominally in the same stream. E.g. mediawiki.page_change is the stream name, but producer libraries (EventGate, wikimedia-event-utilities, etc.) would examine each event before producing it to Kafka, and choose the topic name based on the event's major version. So a v1 event might go to the Kafka topic eqiad.v1.mediawiki.page_change and v2 event to eqiad.v2.mediawiki.page_change.

Consumers would have to be aware of the possibility of multiple topic versions, and choose what they want. They could choose to only consume from one version or the other, or they could choose to consume all versions.
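The routing step described above could be sketched roughly as follows. This assumes each event carries a `$schema` field whose last path segment is the semantic version (e.g. `/mediawiki/page/change/1.0.0`); the function names and the default datacenter prefix are hypothetical, and no producer library is claimed to work this way today.

```python
def major_version(schema_uri):
    """Extract the major version from a schema URI.

    Assumes the URI ends in a semantic version,
    e.g. "/mediawiki/page/change/2.1.0" gives 2.
    """
    version = schema_uri.rsplit("/", 1)[-1]
    return int(version.split(".")[0])


def topic_for(event, stream, datacenter="eqiad"):
    """Choose a Kafka topic name from the event's major schema version."""
    return f"{datacenter}.v{major_version(event['$schema'])}.{stream}"


# Two events in the same nominal stream end up in different topics.
topic_v1 = topic_for({"$schema": "/mediawiki/page/change/1.0.0"}, "mediawiki.page_change")
topic_v2 = topic_for({"$schema": "/mediawiki/page/change/2.3.1"}, "mediawiki.page_change")
```

The producer side stays simple, but every consumer then has to know which of these topics exist and which ones it can handle.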

Pros:

  • Producers are not aware of stream versioning

Cons:

  • Consumers are very aware of stream versioning. This might be tractable if we had a unified language and framework at WMF and could mandate use of a consumer library, and/or if we didn't make streams public.
  • Ingestion code is very aware of stream versioning. Each ingestion job would have to make a choice for what to do. Should they ingest into versioned tables?


Event Timeline

Ottomata added a subscriber: phuedx.
Ottomata added a subscriber: VirginiaPoundstone.

hey @Ottomata could you complete the second con for S2 "Consumers would have to ..." ? I think the statement is incomplete

Cons:

  • Not all streams would use versioning convention.
  • Major versioning is not abstracted away from users.
    • Owners ('producers') of a stream have to manually manage major versions by following the naming convention.
    • Consumers would have to

complete the second con

Oops, done ty!

FWIW, API platform folk are talking about this for API guidelines now. Whatever is decided, we should align on the same policies for major versioning, even if the mechanisms are different.

FWIW, API platform folk are talking about this for API guidelines now. Whatever is decided, we should align on the same policies for major versioning, even if the mechanisms are different.

For some value of "talking about", yes. We were just discussing which of our many possible subject areas to focus on next, and versioning was one of the possibilities. Agreed on aligning, thanks for bringing that up. Given that your team is also thinking about versioning, we should take advantage of that timing.

@BPirkle great! How can we best collaborate and make this happen soonish? We have a new stream we'd like to use versioning for now, so the sooner the better for us.

Hello! At an Event Platform meeting today, we decided that we prefer S2. Versioning convention, with the convention being to suffix stream names with a major version, e.g. mediawiki.page_change.v1.

Suffixing has advantages over prefixing:

If there are no objections to this by May 1 2023, we will proceed with this convention for mediawiki.page_change.v1, and document the convention on wikitech.

cc @prabhat @BPirkle @mpopov, @Milimetric, @Krinkle maybe?

Hi,
Just wanted to be sure, at the moment, we are only talking about a naming strategy for the event streams that will reflect the schema being used for events in that event stream.
The naming convention seems good to me.

A few questions:

  1. Does this mean that the existing streams will be renamed as follows?
     /v2/stream/mediawiki.page-create -> /v2/stream/mediawiki.page-create.v1
     /v2/stream/mediawiki.revision-create -> /v2/stream/mediawiki.revision-create.v1
  2. Should we anticipate schema changes coming soon for the eventstreams?

Does this mean that the existing streams will be renamed as follows?

We will not rename any existing streams. This will be opt in. We are currently working on a new data model that better lets us represent state changes to entities. For the new streams we are working on, we will apply this versioning convention. So, the new mediawiki.page_change stream would be mediawiki.page_change.v1. If/when we publish that via stream.wikimedia.org, the endpoint would be /v2/stream/mediawiki.page-change.v1

Should we anticipate schema changes coming soon for the eventstreams?

We will not immediately deprecate the older streams (e.g. mediawiki.revision-create, etc.), but hope to do so once we feel good about (and establish SLOs for) the new page_change streams. Once we deprecate, there will be a long deprecation period. I wouldn't be surprised if we maintained these streams for another year.

I have a couple of questions regarding this topic:

  • when an event occurs - are we going to generate events in all versions to satisfy all consumers, also the legacy one?
  • do we have a plan for deprecating a version? I see it difficult for us to maintain multiple versions of events as we're not a big team. Maybe we could bind ourselves to support like three (dev|stable|legacy) or two (stable|legacy) versions. And allowing versioning is opening a gate to cases when we work on v7 but something is still using v1.
  • if for any reason the service consumes both events (of v1 and v2), is there any way for the service to detect if those are the same event but served as different schema?

I'm also thinking out loud - in S1 you mentioned that without rules we might have to come up with a new event, like mediawiki.page_change_new. I'm wondering how often we upgrade event schemas and what the reason for such a change is. Is it supporting new clients? Providing more details? Coming up with a new use case? I'm trying to think about how to separate situations where we decide what should be the next version and what should be a new event.

when an event occurs - are we going to generate events in all versions to satisfy all consumers, also the legacy one?

These would be totally distinct streams, so yes, if the producer wants to go through a deprecation period where both stream versions are available, they will have to produce events to both streams. This would be similar to how an HTTP API endpoint owner would have to serve requests from both endpoint versions during the deprecation period.
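A minimal sketch of what producing to both streams during a deprecation period might look like, reusing the `user` example from the task description. The stream names, the `change` record layout, and the `send` callable (standing in for whatever producer client is actually used) are all hypothetical.

```python
def to_v1(change):
    # Schema 1.0.0 shape from the description: user is a string.
    return {"user": change["user_name"]}


def to_v2(change):
    # Schema 2.0.0 shape from the description: user is an object.
    return {"user": {"name": change["user_name"], "email": change["user_email"]}}


# Hypothetical stream names following the suffix convention.
VERSIONS = {
    "example.user_change.v1": to_v1,
    "example.user_change.v2": to_v2,
}


def produce_all(change, send, versions=VERSIONS):
    """Emit one event per stream version that is still maintained."""
    for stream, build in versions.items():
        send(stream, build(change))
```

Decommissioning the old version then amounts to removing its entry from `VERSIONS` once the deprecation period is over.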

do we have a plan for deprecating a version? I see it difficult for us to maintain multiple versions of events as we're not a big team.

No specific plan, but this would probably be producer/team/stream specific. We'll have this problem now with our mediawiki state streams, as page_change supersedes a bunch of older streams (revision-create, page-create, page-delete, etc.). We'd like to make a deprecation plan for these once we feel good about the deployment of page_change and a few other related streams.

if for any reason the service consumes both events (of v1 and v2), is there any way for the service to detect if those are the same event but served as different schema?

Not automatically, no. This would be data domain specific. For mediawiki page edits, the rev_id and rev_sha1 indicate this. For more nuanced page changes (suppressions, undeletes) this would be more difficult.
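For the page edit case, a consumer reading both versions could deduplicate on those two fields, roughly like the sketch below. Where rev_id and rev_sha1 live inside each event is an assumption here (flat in one version, nested under a `revision` object in the other); the real schemas may differ.

```python
def revision_key(event):
    """Build a version-independent key for a page edit event.

    Hypothetical layouts: fields are either at the top level
    or nested under "revision".
    """
    rev = event.get("revision", event)
    return (rev["rev_id"], rev["rev_sha1"])


def dedupe(events):
    """Keep the first event seen for each (rev_id, rev_sha1) pair."""
    seen = set()
    unique = []
    for event in events:
        key = revision_key(event)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique
```

This only covers edits; as noted, suppressions and undeletes have no equally obvious key.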

I'm wondering how often we upgrade event schemas and what is the reason of such change.

Major upgrades are and should be very rare. The reasons for doing so are going to also be domain specific. The case for our current mediawiki state change project is detailed in T308017: Design Schema for page state and page state with content (enriched) streams.

Ottomata triaged this task as Medium priority.
Ottomata edited projects, added Event-Platform (Sprint 12); removed Event-Platform.

Alright, no objections and May 1 has come and gone.

I've documented this here:
https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#Stream_versioning

I will be removing the rc1 prefix of mediawiki.page_change, and releasing it as mediawiki.page_change.v1.

Change 920377 had a related patch set uploaded (by Ottomata; author: Ottomata):

[mediawiki/extensions/EventBus@master] Change default page_change stream name to use major versioning

https://gerrit.wikimedia.org/r/920377