Over in T88459, and in a few recent meetings, we've fleshed out a sketch for getting standardized messages into Kafka for later consumption. We've coalesced on a way forward and an MVP. This task will track the creation of the EventBus MVP.
Initial use cases:
- Provide edit-related events (e.g. edit, creation, deletion, revision deletion, rename). Initially, these events will be consumed by RESTBase / a change propagation service (T102476, T111819), as well as by analytics / research. Potential future uses include a purge service, RCStream, and push notifications.
- EventLogging: decode, validate, and enqueue JSON events for EL. See also: T84923.
- We will standardize on JSON Schema as our canonical schema spec, but do so in such a way that Avro can be used in Analytics-type systems. Equivalent Avro schemas may be generated as part of CI.
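To make the JSON Schema decision concrete, here is a rough sketch of what a schema for an edit-like event and validation against it could look like. The field names and the hand-rolled validator are illustrative assumptions only; a real service would use a full JSON Schema validator library rather than this minimal check.

```python
# Hypothetical JSON Schema for a page-edit event. Field names are
# illustrative, not a proposed convention.
EDIT_SCHEMA = {
    "type": "object",
    "required": ["event_type", "page_title", "rev_id"],
    "properties": {
        "event_type": {"type": "string"},
        "page_title": {"type": "string"},
        "rev_id": {"type": "integer"},
    },
}

# Minimal validation sketch covering only "required" and simple "type"
# checks; a real implementation would use a proper JSON Schema library.
JSON_TYPES = {"string": str, "integer": int, "object": dict}

def validate(event, schema):
    """Return a list of validation errors (empty if the event is valid)."""
    errors = []
    for field in schema.get("required", []):
        if field not in event:
            errors.append("missing required field: %s" % field)
    for field, spec in schema.get("properties", {}).items():
        if field in event and not isinstance(event[field], JSON_TYPES[spec["type"]]):
            errors.append("field %s is not of type %s" % (field, spec["type"]))
    return errors
```

For example, `validate({"event_type": "edit", "page_title": "Foo", "rev_id": 123}, EDIT_SCHEMA)` returns an empty error list, while an event missing `rev_id` would not validate.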
- For the MVP, JSON data will be produced to Kafka. We will consider binary Avro later.
- There will be a Kafka topic -> schema mapping, and only that schema may be produced to its topic.
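One way the topic -> schema mapping could work: a small config the service loads on startup and consults before any produce. The topic and schema names below are made up for illustration.

```python
# Hypothetical topic -> schema-name mapping, as the service might load
# it from a config file on startup (names are illustrative only).
TOPIC_SCHEMAS = {
    "mediawiki.page_edit": "page_edit_schema",
    "mediawiki.page_delete": "page_delete_schema",
}

def schema_for_topic(topic):
    """Look up the one schema allowed on a topic; fail for unknown topics."""
    try:
        return TOPIC_SCHEMAS[topic]
    except KeyError:
        raise ValueError("no schema configured for topic: %s" % topic)
```

Enforcing the mapping at produce time is what guarantees consumers that every message on a topic conforms to exactly one schema.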
The MVP will consist of:
- A REST service that validates JSON data against a schema and produces it to Kafka.
- A schema repository layout, and a topic -> schema mapping config that the service loads on startup.
- A to-be-determined initial use case implemented on top of this system.
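Putting the MVP pieces together, the request path of the service might look roughly like the following sketch. The endpoint shape, status codes, and in-memory stand-ins for the schema registry and Kafka producer are all assumptions for illustration, not a design.

```python
import json

# Stand-in for schemas loaded from the schema repository on startup;
# here each topic maps to a trivial schema with only required fields.
SCHEMAS = {
    "mediawiki.page_edit": {"required": ["event_type", "rev_id"]},
}
produced = []  # stand-in for a Kafka producer

def handle_post(topic, body):
    """Handle a hypothetical POST /topics/<topic>: validate the JSON body
    against the topic's schema, then 'produce' it. Returns an HTTP-style
    status code."""
    schema = SCHEMAS.get(topic)
    if schema is None:
        return 404  # no schema mapped to this topic
    try:
        event = json.loads(body)
    except ValueError:
        return 400  # body is not valid JSON
    if any(field not in event for field in schema["required"]):
        return 400  # event fails schema validation
    produced.append((topic, event))  # real service: Kafka produce here
    return 201
```

For example, `handle_post("mediawiki.page_edit", '{"event_type": "edit", "rev_id": 1}')` would validate and enqueue the event, while a malformed body or unmapped topic is rejected before anything reaches Kafka.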
Things we could consider after the MVP:
- Schema review and CI processes:
  - Schema evolution rules
  - Auto Avro schema generation
  - Auto Avro Java class generation
- Schema metadata conventions (fields common to all schemas?)
- Schema listing and discussion UI
- Integrate with on-wiki schema storage for EventLogging?
- MediaWiki extension?
- Schema lookup service