We propose building a scalable, unified system for passing messages that conform to schemas between applications across WMF. This ticket will track the work to build that system.
Relevant tickets include:
- T84923 - Reliable publish / subscribe event bus
- T88459 - Implementation sketch for reliable event bus using Kafka
- T106256 - Kafka Client for MediaWiki
- T102082 - Replace EventLogging with Confluent Platform
Two meetings were held in August 2015 to collect use cases for, and discuss possible implementations of, this new system. Notes from these meetings are in an etherpad here:
The conclusions from our architecture brainstorming meeting are as follows:
- We will run a REST proxy for producing messages. This may or may not be the Confluent Kafka REST Proxy.
- Our REST proxy will validate messages and return an error if production fails.
- We will run 2 Kafka clusters: the current Analytics cluster and a new 'Production' cluster. The Production cluster is expected to carry much lower volume than the Analytics cluster.
- The MVP will not address authentication. We hope this will be solved upstream in a future Kafka version: https://issues.apache.org/jira/browse/KAFKA-1682
- We would like to support both JSON Schema and Avro.
- EventLogging's use case (client-side events) will not be supported directly. Instead, we will investigate writing a process that reads Varnish shared memory logs and produces events through our REST proxy.
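The first conclusions above put the REST proxy between producers and Kafka, so bad input is rejected before it reaches a topic and producers learn when production fails. A minimal Python sketch of that produce path follows. It assumes a hypothetical `page_edit`-style schema written in a tiny subset of JSON Schema (checked by hand, with no third-party validator), a caller-supplied `produce` callback standing in for a real Kafka producer, and HTTP status codes chosen for illustration. None of these names or codes come from Confluent's REST Proxy.

```python
from __future__ import annotations

import json
from typing import Any, Callable

# Hypothetical schema for illustration only, using a tiny subset of
# JSON Schema: "type", "required", and primitive-typed "properties".
PAGE_EDIT_SCHEMA = {
    "type": "object",
    "required": ["page_id", "user"],
    "properties": {
        "page_id": {"type": "integer"},
        "user": {"type": "string"},
    },
}

_TYPES = {"object": dict, "string": str, "integer": int, "boolean": bool}


def _is_type(value: Any, type_name: str) -> bool:
    # bool is a subclass of int in Python, so exclude it for "integer".
    if type_name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, _TYPES[type_name])


def validate(event: Any, schema: dict) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    if not _is_type(event, schema["type"]):
        return [f"expected {schema['type']}"]
    errors = []
    for name in schema.get("required", []):
        if name not in event:
            errors.append(f"missing required field '{name}'")
    for name, sub in schema.get("properties", {}).items():
        if name in event and not _is_type(event[name], sub["type"]):
            errors.append(f"field '{name}' should be {sub['type']}")
    return errors


def handle_produce(body: bytes, produce: Callable[[dict], None]) -> tuple[int, dict]:
    """Validate a request body, then hand it to `produce`; report any failure."""
    try:
        event = json.loads(body)
    except json.JSONDecodeError as exc:
        return 400, {"error": f"invalid JSON: {exc.msg}"}
    errors = validate(event, PAGE_EDIT_SCHEMA)
    if errors:
        return 400, {"error": "validation failed", "details": errors}
    try:
        produce(event)  # stand-in for sending the event to Kafka
    except Exception as exc:  # production failed: tell the client
        return 503, {"error": f"production failed: {exc}"}
    return 201, {"status": "accepted"}
```

Returning the full list of validation errors, rather than stopping at the first one, lets a producer fix every problem in a single round trip.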
Confluent's Kafka REST Proxy and Schema Registry already exist and support many of our use cases for producing events, but they have some limitations:
- No JSON Schema support.
- No one-to-one topic <-> schema mapping. Clients should not be able to produce different schemas (outside of normal schema versioning) to the same topic.
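One way to express the missing one-to-one topic <-> schema constraint is a registry that binds each topic to a single schema name while still letting that schema evolve through numbered versions. The Python sketch below is a hypothetical illustration: the class, method, and topic names are invented, and it is not a description of how Confluent's Schema Registry works.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class SchemaMismatch(Exception):
    """Raised when a topic/schema pairing violates the one-to-one rule."""


@dataclass
class TopicSchemaRegistry:
    # topic -> the single schema name allowed on that topic
    bindings: dict[str, str] = field(default_factory=dict)
    # schema name -> registered versions (list index + 1 == version number)
    versions: dict[str, list[dict]] = field(default_factory=dict)

    def register(self, schema_name: str, schema: dict) -> int:
        """Add a new version of a schema and return its version number."""
        self.versions.setdefault(schema_name, []).append(schema)
        return len(self.versions[schema_name])

    def bind(self, topic: str, schema_name: str) -> None:
        """Bind a topic to one schema; binding it to a different name is refused."""
        current = self.bindings.get(topic)
        if current is not None and current != schema_name:
            raise SchemaMismatch(f"topic {topic!r} is already bound to {current!r}")
        self.bindings[topic] = schema_name

    def check_produce(self, topic: str, schema_name: str, version: int) -> dict:
        """Return the schema version to validate against, or raise if not allowed."""
        bound = self.bindings.get(topic)
        if bound != schema_name:
            raise SchemaMismatch(f"topic {topic!r} accepts {bound!r}, not {schema_name!r}")
        known = self.versions.get(schema_name, [])
        if not 1 <= version <= len(known):
            raise SchemaMismatch(f"unknown version {version} of {schema_name!r}")
        return known[version - 1]
```

Keeping the binding at the level of the schema name separates normal versioning (producing version 2 of the same schema) from the case the ticket wants to forbid (mixing unrelated schemas on one topic).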
The consumer side has more serious limitations, and consuming from Kafka via a REST API is awkward for many real-world clients in any case. Because of this, the MVP Event Bus will not support consumption. Users will be responsible for consuming directly from Kafka with the client of their choice.
Our first task will be to investigate how difficult it would be to extend Confluent's software to address these production-side limitations.