Why
- Reduce the surface of Wikimedia-specific code we need to maintain.
- Horizontal scalability.
- Excellent integration with Kafka, which we are already committed to.
- Schema evolution with backward and forward compatibility.
- Tight integration with the Hadoop ecosystem.
- Efficient binary serialization.
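To make the "efficient binary serialization" point concrete, here is a minimal Python sketch of Avro's binary encoding for a record with one string field and one long field, following the Avro specification: longs are zigzag-encoded and then written as variable-length integers, strings are a length prefix followed by UTF-8 bytes, and record fields are concatenated in schema order with no field names. The `PageView` schema and its field names are hypothetical, and a real deployment would use an Avro library rather than hand-rolled code like this.

```python
import json


def zigzag(n: int) -> int:
    """Map a signed 64-bit int to unsigned: 0->0, -1->1, 1->2, -2->3, ..."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag, then little-endian base-128 varint."""
    z = zigzag(n)
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)  # final byte, continuation bit clear
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length encoded as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Hypothetical example schema; the field names are illustrative only.
SCHEMA = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "wiki", "type": "string"},
        {"name": "page_id", "type": "long"},
    ],
}


def encode_record(schema: dict, record: dict) -> bytes:
    """Avro record: field values concatenated in schema order, no names."""
    out = b""
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "string":
            out += encode_string(value)
        elif field["type"] == "long":
            out += encode_long(value)
        else:
            raise NotImplementedError(f"type not covered by this sketch: {field['type']}")
    return out


if __name__ == "__main__":
    event = {"wiki": "enwiki", "page_id": 12345}
    avro_bytes = encode_record(SCHEMA, event)
    json_bytes = json.dumps(event).encode("utf-8")
    print(len(avro_bytes), len(json_bytes))  # prints: 10 36
```

Because field names live in the schema rather than in every message, the example event takes 10 bytes in Avro versus 36 bytes as JSON. The flip side is that a reader needs the writer's schema to decode those bytes, which is the role the schema registry plays.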
How
- Articulate the value of this migration and get buy-in from stakeholders. (See 'Why', above.)
- Package and Puppetize the Confluent Schema Registry.
- Package and Puppetize the Kafka REST Proxy.
- Upgrade Kafka to a version compatible with Confluent Platform.
- Write a MediaWiki extension that provides an interface for creating, editing, and browsing schemas, using the schema registry as its storage backend.
- Set up all of the above in Labs.
- Fully implement one proof of concept end to end, so that people can see how this works in practice.
- Design a public event-logging endpoint and solicit a security review.
- Migrate existing schemas to Avro? (Maybe not.)
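As a sketch of how the proposed MediaWiki extension might use the schema registry as its storage backend, the Python below builds (but does not send) two requests against the Confluent Schema Registry REST API: testing a candidate schema for compatibility with a subject's latest version, and registering it as a new version. The base URL, the subject name, and the helper function names are hypothetical; an actual extension would be written in PHP and would need its own error handling and access control.

```python
import json
import urllib.parse
import urllib.request

# Content type used by the Confluent Schema Registry REST API (v1).
SR_CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def _schema_request(url: str, schema: dict) -> urllib.request.Request:
    """POST with the Avro schema serialized as a JSON *string* under "schema"."""
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": SR_CONTENT_TYPE}, method="POST"
    )


def _subject_path(subject: str) -> str:
    # Percent-encode the subject so it is safe as a single path segment.
    return urllib.parse.quote(subject, safe="")


def build_compat_request(base_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """Hypothetical helper: POST /compatibility/subjects/{subject}/versions/latest.

    The registry answers with a body such as {"is_compatible": true}.
    """
    url = f"{base_url.rstrip('/')}/compatibility/subjects/{_subject_path(subject)}/versions/latest"
    return _schema_request(url, schema)


def build_register_request(base_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """Hypothetical helper: POST /subjects/{subject}/versions.

    The registry answers with the assigned schema id, e.g. {"id": 1}.
    """
    url = f"{base_url.rstrip('/')}/subjects/{_subject_path(subject)}/versions"
    return _schema_request(url, schema)


if __name__ == "__main__":
    # Hypothetical host and subject; 8081 is the registry's default port.
    schema = {"type": "record", "name": "PageView",
              "fields": [{"name": "wiki", "type": "string"}]}
    req = build_register_request("http://schema-registry.example:8081", "page_view-value", schema)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```

Checking compatibility before registering lets the editing interface reject a breaking change with a clear message; the registry itself also refuses an incompatible registration according to the subject's configured compatibility level.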