Maniphest T102082

Replace EventLogging with Confluent Platform
Closed, Declined · Public
Assigned To

Authored By

	ori
	Jun 10 2015, 11:19 PM

Description

Why

Reduce the surface of Wikimedia-specific code we need to maintain.
Horizontal scalability.
Excellent integration with Kafka, which we are already committed to.
Schema evolution capabilities; backward/forward compatibility.
Tight integration with Hadoop ecosystem.
Efficient binary serialization.
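
The schema evolution point can be illustrated with a simplified model of how a reader decodes a record written under a different schema version. This is a sketch in plain Python, not the Avro library: the `resolve` helper, the schema layout, and the field names are all hypothetical. It models only two rules: a field missing from the data is filled from the reader's default, and a field unknown to the reader is dropped.

```python
# Simplified model of Avro-style schema resolution (NOT the real Avro library).
# A schema here is a list of fields; each field has a name and, optionally,
# a default value. All names below are hypothetical.

def resolve(record, reader_fields):
    """Project a decoded record onto the reader's schema."""
    out = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            # Field added after the record was written: use the default.
            out[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return out

# Two hypothetical versions of the same schema.
V1 = [{"name": "wiki"}, {"name": "action"}]
V2 = V1 + [{"name": "platform", "default": "unknown"}]

old_event = {"wiki": "enwiki", "action": "click"}
new_event = {"wiki": "enwiki", "action": "click", "platform": "mobile"}

# Backward compatibility: a V2 reader can read V1 data.
print(resolve(old_event, V2))
# Forward compatibility: a V1 reader can read V2 data (extra field dropped).
print(resolve(new_event, V1))
```

Adding a field without a default would break the first case, which is the kind of change a schema registry's compatibility check is meant to reject.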

How

Articulate the value of this migration and get buy-in from stakeholders. (See 'Why', above.)
Package and Puppetize the Confluent schema registry.
Package and Puppetize Kafka REST Proxy.
Upgrade Kafka to a version compatible with Confluent Platform.
Write a MediaWiki extension that provides an interface for creating, editing, and browsing schemas, and that uses the schema registry as its storage backend.
Set all of the above up in labs.
- Fully implement a particular proof-of-concept, so that people can actually see how this works.
Design a public event-logging endpoint and solicit a security review.
Migrate existing schemas to Avro? (Maybe not.)

Related Objects

Mentioned In: Event-Platform
T110748: Event Bus
T103505: Create analytics-centric Cirrus logs and have them import into HDFS
T84923: Reliable publish / subscribe event bus

Event Timeline

ori created this task. Jun 10 2015, 11:19 PM

ori assigned this task to Ottomata.

ori raised the priority of this task to Medium.

ori updated the task description.

ori added a project: MediaWiki-extensions-EventLogging.

ori added subscribers: ori, Ottomata.

Restricted Application added a subscriber: Aklapper. Jun 10 2015, 11:19 PM

Ottomata set Security to None. Jun 11 2015, 12:39 AM

Ottomata added subscribers: kevinator, ggellerman, Milimetric, JAllemandou.

Confluent experiments:

https://github.com/ottomata/EventAvro/

See also labs host confluent01.analytics.eqiad.wmflabs

such-data

While I like the idea, I think the overhead of moving from JSON to Avro should not be underestimated.

If this works as it should, users will be able to produce their data in JSON format.

Design a public event-logging endpoint and solicit a security review.

This part will be interesting. I think production of messages can be limited to the same annoyances as EventLogging pretty easily, by simply forcing clients to submit events with valid schemas. That is slightly better than EventLogging is now, as currently anyone can submit anything into the raw client-side 0mq stream.
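
As a rough illustration of rejecting events that do not match a known schema, here is a minimal sketch in plain Python. The schema layout, the `validate` and `matches` helpers, and the field names are hypothetical; a real deployment would presumably rely on the schema registry and a proper Avro or JSON Schema validator rather than hand-rolled checks.

```python
# Minimal sketch: accept an event only if it matches a known schema.
# The schema layout and field names are hypothetical, not EventLogging's
# or the schema registry's actual format.
TYPES = {"string": str, "int": int, "boolean": bool}

def matches(value, type_name):
    """True if value has the Python type that the schema type maps to."""
    if type_name == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, TYPES[type_name])

def validate(event, schema):
    """Return a list of problems; an empty list means the event is accepted."""
    expected = {f["name"]: f["type"] for f in schema["fields"]}
    problems = []
    for name, type_name in expected.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not matches(event[name], type_name):
            problems.append(f"wrong type for field: {name}")
    for name in event:
        if name not in expected:
            problems.append(f"unexpected field: {name}")
    return problems

# Hypothetical schema for illustration only.
SCHEMA = {"name": "ButtonClick", "fields": [
    {"name": "wiki", "type": "string"},
    {"name": "count", "type": "int"},
]}

print(validate({"wiki": "enwiki", "count": 3}, SCHEMA))
print(validate({"wiki": "enwiki", "count": "3", "junk": 1}, SCHEMA))
```

A public endpoint would run a check like this before handing the event to Kafka, so that arbitrary payloads never reach the stream.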

However, the Kafka REST Proxy does have a consumer interface, and we certainly want to restrict that to internal use only. I suppose we can do this with HTTP URI-level restrictions (in Varnish, nginx, Apache, a load balancer, whatever), but I'm not sure that is the best way to do this. It's also all open source, so we could patch or fork however we like: https://github.com/confluentinc/kafka-rest
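
A URI-level restriction could be sketched as an allowlist that lets only produce requests through from the public side. This assumes the proxy's produce endpoint is `POST /topics/<topic>` and that the consumer interface lives under other paths such as `/consumers/`, which should be checked against the kafka-rest documentation for the deployed version. The `is_public_request` helper is hypothetical, and in practice the same rule would more likely live in Varnish or nginx configuration than in Python.

```python
import re

# Assumption: producers POST to /topics/<topic>. Everything else, including
# the consumer interface and metadata endpoints, is treated as internal only.
PRODUCE_PATH = re.compile(r"/topics/[A-Za-z0-9._-]+")

def is_public_request(method, path):
    """Allow only produce requests; deny every other method and path."""
    return method == "POST" and PRODUCE_PATH.fullmatch(path) is not None

print(is_public_request("POST", "/topics/eventlogging"))  # produce request
print(is_public_request("GET", "/topics/eventlogging"))   # topic metadata
print(is_public_request("POST", "/consumers/my-group"))   # consumer interface
```

Using an allowlist rather than blocking known consumer paths means any endpoint added in a later proxy release stays closed to the public by default.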

Ottomata mentioned this in T84923: Reliable publish / subscribe event bus. Jun 22 2015, 1:42 PM

https://github.com/confluentinc/kafka-rest/issues/79

Iiiinteresting!

Ottomata mentioned this in T103505: Create analytics-centric Cirrus logs and have them import into HDFS. Jun 25 2015, 2:22 PM

Perhaps a useful endpoint?

https://github.com/linkedin/pinot

Ottomata mentioned this in T110748: Event Bus. Aug 28 2015, 10:39 PM

Ottomata mentioned this in Event-Platform. Aug 31 2015, 8:24 PM

EventBus was done instead.