
{stag} EventLogging on Kafka
Closed, Resolved, Public

Description

Objective: Move the EventLogging pipeline to use Kafka
Key Result: 10x capacity increase for EventLogging

See the Lightning Talk presented on October 27th: https://www.youtube.com/watch?v=yUQ5d192z3M
Slides are here: https://www.mediawiki.org/wiki/File:EventLogging_on_Kafka_-_Lightning_Talk.pdf

Related Objects

Status | Assigned
Resolved | kevinator
Resolved | Ottomata
Resolved | Ottomata
Resolved | akosiaris
Resolved | Ottomata
Resolved | Ottomata
Resolved | Ottomata
Resolved | Ottomata
Resolved | Ottomata
Declined | Ottomata
Resolved | Ottomata
Resolved | Ottomata
Resolved | madhuvishy
Resolved | Ottomata
Resolved | Ottomata
Resolved | Ottomata
Resolved | madhuvishy
Resolved | Ottomata
Declined | madhuvishy
Resolved | Ottomata

Event Timeline

kevinator raised the priority of this task to Needs Triage.
kevinator updated the task description.
kevinator subscribed.
kevinator moved this task from Next Up to Parent Tasks on the Analytics-Kanban board.
kevinator set Security to None.

Dan and I just pushed Analytics schema events through this system. During our largest test, we posted around 9,000 valid events per second by using ab (ApacheBench) to hammer bits.wikimedia.org from 24 cores on 12 different nodes.

ab -n 100000 -c 24  'https://bits.wikimedia.org/beacon/event?%7B%22event%22%3A%7B%22name%22%3A%22blah%22%2C%22age%22%3A33%7D%2C%22revision%22%3A13317883%2C%22schema%22%3A%22Analytics%22%2C%22webHost%22%3A%22meta.wikimedia.org%22%2C%22wiki%22%3A%22metawiki%22%7D;'
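The URL that ab hammers is just a percent-encoded JSON event capsule followed by a trailing `;`. The sketch below rebuilds that URL from a Python dict and decodes it again, which makes it easier to generate test payloads for other schemas. It is illustrative only: the helper names are hypothetical, and treating the trailing `;` as a truncation marker is an assumption, not something this task states.

```python
import json
from urllib.parse import quote, unquote

# Endpoint taken from the ab command above.
BEACON = "https://bits.wikimedia.org/beacon/event"


def encode_beacon_url(capsule: dict) -> str:
    """Hypothetical helper: compact JSON, percent-encoded, ';'-terminated."""
    payload = json.dumps(capsule, separators=(",", ":"))
    # safe="" makes quote() escape every reserved character, including '/'.
    return f"{BEACON}?{quote(payload, safe='')};"


def decode_beacon_query(query: str) -> dict:
    """Hypothetical helper: reverse encode_beacon_url for a raw query string."""
    # Assumption: a missing trailing ';' means the URL was cut off.
    if not query.endswith(";"):
        raise ValueError("missing trailing ';' (possibly truncated)")
    return json.loads(unquote(query[:-1]))


if __name__ == "__main__":
    capsule = {
        "event": {"name": "blah", "age": 33},
        "revision": 13317883,
        "schema": "Analytics",
        "webHost": "meta.wikimedia.org",
        "wiki": "metawiki",
    }
    # Prints the same URL that the ab command above requests.
    print(encode_beacon_url(capsule))
```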

EventLogging on Kafka with 12 parallelized client-side processors seemed to scale fairly linearly. The charts below are a little inaccurate, since they show 1-minute averages and our test lasted less than 2 minutes. The Analytics schema is a simple use case, and more complicated schemas may need more processing power to validate, but overall this test shows that we can now scale EventLogging linearly.
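If scaling really is linear, the test figures give a rough per-processor rate that can be used for capacity planning. The sketch below does that arithmetic: about 9,000 events/s across 12 processors is about 750 events/s each. This is a back-of-the-envelope estimate that assumes an even split across processors and the simple Analytics schema; heavier schemas would lower the per-processor rate. The function names are hypothetical.

```python
import math

# Figures from the load test above.
OBSERVED_EPS = 9000        # valid events per second
OBSERVED_PROCESSORS = 12   # parallel processors


def per_processor_rate(total_eps: float, processors: int) -> float:
    """Average rate each processor sustained, assuming an even split."""
    return total_eps / processors


def processors_for(target_eps: float, rate: float) -> int:
    """Processors needed to reach target_eps, assuming linear scaling holds."""
    return math.ceil(target_eps / rate)


if __name__ == "__main__":
    rate = per_processor_rate(OBSERVED_EPS, OBSERVED_PROCESSORS)
    # Doubling the observed load under the same assumptions:
    print(rate, processors_for(18_000, rate))  # prints "750.0 24"
```

The ceiling matters for any target that isn't an exact multiple of the per-processor rate: 9,001 events/s would already need a 13th processor.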

Also, since Kafka buffers the raw events, if we push more events through the system than our current number of processors can handle, the events will simply remain buffered rather than being dropped. A temporary spike in data larger than our processing bandwidth will only delay processing of those events.
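The buffering behaviour can be illustrated with a small tick-by-tick simulation. Arrivals go into a backlog, and processors drain at most a fixed number of events per tick. During a spike the backlog grows, and afterwards it drains, so the total processed equals the total received. This is a toy model, not EventLogging code. It assumes the backlog clears before Kafka's retention limit expires, because Kafka retention is bounded by time or size.

```python
def simulate(arrivals: list[int], capacity: int) -> tuple[list[int], list[int]]:
    """Toy model of a buffered log drained at a fixed per-tick capacity.

    arrivals: events arriving in each tick (e.g. per second).
    capacity: events the processors can handle per tick.
    Returns (processed per tick, backlog remaining after each tick).
    """
    backlog = 0
    processed: list[int] = []
    backlog_history: list[int] = []
    for n in arrivals:
        backlog += n                   # every raw event is retained, none discarded
        done = min(backlog, capacity)  # processors take what they can
        backlog -= done
        processed.append(done)
        backlog_history.append(backlog)
    return processed, backlog_history


if __name__ == "__main__":
    # A two-tick spike above capacity, followed by quiet ticks.
    processed, backlog = simulate([500, 1500, 1500, 0, 0], capacity=1000)
    print(processed)  # [500, 1000, 1000, 1000, 0]
    print(backlog)    # [0, 500, 1000, 0, 0]
```

The spike shows up only as a backlog that peaks at 1,000 events and then drains. Every event is processed, just later, which matches the point made above.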

[Screenshot: Screen Shot 2015-09-30 at 16.07.37.png]

[Screenshot: Screen Shot 2015-09-30 at 16.07.58.png]

kevinator claimed this task.

This project and 2015-16 Fiscal Q1 goal is DONE as of Sept 30, 2015!
Goals page updated: https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q1_Goals

No more {stag} tasks should be created.