
Convert mjolnir from KafkaRDD to direct kafka-python usage
Closed, Resolved, Public

Description

KafkaRDD in PySpark only supports the Kafka 0.8 API. This forces the Kafka brokers to convert messages on the fly from their storage format into the format used by 0.8 clients, which triggered OOMs in the kafka-jumbo cluster. The rest of mjolnir already uses the kafka-python package to talk to the Kafka clusters, so convert the remaining KafkaRDD usage to that same package. This gives us more direct control over how we talk to Kafka and ensures we can upgrade dependencies as needed.
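A minimal sketch of what replacing a KafkaRDD offset range with kafka-python might look like, assuming the job knows each partition's [start, end) offsets up front, as KafkaRDD's offset ranges do. `make_consumer` and `read_range` are hypothetical helpers, not the code from change 500859. `read_range` takes the consumer as a parameter so it can run inside a Spark task or be exercised with a stand-in consumer. With a real `KafkaConsumer`, `tp` must be a `kafka.TopicPartition`, since `poll()` keys its results by that type.

```python
import time
from typing import Any, Hashable, Iterator, Tuple


def make_consumer(bootstrap_servers: str) -> Any:
    """Hypothetical helper: build a kafka-python consumer for bounded reads."""
    # Imported lazily so this module loads even where kafka-python is absent
    # (e.g. when read_range is exercised with a stand-in consumer).
    from kafka import KafkaConsumer

    # No api_version is pinned, so kafka-python infers the broker version by
    # probing it, instead of being tied to the 0.8 API like KafkaRDD.
    # Auto-commit is off because the caller decides which offsets to read.
    return KafkaConsumer(bootstrap_servers=bootstrap_servers,
                         enable_auto_commit=False)


def read_range(consumer: Any, tp: Hashable, start: int, end: int,
               timeout_s: float = 30.0) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, value) for every record in [start, end) of one partition.

    Hypothetical helper. With kafka-python, `tp` must be a TopicPartition,
    because poll() keys its results by TopicPartition.
    """
    consumer.assign([tp])
    consumer.seek(tp, start)
    next_offset = start
    deadline = time.monotonic() + timeout_s
    while next_offset < end:
        if time.monotonic() > deadline:
            raise TimeoutError(
                f"stalled at offset {next_offset} of {tp!r}, wanted < {end}")
        records = consumer.poll(timeout_ms=500).get(tp, [])
        if records:
            # Progress was made, so restart the stall timer.
            deadline = time.monotonic() + timeout_s
        for record in records:
            if record.offset < next_offset:
                continue  # already yielded or before the requested start
            if record.offset >= end:
                return  # past the requested range
            next_offset = record.offset + 1
            yield record.offset, record.value
```

On Spark, one option is to parallelize the list of offset ranges and call `read_range` from a function passed to `flatMap`, creating and closing one consumer per task on the executor rather than shipping one from the driver. The stall timer is reset whenever a poll returns records, so a slow but progressing read does not fail; it only guards against waiting forever if the requested end offset is never reached.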

Event Timeline

Change 500859 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[search/MjoLniR@master] Convert KafkaRDD usage to kafka-python

https://gerrit.wikimedia.org/r/500859

Change 500859 merged by jenkins-bot:
[search/MjoLniR@master] Convert KafkaRDD usage to kafka-python

https://gerrit.wikimedia.org/r/500859