KafkaRDD in pyspark only supports the Kafka 0.8 API. This forces the Kafka servers to convert messages on the fly from the storage format into the format used by 0.8, which triggered OOMs in the kafka-jumbo cluster. The rest of mjolnir uses the kafka-python package to talk to the Kafka clusters, so convert the usages of KafkaRDD to use this same package. This gives us more direct control over the way we talk to Kafka, and ensures we can upgrade dependencies as needed.
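As a rough illustration of what reading a bounded offset range looks like with kafka-python instead of KafkaRDD, here is a minimal sketch. It is not the code from the actual patch: the helper names `read_offset_range` and `make_consumer` are hypothetical, and it assumes the caller already knows the start and end offsets for a single partition. The kafka-python calls used (`assign`, `seek`, `poll`) are part of `KafkaConsumer`.

```python
def read_offset_range(consumer, tp, start, end, timeout_ms=1000):
    """Collect records with start <= offset < end from one partition.

    `consumer` is expected to behave like kafka.KafkaConsumer and `tp`
    like kafka.TopicPartition. Both are passed in so the function can be
    exercised without a running broker. (Hypothetical helper.)
    """
    consumer.assign([tp])      # manual assignment, no consumer group rebalancing
    consumer.seek(tp, start)   # position the consumer at the first wanted offset
    records = []
    next_offset = start
    while next_offset < end:
        # poll() returns a dict mapping TopicPartition -> list of records
        batch = consumer.poll(timeout_ms=timeout_ms)
        messages = batch.get(tp, [])
        if not messages:
            break              # nothing arrived within the timeout; stop early
        for msg in messages:
            if msg.offset >= end:
                return records
            records.append(msg)
            next_offset = msg.offset + 1
    return records


def make_consumer(bootstrap_servers):
    """Build a real consumer (hypothetical helper; needs kafka-python installed)."""
    from kafka import KafkaConsumer
    return KafkaConsumer(bootstrap_servers=bootstrap_servers,
                         enable_auto_commit=False)
```

With a real cluster this would be called along the lines of `read_offset_range(make_consumer(brokers), TopicPartition(topic, 0), start, end)`, where `brokers`, `topic`, and the offsets are placeholders supplied by the caller. Because the client speaks the brokers' current protocol, the servers should not need to down-convert messages to the 0.8 format.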
Description
Details
Related Changes in Gerrit:
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| Convert KafkaRDD usage to kafka-python | search/MjoLniR | master | +85 -42 |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | elukey | T219842 [Post-mortem] Kafka Jumbo cluster cannot accept connections |
| Resolved | | EBernhardson | T219932 Convert mjolnir from KafkaRDD to direct kafka-python usage |
Event Timeline
Change 500859 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[search/MjoLniR@master] Convert KafkaRDD usage to kafka-python
Change 500859 merged by jenkins-bot:
[search/MjoLniR@master] Convert KafkaRDD usage to kafka-python