
Import Kafka messages into HDFS authenticating with TLS/SSL
Closed, Duplicate · Public · 13 Estimated Story Points

Description

We use Camus to batch import messages from Kafka into HDFS. Camus uses the old 0.8.0 SimpleConsumer to consume from Kafka. However, "SSL is supported only for the new Kafka Producer and Consumer, the older API is not supported".

So HM! Options:

  1. Update Camus to use newer Consumer API with SSL support.
  2. Import into HDFS with something other than Camus.

We've done some small spikes for 2 before, examining both Kafka Connect and Gobblin. Neither did quite what we needed at the time, but perhaps we should take a minute to reexamine these options before attempting to patch up Camus.
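For option 1, a minimal sketch of the settings a newer (0.9+) Java consumer would likely need in order to connect over SSL. The configuration keys are standard Kafka client settings; the class name, method name, broker address, port 9093, group id, and truststore path are hypothetical placeholders, not values from our setup. Auto-commit is disabled here on the assumption that the importer would keep tracking offsets itself.

```java
import java.util.Properties;

// Hypothetical helper: builds the properties one would pass to the newer
// Kafka consumer so that it connects to the brokers over SSL.
final class SslConsumerConfigSketch {

    static Properties sslConsumerProps(String bootstrapServers, String groupId,
                                       String truststorePath, String truststorePassword) {
        Properties props = new Properties();
        // The newer consumer bootstraps from the brokers' SSL listener.
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Switch the client from the default PLAINTEXT protocol to SSL.
        props.setProperty("security.protocol", "SSL");
        // Truststore holding the CA that signed the broker certificates.
        props.setProperty("ssl.truststore.location", truststorePath);
        props.setProperty("ssl.truststore.password", truststorePassword);
        // Assumption: the importer manages offsets itself, so auto-commit is off.
        props.setProperty("enable.auto.commit", "false");
        // Read raw bytes and leave decoding to the import job.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values only.
        Properties props = sslConsumerProps(
                "kafka-broker.example.org:9093", "camus-ssl-test",
                "/path/to/truststore.jks", "changeit");
        System.out.println("security.protocol=" + props.getProperty("security.protocol"));
    }
}
```

If the brokers also require client certificates, the matching `ssl.keystore.location`, `ssl.keystore.password`, and `ssl.key.password` settings would be added in the same way.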

Event Timeline

Ottomata created this task. Jun 1 2017, 8:14 PM
Restricted Application added a subscriber: Aklapper. Jun 1 2017, 8:14 PM

Until we solve this problem we won't be able to disable PLAINTEXT Kafka use. That means we won't solve the current problem that anyone in prod can consume webrequest data from Kafka.

In the meantime, we could restrict the use of port 9092 (PLAINTEXT, no authentication) to only certain nodes (Hadoop, maybe stat100[24], etc.)

This would allow us to defer working on this task until we figure out a better replacement for Camus, without defeating the purpose of enforcing authentication.

Nuria added a comment. Jun 5 2017, 3:54 PM

Tasks:
Evaluate whether Camus can use the newer consumer; if so, proceed this way.

Possible issues:

  • Does the newer consumer API manage offsets for us?

How to test:

  • We have changes in the camus package. We would need to update the Camus code to run with the new Kafka library. We can test consuming from Kafka and dumping into a different place on HDFS. This can be done before turning on TLS support, as that is native to the newer consumer API.
Nuria raised the priority of this task from Medium to High. Jun 5 2017, 3:54 PM
Nuria set the point value for this task to 13.
Nuria moved this task from Incoming to Dashiki on the Analytics board. Jun 5 2017, 3:57 PM
Nuria moved this task from Dashiki to Backlog (Later) on the Analytics board. Jul 10 2017, 3:59 PM
fdans moved this task from Backlog (Later) to Wikistats on the Analytics board. Oct 23 2017, 3:47 PM
fdans lowered the priority of this task from High to Lowest. Mar 29 2018, 5:28 PM
fdans moved this task from Wikistats to Operational Excellence on the Analytics board.
Milimetric raised the priority of this task from Lowest to Low. Oct 22 2018, 3:38 PM
Aklapper removed a project: Analytics. Jul 4 2020, 7:59 AM