Page MenuHomePhabricator

Establish if Camus can support TLS encryption + Authentication to Kafka with a minimal code change
Closed, DeclinedPublic

Description

We should invest some time reviewing Camus' code and see if its Kafka client could be migrated to a newer one, with support to encryption and authentication.

Useful docs: https://gobblin.readthedocs.io/en/latest/developer-guide/HighLevelConsumer/

Event Timeline

Today I did a quick test with the following code on stat1005, that worked nicely:

package org.wikimedia.kafkatest;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import java.util.Arrays;
import java.util.Properties;

public class App
{
    public static void main( String[] args )
    {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-jumbo1001.eqiad.wmnet:9093");
        props.put("group.id", "consumer-test-elukey");
        props.put("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/home/elukey/truststore.jks");
        props.put("ssl.truststore.password", "XXXXXX");
        props.put("ssl.enabled.protocols", "TLSv1.2");
        props.put("ssl.protocol", "TLS");
        props.put("ssl.cipher.suites", "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("eventlogging_NavigationTiming"));
        int n = 0;
        try {
           while(n < 10) {
               ConsumerRecords<String, String> records = consumer.poll(10000);
               for (ConsumerRecord<String, String> record : records) {
                   System.out.println(record.offset() + ": " + record.value());
               }
               n++;
           }
        } finally {
            consumer.close();
        }
    }
}

In my pom:

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.11.0.3</version>
</dependency>

The truststore.jks was copied manually from kafka-jumbo to stat1005. It contains only public certs, so there shouldn't really be the need for a password:

elukey@stat1005:~$ keytool -list -v -keystore truststore.jks
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Enter keystore password:
Keystore type: jks
Keystore provider: SUN

Your keystore contains 1 entry

Alias name: puppetmaster1001.eqiad.wmnet_8140
[...]

In theory this trustore should be distributed to all Hadoop worker nodes for Camus testing, plus also to stat100x client nodes to allow people to use TLS with Kafka where possible. Is it feasible to manage a password in this case? Maybe we could use something super easy and make it publicly available, or just no password (if cergen allows it, kafka should support it).

@Ottomata thoughts?

Hm, IIRC Java keytool will now allow generation of .jks files without a password. If no password is used, cergen will not generate .jks files.

Since the truststore only contains the public puppet CA cert, we could create a cergen entry with a dummy password that only generated the puppet ca cert and truststore.jks file. Then we could include and distribute that password anywhere as needed as it wouldn't be protecting any keys.

Or if we could find some other way to generate passwordless .jks files, that would work too. I wouldn't mind managing a puppet CA only truststore.jks outside of cergen.

Oh boy. Just realized that Camus uses the Old Kafka consumer interface which does not support TLS. I dunno if this will be so easy after all.

Yeah, I take it all back. I don't think this task is so easy after all. This isn't just a client upgrade, it is a library change. kafka-clients is a totally different Kafka library than the old kafka_2.10 which Camus currently uses.

https://stackoverflow.com/questions/51453607/whats-the-difference-between-the-two-maven-modules-in-kafka

@Ottomata I agree, the task is not super easy :)

I do think that we should keep going though, even if it will take 2/3 days of dev time to migrate everything over. If it takes more than 2/3 days and Joseph thinks that it is not worth it we could start checking again Gobblin.

I think it might take more than 2/3 days.

I'd be willing to try Gobblin and/or forked Kafka Connect HDFS for some things at this point!

@Ottomata I didn't get why it is super difficult to move to form the old consumer config to the newer one, is it a problem for the actual camus configs or something else? (trying to understand)

I think that we should decide what path to follow, something like:

  1. Joseph dedicating a day to assess if Camus could be ported, estimating more or less how much time if would take.
  2. If not feasible, we try Gobblin.
  3. If not feasible, we fork Kafka Connect.

I would like to fork Kafka Connect, if possible, only as last step and rely on maintained (not by us) Apache projects as much as possible. My 2c :)

I only looked briefly yesterday but from what I gathered: Both old Kafka Consumer and Producer are used. As well as old Message API and Kafka Message SerDe interface (I think). The old consumer is using the SimpleConsumer, which does not store offsets in Kafka. I'm not even sure if the kafka-clients library has support for 'SimpleConsumser' like operation. It might be using the new 'high level' consumer but without auto offset commit.

Anyway, I'm sure doing this is possible, but I had originally thought that it would be just a version upgrade. This is more like switching from a 5 year old pykafka to some librdkafka based library. It is a completely different client. Maybe Joseph will think differently if he takes a look! If he reads the code and says "ah ha, no problem here we go", then let's go for it and upgrade Camus. Otherwise, I'd say we should spend the time starting to replace it.

As for Gobblin vs Kafka Connect, you are probably right. However we should do a quick spike to compare to be sure. There may be good features that Gobblin doesn't have that Connect does (or vice versa who knows!). I do like that Kafka Connect itself IS part of Apache Kafka...it's just the Confluent Connector interfaces (which are very useful!) that have a bad license.

This seems not the road to follow, declining the task. We can re-open if we feel it is needed.