Discovery for Kafka cluster brokers
Open, Stalled, Normal, Public

Description

Kafka broker lists are currently maintained in Puppet, which is not accessible from Helm deployment charts. We could hardcode the lists there, but it would be nice to have a discovery service for the different Kafka clusters.

The discovery URL would only be used for bootstrapping Kafka clients. During bootstrap, the clients get the actual list of brokers to use from Kafka itself. Having a discovery URL would mean we don't have to hardcode broker hostnames in Helm Values.yaml. Instead we could do:

kafka:
  conf:
    metadata.broker.list: kafka.main.discovery.wmnet
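
To illustrate what such a name would give a client, here is a minimal Python sketch that expands a single DNS name into a bootstrap list. It assumes the name resolves to one address record per broker and that brokers listen on port 9092 (the Kafka default); the function name `bootstrap_servers` and the injectable `resolve` parameter are made up for this example and are not part of any Kafka client.

```python
import socket
from typing import Callable


def bootstrap_servers(name: str, port: int = 9092,
                      resolve: Callable = socket.getaddrinfo) -> str:
    """Expand one DNS name into a comma-separated host:port list.

    `resolve` defaults to socket.getaddrinfo and can be swapped out
    for testing. Port 9092 is an assumption (the Kafka default).
    """
    infos = resolve(name, port, proto=socket.IPPROTO_TCP)
    entries = []
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        # IPv6 literals need brackets in host:port notation.
        host = f"[{ip}]" if family == socket.AF_INET6 else ip
        entry = f"{host}:{port}"
        if entry not in entries:  # drop duplicates, keep resolver order
            entries.append(entry)
    return ",".join(entries)


# Hypothetical usage (performs a real DNS lookup):
#   bootstrap_servers("kafka.main.discovery.wmnet")
```

Most Kafka clients accept a hostname directly and resolve it themselves, so explicit expansion like this would only matter where the client uses just the first address a name returns.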

Event Timeline

Ottomata created this task. Jan 11 2019, 4:30 PM
Ottomata triaged this task as Normal priority.
Restricted Application added a subscriber: Aklapper. Jan 11 2019, 4:30 PM

Discovery records for Kafka would come in handy in the logging pipeline case too, namely during datacenter failover, to move producers off a given datacenter (and roll-restart the producers).

Joe added a comment. Jan 15 2019, 3:11 PM

Sorry, I need some more specifics:

You want to make a DNS query and get as a response the "nearest" Kafka cluster in the form of a list of hostnames/ports? Are we talking about discovery SRV records?

I am not sure we can do that at the moment, but there are other ways to have etcd as a source of truth for such data.

No, no. For me, all I want is an alias for the list of Kafka brokers in a given Kafka cluster. I don't need any DC failover stuff. Perhaps "discovery" is not the right word here. Round Robin DNS might be enough for me.

Joe added a comment. Jan 15 2019, 3:26 PM

Might I suggest that you use an SRV DNS record instead? It's more appropriate for enumerating members in a cluster. We use those for etcd discovery.
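
As a rough sketch of what enumerating cluster members from SRV records could look like, the Python below turns already-resolved SRV records into a host:port list. The `SrvRecord` type, the `brokers_from_srv` function, and the hostnames in the usage comment are all hypothetical. The DNS lookup itself is left out because the Python standard library has no SRV resolver, so a third-party DNS library would be needed for that step. Ordering by priority and then by descending weight is a simplification of the weighted selection described in RFC 2782.

```python
from typing import Iterable, NamedTuple


class SrvRecord(NamedTuple):
    """Fields of one SRV record (RFC 2782), already parsed."""
    priority: int
    weight: int
    port: int
    target: str


def brokers_from_srv(records: Iterable[SrvRecord]) -> str:
    """Build a comma-separated host:port list from SRV records.

    Lower priority comes first; within a priority, higher weight comes
    first (a simplification of RFC 2782's weighted random selection).
    """
    ordered = sorted(records, key=lambda r: (r.priority, -r.weight, r.target))
    # SRV targets are fully qualified and may carry a trailing dot.
    return ",".join(f"{r.target.rstrip('.')}:{r.port}" for r in ordered)


# Hypothetical usage with made-up hostnames:
#   brokers_from_srv([
#       SrvRecord(0, 10, 9092, "broker-a.example.org."),
#       SrvRecord(0, 10, 9092, "broker-b.example.org."),
#   ])
```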

Change 484509 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/dns@master] Add round robin DNS records for Kafka clusters

https://gerrit.wikimedia.org/r/484509

Once we have it, we would need to update Change-Prop, JQ-Change-Prop, EventBus-service, and event streams to use the new DNS record.

"Might I suggest that you use an SRV DNS record instead? It's more appropriate for enumerating members in a cluster. We use those for etcd discovery."

That would indeed be the best solution, but does Kafka support that?

Kafka doesn't support SRV, hence my Round Robin DNS patch. After more discussion with @BBlack, I've decided to abandon this idea. @akosiaris mentioned that they have identified this problem (config management in Helm charts) in other areas too, so I'll just hardcode the Kafka brokers for now and hope for a better future.

Change 484509 abandoned by Ottomata:
Add round robin DNS records for Kafka clusters

https://gerrit.wikimedia.org/r/484509

akosiaris changed the task status from Open to Stalled. Apr 8 2019, 2:37 PM

Stalling until we have some sane solution.