Kafka-topics broken in beta: "zookeeper is not a recognized option"
Closed, Resolved · Public

Description

While testing in beta, I tried a user flow that enqueues a job, but the job never got enqueued. I SSH'd to the Kafka hosts to see what was going on, and it looks like kafka-topics is broken outright:

daimona@deployment-kafka-main-5:~$ kafka topics --list
kafka-topics --zookeeper deployment-zookeeper-3.deployment-prep.eqiad1.wikimedia.cloud/kafka/main-deployment-prep --list
zookeeper is not a recognized option
joptsimple.UnrecognizedOptionException: zookeeper is not a recognized option
	at joptsimple.OptionException.unrecognizedOption(OptionException.java:108)
	at joptsimple.OptionParser.handleLongOptionToken(OptionParser.java:510)
	at joptsimple.OptionParserState$2.handleArgument(OptionParserState.java:56)
	at joptsimple.OptionParser.parse(OptionParser.java:396)
	at org.apache.kafka.tools.TopicCommand$TopicCommandOptions.<init>(TopicCommand.java:802)
	at org.apache.kafka.tools.TopicCommand.execute(TopicCommand.java:97)
	at org.apache.kafka.tools.TopicCommand.mainNoExit(TopicCommand.java:87)
	at org.apache.kafka.tools.TopicCommand.main(TopicCommand.java:82)

(Same result on deployment-kafka-main-6, FWIW.)

I then tried to enqueue a job via eval.php (as in T387631#10647693), and that at least seems to work.

I don't know whether the kafka-topics error is also why jobs seemingly aren't enqueued, but I'd rather hold off on any additional testing until that issue is sorted out.

(Besides, it would be nice if these things were caught earlier, and by some tool that isn't me: this is at least the third time I've been left puzzled by jobs not working in beta, after T387631 and T401002.)

Event Timeline

Possibly related to T416669: Upgrade Kafka to version 3.x? I'm not sure whether beta Kafka was upgraded, though.

We did.
cc @elukey we need to fix the kafka script wrapper to automatically inject the right flags, depending on the kafka version.
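
For illustration, the flag selection described above could look roughly like the sketch below. This is not the wrapper deployed by operations/puppet; the function names and the "major version 3 or later" rule are assumptions based on the error in the description, where kafka-topics rejects --zookeeper after the upgrade. The zookeeper path and broker addresses are taken from the command lines pasted in this task.

```python
from typing import List


def connection_flags(kafka_version: str, zookeeper_url: str,
                     bootstrap_servers: List[str]) -> List[str]:
    """Return the connection flags kafka-topics expects for this version.

    Hypothetical helper: assumes any major version >= 3 rejects --zookeeper.
    """
    major = int(kafka_version.split(".")[0])
    if major >= 3:
        # Newer kafka-topics talks to the brokers directly.
        return ["--bootstrap-server", ",".join(bootstrap_servers)]
    # Older kafka-topics is pointed at the ZooKeeper chroot instead.
    return ["--zookeeper", zookeeper_url]


def build_topics_command(kafka_version: str, zookeeper_url: str,
                         bootstrap_servers: List[str],
                         args: List[str]) -> List[str]:
    """Build the full kafka-topics argv, injecting the right connection flags."""
    flags = connection_flags(kafka_version, zookeeper_url, bootstrap_servers)
    return ["kafka-topics", *flags, *args]


if __name__ == "__main__":
    zk = ("deployment-zookeeper-3.deployment-prep.eqiad1.wikimedia.cloud"
          "/kafka/main-deployment-prep")
    brokers = ["kafka-test1006.eqiad.wmnet:9092",
               "kafka-test1007.eqiad.wmnet:9092"]
    # Print the command each version would run for `kafka topics --list`.
    for version in ("2.8.1", "3.7.0"):
        print(" ".join(build_topics_command(version, zk, brokers, ["--list"])))
```

Returning an argv list rather than a single string keeps the sketch safe to pass to a process launcher without shell quoting concerns.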

Change #1272588 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] kafka: deploy a new kafka script wrapper when installing kafka 3.7

https://gerrit.wikimedia.org/r/1272588

Change #1272588 merged by Brouberol:

[operations/puppet@production] kafka: deploy a new kafka script wrapper when installing kafka 3.7

https://gerrit.wikimedia.org/r/1272588

Fixed on kafka-test, which was also migrated to kafka 3.7:

brouberol@kafka-test1008:~$ kafka topics --list | head
kafka-topics --bootstrap-server kafka-test1006.eqiad.wmnet:9092,kafka-test1007.eqiad.wmnet:9092,kafka-test1008.eqiad.wmnet:9092,kafka-test1009.eqiad.wmnet:9092,kafka-test1010.eqiad.wmnet:9092 --list
DataHubUpgradeHistory_v1
DataHubUsageEvent_v1
FailedMetadataChangeEvent_v4
FailedMetadataChangeProposal_v1
MetadataAuditEvent_v4
MetadataChangeEvent_v4
MetadataChangeLog_Timeseries_v1
MetadataChangeLog_Versioned_v1
MetadataChangeProposal_v1
brouberol@kafka-test1008:~$

Once puppet runs on the beta brokers, you should be good. Can you confirm? Thanks!

Yup, thanks, I can now see the list of topics. Jobs still aren't being processed, but since it's a different root cause, I filed it separately as T423615. Happy to close this if y'all have nothing else to do here.