Page MenuHomePhabricator

Topics are created lazily and it's a problem for CP
Closed, ResolvedPublic

Description

At some point we've used to actively create kafka topics for Event-Platform by running ensure-topics-exist script when deploying new versions of event schemas. However, we've switched to automatic topic creation, which worked well for us in production, but apparently it doen't quite work in vagrant.

The problem is, that the topics are created automatically, but lazily, whenever the first message is produced. In production it's more-or-less OK, since we separate deployment of a consumer (change-prop) from the deployment of a producer (EventBus), and the rate of messages is pretty high, so the topic get created before changes using them in Change-Prop gets deployed. In production that works when producer and consumer are separate services.

However, in Vagrant, when the user updates after a long period of time, both new versions of a EventSchemas and Change-Prop gets started at the same time. ChangeProp then tries to subscribe to topics, that are non-existent yet (no messages were produced), which is apparently not handled well by the driver, since even when the topic is created afterwards by the same message, Change-Prop doesn't really realize that.

In production the issue also exist. If CP is both a producer and a consumer of a certain topic (like retry topic for example), deployment will be a chicken and egg problem - need to deploy and start producing and then restart for consumers to catch on.

I'm not quite sure how should we candle this issue. One idea is to revive the custom topic creation script and run it as a part of vagrant git-update command and event-schemas deployment. This might also help if we want to start supporting custom per-topic configs for T157822 or T157092

Other option is to automatically create a topic in CP if it doesn't exist, but I think it's a pretty dangerous and bad option.

Event Timeline

Pchelolo renamed this task from Change-Prop in Vagrant: topics are created lazily to Change-Prop in Vagrant: topics are created lazily and it's a problem.Mar 7 2017, 10:20 PM

One idea is to revive the custom topic creation script

Not opposed to this, but if we do it, I starting to think I don't want to do it in eventlogging repo. Since we've (reluctantly) got the topics config in the event-schemas repo, maybe we can make a little script there to do it? Kinda sucks, because then that repo would have to depend on a kafka client. Hm.

Other option is to automatically create a topic in CP if it doesn't exist, but I think it's a pretty dangerous and bad option.

Why is this dangerous and bad? It sounds fine to me, and seems to be what most kafka clients do.

Or, instead of failling if a topic doesn't exist, change-prop could try again every few minutes.

Or, instead of failling if a topic doesn't exist, change-prop could try again every few minutes.

Ye, that's another option which I want to explore. Need a bit of coding though, because subscribe doesn't really report that the topic doesn't exist, it's just does nothing. Maybe a bug in the driver..

Poked librdkafka and the driver around this issue a bit. Create https://github.com/Blizzard/node-rdkafka/issues/140 to discuss an issue with driver maintainer, let's see what he has in mind.

Suggestion made by @Ottomata to periodically check for topic existence is totally implementable, but it has a couple of issues:

  1. It's a HUGE disruption to ChangeProp startup process.
  2. Let's say prior to subscribing, we fetch metadata, find out topic is not there and set up periodic checking whether the topic exist or not. Then once it appears we subscribe. But, for example in Vagrant environment, for the every topic to appear the user need to manually provoke some event - delete page, restore page, etc. Thank might not happen ever and change-prop will be just poking kafka consuming resources.

Another supported option is to create a topic with default parameters on subscribe. But since we want to support custom topic parameters that's not a good option in my opinion.

Why is this dangerous and bad? It sounds fine to me, and seems to be what most kafka clients do.

I can't give you specific examples, it just feels that it's going against our attitude to puppetise/control everything..

Pchelolo renamed this task from Change-Prop in Vagrant: topics are created lazily and it's a problem to Topics are created lazily and it's a problem for CP.Mar 8 2017, 8:29 PM
Pchelolo updated the task description. (Show Details)

Hm, so after talking a bit to the driver maintainer, we've came to a conclusion that it's definitely wrong for the driver to fail like this silently, but he's on the edge whether to include a check into the library or leave it for the clients to implement. There's definitely no clear way to dynamically recheck topic existence and update the subscription inside the driver code, so if we wanna do that, we'd need write it on our own (which is not that complex, but it does disrupt the startup process quite a bit)

I still think for some reason, that creating topics with correct properties is the producer job, not consumer, so I still think it's better to resurrect the topic creation script (use standard kaka-topics.sh to avoid dependencies) and kill change-prop on startup if it tries to subscribe to a non-existing topic.. @mobrovac @Ottomata What do you think?

I have always been an advocate of keeping things strict in EventBus, so I would love to see the previous behaviour resurrected. More specifically, I think it's the job of the EventBus system to set up the topics correctly; the producer nor the consumer should worry about that, but rather expect them to be set up properly.

However, I think that we should automate that process. Perhaps have Puppet run the script every time the config file changes?

However, I think that we should automate that process. Perhaps have Puppet run the script every time the config file changes?

YES! That should not be to hard to do.

Change 342268 had a related patch set uploaded (by Ppchelko):
[mediawiki/event-schemas] WIP: Added pure-bash topic creation script.

https://gerrit.wikimedia.org/r/342268

Hm, I think depending on the kafka-topics.sh script is worse than using a kafka client. kafkacat I suppose would be ideal.

kafka-topics (and the required jars) are only installed on kafka brokers. We put eventbus and event-schemas on the main kafka brokers just for convenience. In the future I betcha eventbus, etc. will be deployed via kubernetes, and will no longer have access to the kafka jars.

I don't see why ensure-kafka-topics-exist couldn't be either puppetised or put in a separate package and be installed on the broker nodes regardless of where the EventBus HTTP Proxy service is installed.

Since our upgrade to node-rdkafka 0.8.0 and librdkafka 0.9.4 this is not an issue any more. Consumers now periodically recheck metadata and when the topic is eventually created they start fetching messages. See https://github.com/Blizzard/node-rdkafka/issues/140#issuecomment-286572468

I think I'll close this issue as resolved now, not strictly needed to ensure all topics exist before starting change-prop any more. When the actual need for custom topic configurations will appear, we can create a new ticket to talk about topic creation.

Change 342268 abandoned by Ppchelko:
WIP: Added pure-bash topic creation script.

Reason:
We've settled on topic auto-creation. Not needed.

https://gerrit.wikimedia.org/r/342268