There's a kafka-mirror unit in systemd on the main-kafka boxes that is always in the Active (stopped) state. I'm not sure what exactly it is, but it looks like a leftover from an older MirrorMaker setup, so it should be cleaned up.
So this kafka-mirror instance is defined as follows:
elukey@kafka1001:~$ sudo systemctl cat kafka-mirror
# /lib/systemd/system/kafka-mirror.service
[Unit]
Description=Kafka MirrorMaker - All Instances

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecReload=/bin/true
And this is a regular mirror maker instance:
elukey@kafka1001:~$ sudo systemctl cat kafka-mirror-main-codfw_to_main-eqiad@0.service
# /lib/systemd/system/kafka-mirror-main-codfw_to_main-eqiad@0.service
# NOTE: This file is managed by Puppet.
[Unit]
Description=Kafka MirrorMaker Instance of main-codfw_to_main-eqiad@0

[Service]
User=kafka
Group=kafka
[..]

[Install]
WantedBy=kafka-mirror.service multi-user.target
The kafka-mirror unit is only a catch-all to restart all the MirrorMaker instances on the same host in one go. It was probably stopped during the outage, but as far as I can see it now seems to be working everywhere. It is also named "kafka-mirror.service - Kafka MirrorMaker - All Instances", so its purpose seems clear enough; I'd keep it as it is.
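The grouping above hinges on each instance unit declaring WantedBy=kafka-mirror.service in its [Install] section; once an instance is enabled, systemd creates a symlink for it under kafka-mirror.service.wants/, so starting the catch-all pulls the instance in. On a live host, `systemctl list-dependencies kafka-mirror.service` shows that view directly. The following is only a minimal offline sketch, assuming the unit files are available as plain text: it looks at the [Install] WantedBy= lines alone and ignores drop-ins, Wants=, PartOf= and the actual .wants/ symlinks. The function names and the @1 instance are hypothetical, for illustration only.

```python
"""Sketch: list the units that declare WantedBy=<catch-all> in [Install].

Assumption: unit files are given as plain text; drop-ins, Wants=, PartOf=
and the real .wants/ symlinks created by `systemctl enable` are ignored.
"""


def install_wanted_by(unit_text: str) -> list[str]:
    """Return the unit names listed in WantedBy= under the [Install] section."""
    section = None
    targets: list[str] = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' and ';' are both comment markers).
        if not line or line.startswith(("#", ";")):
            continue
        # Track which [Section] we are currently in.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Install" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "WantedBy":
            # WantedBy= accepts a space-separated list and may be repeated.
            targets.extend(value.split())
    return targets


def members_of(catch_all: str, units: dict[str, str]) -> list[str]:
    """Sorted names of the units whose [Install] section names catch_all."""
    return sorted(
        name for name, text in units.items() if catch_all in install_wanted_by(text)
    )


if __name__ == "__main__":
    units = {
        "kafka-mirror-main-codfw_to_main-eqiad@0.service": (
            "# NOTE: This file is managed by Puppet.\n"
            "[Unit]\n"
            "Description=Kafka MirrorMaker Instance of main-codfw_to_main-eqiad@0\n"
            "[Install]\n"
            "WantedBy=kafka-mirror.service multi-user.target\n"
        ),
        # Hypothetical second instance on the same box, as in the scaling case.
        "kafka-mirror-main-codfw_to_main-eqiad@1.service": (
            "[Install]\nWantedBy=kafka-mirror.service multi-user.target\n"
        ),
        # Not part of the catch-all group.
        "unrelated.service": "[Install]\nWantedBy=multi-user.target\n",
    }
    print(members_of("kafka-mirror.service", units))
```

Parsing by hand rather than with configparser keeps repeated WantedBy= lines additive, which matches how systemd treats list-valued settings.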
I created this when we were running multiple MirrorMaker instances on the same box. This service allows you to stop and start all of them at once. As we only have one instance per box everywhere right now, it isn't needed, but if we need to scale the MirrorMaker producer in the future, we may need to spawn more than one instance per box. Let's keep this.
Yeah, I just didn't really know what exactly it was, and in the heat of the outage I was a bit confused.