We have noticed that varnishkafka often disconnects from Kafka servers. The issue is unrelated to the v4 upgrade: it has been observed on machines still running Varnish 3 as well as on servers running Varnish 4.
What is peculiar is that the disconnections seem to happen at fixed intervals of 5 or 10 minutes:
Aug 29 09:00:20 cp1052 varnishkafka[25020]: %3|1472461220.177|FAIL|varnishkafka#producer-1| kafka1012.eqiad.wmnet:9092/12: Receive failed: Disconnected
Aug 29 09:00:20 cp1052 varnishkafka[25020]: KAFKAERR: Kafka error (-195): kafka1012.eqiad.wmnet:9092/12: Receive failed: Disconnected
Aug 29 09:10:19 cp1052 varnishkafka[25020]: %3|1472461819.996|FAIL|varnishkafka#producer-1| kafka1018.eqiad.wmnet:9092/18: Receive failed: Disconnected
Aug 29 09:10:20 cp1052 varnishkafka[25020]: KAFKAERR: Kafka error (-195): kafka1018.eqiad.wmnet:9092/18: Receive failed: Disconnected
Aug 29 09:15:20 cp1052 varnishkafka[25020]: %3|1472462120.086|FAIL|varnishkafka#producer-1| kafka1012.eqiad.wmnet:9092/12: Receive failed: Disconnected
Aug 29 09:15:20 cp1052 varnishkafka[25020]: KAFKAERR: Kafka error (-195): kafka1012.eqiad.wmnet:9092/12: Receive failed: Disconnected
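
As a rough check of the 5/10 minute pattern, the epoch timestamps embedded in the FAIL lines can be subtracted from one another. The snippet below is a minimal sketch and not part of varnishkafka: the regular expression and the helper names (fail_events, gaps_in_seconds) are assumptions made for illustration, and the input is just the three FAIL lines from the excerpt above.

```python
import re

# The three FAIL lines from the log excerpt above (KAFKAERR duplicates omitted).
LOG = """\
Aug 29 09:00:20 cp1052 varnishkafka[25020]: %3|1472461220.177|FAIL|varnishkafka#producer-1| kafka1012.eqiad.wmnet:9092/12: Receive failed: Disconnected
Aug 29 09:10:19 cp1052 varnishkafka[25020]: %3|1472461819.996|FAIL|varnishkafka#producer-1| kafka1018.eqiad.wmnet:9092/18: Receive failed: Disconnected
Aug 29 09:15:20 cp1052 varnishkafka[25020]: %3|1472462120.086|FAIL|varnishkafka#producer-1| kafka1012.eqiad.wmnet:9092/12: Receive failed: Disconnected
"""

# Assumed line layout: %<level>|<epoch seconds>|FAIL|<client name>| <broker>: <message>
FAIL_RE = re.compile(r"%\d\|(\d+\.\d+)\|FAIL\|[^|]*\|\s*(\S+):\s")


def fail_events(log_text):
    """Return (epoch_seconds, broker) for every FAIL line found in log_text."""
    return [(float(ts), broker) for ts, broker in FAIL_RE.findall(log_text)]


def gaps_in_seconds(events):
    """Return the time between consecutive FAIL events, in seconds."""
    times = sorted(ts for ts, _ in events)
    return [later - earlier for earlier, later in zip(times, times[1:])]


events = fail_events(LOG)
gaps = gaps_in_seconds(events)

for (_, broker), gap in zip(events[1:], gaps):
    # Report each gap next to the broker whose disconnect ended it.
    print(f"{broker}: {gap:.1f} s after previous disconnect (~{round(gap / 60)} min)")
```

For this excerpt the two gaps come out at roughly 10 and 5 minutes, matching the intervals described above. Running the same calculation over a full day of logs, grouped by broker, would show whether each broker has its own fixed period.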
The problem started on Aug 08 at exactly 18:30:27, which coincides with the following SAL entry:
18:30 ottomata: restarting kafka broker on kafka1013 to test eventlogging leader rebalances
