This has occurred twice so far, at the following times:
- 17/03 ~ midnight UTC
- 20/03 ~ 03:17:41 UTC
For the first occurrence (reported in an email to analytics-internal):
Found the following error in eventlog1001:
2017-03-17 23:58:36,111  (MainThread) <BrokerConnection host=kafka1022.eqiad.wmnet/2620:0:861:106:10:64:36:122 port=9092> timed out after 40000 ms. Closing connection.
On kafka1022, many of the following error messages started appearing before the above timeout:
[2017-03-17 23:00:13,317] ERROR Processor got uncaught exception. (kafka.network.Processor)
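To pin down when these broker errors started relative to the client timeout, the log can be bucketed per minute. This is only a rough sketch: the function names are made up for illustration, and it assumes every relevant line has exactly the format of the one quoted above.

```python
import re
from collections import Counter

# Matches broker log lines shaped like the one quoted above, capturing the
# timestamp truncated to the minute ("YYYY-MM-DD HH:MM").
PROCESSOR_ERROR_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3}\] "
    r"ERROR Processor got uncaught exception\."
)


def count_processor_errors_per_minute(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to the number of matching errors."""
    counts = Counter()
    for line in lines:
        match = PROCESSOR_ERROR_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def first_error_minute(lines):
    """Return the earliest minute with a matching error, or None if there is none."""
    counts = count_processor_errors_per_minute(lines)
    # Zero-padded timestamps sort chronologically as plain strings.
    return min(counts) if counts else None
```

Both functions accept any iterable of lines, so an open file handle for the broker log on kafka1022 can be passed directly (the log path is not given here and would need to be filled in).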
These broker errors also triggered MirrorMaker process-down failures. They could be related to https://issues.apache.org/jira/browse/KAFKA-3593, or possibly to the latest librdkafka version.
According to https://github.com/edenhill/librdkafka/wiki/Broker-version-compatibility, it seems that we should set broker.version.fallback=0.9.0.1 everywhere (the default is broker.version.fallback=0.9.0.0), but the cause might be something different.
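As an illustration of the proposed setting, here is a minimal sketch that applies the override to a client configuration expressed as a dict of librdkafka properties. The helper name and the example base config are hypothetical, and whether our clients take their configuration in this form is an assumption. Only the `broker.version.fallback` property and its value come from the discussion above.

```python
# Value suggested above; the librdkafka default mentioned is 0.9.0.0.
BROKER_VERSION_FALLBACK = "0.9.0.1"


def with_broker_version_fallback(base_config, version=BROKER_VERSION_FALLBACK):
    """Return a copy of base_config with broker.version.fallback set.

    base_config is a dict of librdkafka property names to values; the
    original dict is left unmodified.
    """
    config = dict(base_config)
    config["broker.version.fallback"] = version
    return config


# Hypothetical base config, for illustration only.
example = with_broker_version_fallback(
    {"bootstrap.servers": "kafka1022.eqiad.wmnet:9092"}
)
```

The resulting dict has the shape that dict-configured librdkafka bindings expect. For clients configured through a file, the same key and value would go in as a `broker.version.fallback=0.9.0.1` line.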