Varnishkafka seems reporting errors from July 28th ~18 UTC:
https://grafana.wikimedia.org/dashboard/db/varnishkafka?panelId=20&fullscreen&orgId=1&from=now-30d&to=now
{F8977169}
This seems to be related to the following error constantly being logged on all the brokers (note the timing):
```
[2017-07-28 17:39:53,099] INFO Rolled new log segment for 'webrequest_text-15' in 23 ms. (kafka.log.Log)
[2017-07-28 17:39:55,750] INFO Rolled new log segment for 'webrequest_text-8' in 1 ms. (kafka.log.Log)
[2017-07-28 17:40:30,308] INFO Rolled new log segment for 'mediawiki_ApiAction-9' in 1 ms. (kafka.log.Log)
[2017-07-28 17:41:01,705] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.ArrayIndexOutOfBoundsException: 18
at org.apache.kafka.common.protocol.ApiKeys.forId(ApiKeys.java:68)
at org.apache.kafka.common.requests.AbstractRequest.getRequest(AbstractRequest.java:39)
at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:79)
at kafka.network.Processor$$anonfun$run$11.apply(SocketServer.scala:426)
at kafka.network.Processor$$anonfun$run$11.apply(SocketServer.scala:421)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.network.Processor.run(SocketServer.scala:421)
at java.lang.Thread.run(Thread.java:745)
[2017-07-28 17:41:31,700] WARN [ReplicaFetcherThread-1-20], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@105e15b2. Possible cause: java.io.IOException: Connection to 20 was disconnected before the response was read (kafka.server.ReplicaFetcherThread)
```
The corresponding Varnishkafka errors are the following (this one belongs to cp4015 that was one of the first to show it):
```
Jul 28 17:46:00 cp4015 varnishkafka[1655]: %3|1501263960.918|FAIL|varnishkafka#producer-1| [thrd:kafka1020.eqiad.wmnet:9092/bootstrap]:
Jul 28 17:46:00 cp4015 varnishkafka[1655]: KAFKAERR: Kafka error (-192): kafka1020.eqiad.wmnet:9092/20: 1 request(s) timed out: disconn
Jul 28 17:46:01 cp4015 varnishkafka[1655]: %3|1501263961.919|FAIL|varnishkafka#producer-1| [thrd:kafka1020.eqiad.wmnet:9092/bootstrap]:
Jul 28 17:46:01 cp4015 varnishkafka[1655]: KAFKAERR: Kafka error (-192): kafka1020.eqiad.wmnet:9092/20: 5 request(s) timed out: disconn
Jul 28 17:46:02 cp4015 varnishkafka[1655]: %3|1501263962.957|FAIL|varnishkafka#producer-1| [thrd:kafka1020.eqiad.wmnet:9092/bootstrap]:
Jul 28 17:46:02 cp4015 varnishkafka[1655]: KAFKADR: Kafka message delivery error: Local: Timed out in queue
Jul 28 17:46:02 cp4015 varnishkafka[1655]: KAFKADR: Kafka message delivery error: Local: Timed out in queue
Jul 28 17:46:02 cp4015 varnishkafka[1655]: KAFKADR: Kafka message delivery error: Local: Timed out in queue
Jul 28 17:46:02 cp4015 varnishkafka[1655]: KAFKADR: Kafka message delivery error: Local: Timed out in queue
[.. repeated..]
```
https://issues.apache.org/jira/browse/KAFKA-3547 seems possibly related, maybe a new producer/consumer is causing these timeouts? Event streams timings might correlate?