We want to limit the total number of EventStreams connections to Kafka. To do this, we need to limit the total number of concurrent EventStreams HTTP connections. This is more difficult than it sounds, since EventStreams connections are long-lived HTTP connections that are never closed. To enforce a true global limit, we'd need to keep a distributed counter somewhere, perhaps in redis. However, there doesn't seem to be an easy, consistent way to reliably decrement the counter: a SIGKILL to the EventStreams process would cause the process to die without decrementing it, leaking the slot permanently.
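To illustrate the failure mode, here is a minimal sketch (an in-memory stand-in for a shared redis counter; the names are hypothetical): the decrement only runs in a cleanup handler, so a SIGKILL skips it and the count drifts upward forever.

```javascript
// Stand-in for INCR/DECR on a shared redis key.
let activeConnections = 0;

// Increment on connect; return a cleanup function that decrements on close.
function openStream() {
  activeConnections += 1; // INCR on connect
  return () => {
    activeConnections -= 1; // DECR on graceful close
  };
}

const close1 = openStream();
const close2 = openStream(); // eslint-disable-line no-unused-vars
close1(); // clean shutdown: counter goes back down
// close2 is never called: imagine the process was SIGKILLed here.
console.log(activeConnections); // 1 — a slot is leaked permanently
```

No matter how careful the shutdown handlers are, SIGKILL (and OOM kills, kernel panics, etc.) cannot be trapped, so any scheme that relies on an explicit decrement will leak.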
Another, less precise idea would be a non-distributed counter. Varnish already supports a per-backend max_connections. The backend in this case is the eventstreams.svc.$dc.wmnet URL, so we could do limiting at the varnish instance level. The drawback: depending on how requests are hashed to varnish backends, some requests might hit the limit on one varnish instance while there are plenty of open slots on others.
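As a sketch, the per-backend limit would be a one-line addition to the backend definition in VCL (hostname, port, and limit here are placeholders, not the actual production config):

```
backend eventstreams {
    .host = "eventstreams.svc.eqiad.wmnet";
    .port = "443";
    # Varnish returns a 503 for requests beyond this many
    # concurrent connections to this backend.
    .max_connections = 100;
}
```

Note this caps connections per varnish instance, so the effective global limit is roughly max_connections multiplied by the number of varnish instances routing to this backend.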
We actually have the total number of connections (per stream) in statsd/graphite, but that only works because statsd aggregates the counts for us as a gauge. We certainly aren't going to query graphite for this, but maybe there is some way to use an expiring gauge in redis? Or perhaps we can reuse https://github.com/wikimedia/limitation with the same interval counter approach we use for statsd?
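The expiring-gauge idea could sidestep the SIGKILL problem entirely: instead of a counter that must be decremented, each worker periodically re-reports its own connection count under a key with a TTL, and a killed worker simply stops refreshing, so its contribution ages out. A minimal in-memory sketch of that scheme (the class and its methods are hypothetical; in practice the Map would be per-worker redis keys written with SET ... EX):

```javascript
// Each worker heartbeats its current connection count; the total only
// sums entries that were refreshed within the TTL window.
class ExpiringGauge {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.entries = new Map(); // workerId -> { count, expiresAt }
  }

  // Called on an interval by each worker (analogous to SET key count EX ttl).
  report(workerId, count, now = Date.now()) {
    this.entries.set(workerId, { count, expiresAt: now + this.ttlMs });
  }

  // Total concurrent connections across workers that are still heartbeating.
  total(now = Date.now()) {
    let sum = 0;
    for (const [id, entry] of this.entries) {
      if (entry.expiresAt > now) {
        sum += entry.count;
      } else {
        this.entries.delete(id); // stale worker: drop its contribution
      }
    }
    return sum;
  }
}

const gauge = new ExpiringGauge(5000); // 5s TTL
gauge.report('worker-1', 3, 0);
gauge.report('worker-2', 2, 0);
console.log(gauge.total(1000)); // 5 — both workers reported recently
gauge.report('worker-2', 2, 3000); // worker-2 keeps heartbeating
console.log(gauge.total(6000)); // 2 — worker-1 (killed) has expired
```

The trade-off is the same one statsd makes: the gauge lags by up to one heartbeat interval, so the limit is soft rather than exact, but no cleanup-on-death is required.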
Any other ideas?