add Icinga alert on Varnish backends that are close to maxing out their allowed connections to their applayer backends
Closed, Invalid · Public

Description

From a discussion in #wikimedia-traffic today:

[15:27:51] <cdanis> it just occurred to me -- we *don't* have any alerting for a particular varnish maxing out on its connections-to-a-backend limit, right? perhaps we should
[15:29:34] <bblack> cdanis: yes, that might be wise.  It's a little tricky to implement, too, though.
[15:29:50] <bblack> the limits are per-backend-service, per-cache-node
[15:30:10] <bblack> so e.g. cp1075 might have a 1K limit for connections to appservers.svc, and a separate 10K limit for connections to restbase.
[15:30:42] <bblack> you'd have to pull those limits out of the varnish config, and compare them to the per node->service connection data that's coming into... I guess prometheus now?
[15:31:00] <bblack> but yeah, an alert on any node maxing out its connection pool to any service would be helpful
[15:31:35] <cdanis> I think you would implement it by also exporting into prometheus the service limits, named the same way as they are in the service names coming from varnish's exports
[15:31:36] <bblack> (and really, we set the limits much higher than the normal parallelism load to allow for small spikes and stuff.  Probably even crossing ~50% of the allowed limit should at least warn if not crit)
[15:31:54] <cdanis> then you can have a simple subtraction in the alerting rule
[15:32:51] <cdanis> I'll file a ticket, this is worth doing -- in my limited experience, lots of real problems at the traffic and/or appserver layer correlate with connections-maxed-out very well
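
A minimal sketch of the comparison discussed above, assuming both the in-use connection counts and the configured limits have already been collected and keyed by (cache node, backend service). The function name, the dictionary layout, and the 90% critical threshold are illustrative assumptions and not part of any existing Wikimedia tooling; only the ~50% warning level comes from the discussion. It uses a ratio rather than the plain subtraction mentioned in the log, so that one threshold can apply to services with very different limits.

```python
from typing import Dict, List, Tuple

# (cache node, backend service), e.g. ("cp1075", "appservers.svc")
Key = Tuple[str, str]

WARN_RATIO = 0.5  # from the discussion: crossing ~50% of the limit should at least warn
CRIT_RATIO = 0.9  # assumption: the discussion does not give a critical threshold


def check_saturation(
    in_use: Dict[Key, float],
    limits: Dict[Key, float],
    warn: float = WARN_RATIO,
    crit: float = CRIT_RATIO,
) -> List[Tuple[str, str, str, float]]:
    """Return (level, node, service, ratio) for every pair at or above a threshold."""
    alerts = []
    for key, used in sorted(in_use.items()):
        limit = limits.get(key)
        if not limit:
            # No exported limit (or a limit of 0) for this pair: nothing to compare against.
            continue
        ratio = used / limit
        if ratio >= crit:
            level = "CRITICAL"
        elif ratio >= warn:
            level = "WARNING"
        else:
            continue
        node, service = key
        alerts.append((level, node, service, ratio))
    return alerts


if __name__ == "__main__":
    # Limits taken from the example in the log; the in-use counts are made up.
    limits = {("cp1075", "appservers.svc"): 1000, ("cp1075", "restbase"): 10000}
    in_use = {("cp1075", "appservers.svc"): 600, ("cp1075", "restbase"): 1200}
    for level, node, service, ratio in check_saturation(in_use, limits):
        print(f"{level}: {node} -> {service} at {ratio:.0%} of its connection limit")
```

With the example limits from the log, a node holding 600 of its 1K allowed connections to appservers.svc would warn, while 1200 of 10K to restbase would not. In a Prometheus-based setup the same comparison would be written as an alerting rule over the two exported series rather than in Python.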