add Icinga alert on Varnish backends that are close to maxing out their allowed connections to their applayer backends
Closed, Invalid · Public

Assigned To

None

Authored By

	CDanis
	May 31 2019, 3:43 PM

Description

from a discussion in #wikimedia-traffic today:

[15:27:51] <cdanis> it just occurred to me -- we *don't* have any alerting for a particular varnish maxing out on its connections-to-a-backend limit, right? perhaps we should
[15:29:34] <bblack> cdanis: yes, that might be wise.  It's a little tricky to implement, too, though.
[15:29:50] <bblack> the limits are per-backend-service, per-cache-node
[15:30:10] <bblack> so e.g. cp1075 might have a 1K limit for connections to appservers.svc, and a separate 10K limit for connections to restbase.
[15:30:42] <bblack> you'd have to pull those limits out of the varnish config, and compare them to the per node->service connection data that's coming into... I guess prometheus now?
[15:31:00] <bblack> but yeah, an alert on any node maxing out its connection pool to any service would be helpful
[15:31:35] <cdanis> I think you would implement it by also exporting into prometheus the service limits, named the same way as they are in the service names coming from varnish's exports
[15:31:36] <bblack> (and really, we set the limits much higher than the normal parallelism load to allow for small spikes and stuff.  Probably even crossing ~50% of the allowed limit should at least warn if not crit)
[15:31:54] <cdanis> then you can have a simple subtraction in the alerting rule
[15:32:51] <cdanis> I'll file a ticket, this is worth doing -- in my limited experience, lots of real problems at the traffic and/or appserver layer correlate with connections-maxed-out very well
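To make the proposed check concrete, here is a minimal Python sketch of the per-node, per-backend comparison discussed above. It assumes the current connection counts and the configured limits have already been collected (for example, from Prometheus) into one record per cache node and applayer service. The `PoolSample` type, the `classify`/`check_all` helpers, the sample connection counts, and the 90% critical threshold are all hypothetical. Only the ~50% warning level and the cp1075 limits (1K to appservers.svc, 10K to restbase) come from the discussion.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Thresholds as fractions of the per-backend connection limit.
# WARN_RATIO follows the "~50% of the allowed limit" suggestion above;
# CRIT_RATIO is a hypothetical value chosen only for illustration.
WARN_RATIO = 0.5
CRIT_RATIO = 0.9


@dataclass(frozen=True)
class PoolSample:
    """Hypothetical record: one cache node's pool to one applayer service."""
    node: str          # cache host, e.g. "cp1075"
    backend: str       # service name as exported by Varnish
    connections: int   # currently open connections node -> backend
    limit: int         # configured maximum connections node -> backend


def classify(sample: PoolSample) -> str:
    """Return "OK", "WARNING" or "CRITICAL" for one node/backend pair."""
    if sample.limit <= 0:
        raise ValueError(f"invalid limit for {sample.node} -> {sample.backend}")
    # Compare usage as a ratio rather than a raw difference, so one
    # threshold works for both a 1K and a 10K limit.
    usage = sample.connections / sample.limit
    if usage >= CRIT_RATIO:
        return "CRITICAL"
    if usage >= WARN_RATIO:
        return "WARNING"
    return "OK"


def check_all(samples: List[PoolSample]) -> Dict[Tuple[str, str], str]:
    """Classify every pair; limits are per backend service AND per cache node."""
    return {(s.node, s.backend): classify(s) for s in samples}


if __name__ == "__main__":
    samples = [
        PoolSample("cp1075", "appservers.svc", 620, 1_000),  # hypothetical count
        PoolSample("cp1075", "restbase", 1_200, 10_000),     # hypothetical count
    ]
    for (node, backend), state in sorted(check_all(samples).items()):
        print(f"{state}: {node} -> {backend}")
```

With these made-up counts, the appservers.svc pool (62% used) lands in WARNING and the restbase pool (12% used) stays OK. The same logic could be expressed as the "simple subtraction" cdanis mentions (limit minus connections). A ratio is used here only because the two limits differ by an order of magnitude.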

Related Objects

Mentioned In: T180307: phab irc lang parser doesn't work if times are shown

Event Timeline

CDanis created this task. · May 31 2019, 3:43 PM

Restricted Application added a subscriber: Aklapper. · May 31 2019, 3:43 PM

CDanis triaged this task as Medium priority. · May 31 2019, 3:45 PM

Peachey88 mentioned this in T180307: phab irc lang parser doesn't work if times are shown. · Jun 1 2019, 12:48 AM

ema moved this task from Backlog to Caching on the Traffic board. · Jun 3 2019, 3:09 PM

We don't have varnish-be anymore.
