
Monitor IPsec status
Closed, Resolved, Public

Description

At minimum, we require a Nagios monitor to confirm that the IPsec transports securing inter-datacenter traffic between Varnish nodes are established. There are multiple transports per pair of nodes because each pair uses both IPv4 and IPv6. Additionally, this mechanism should support monitoring the state of IPsec between Varnish nodes and Kafka brokers, if IPsec is selected as the mechanism to secure webrequest traffic in T92602.

The first step for this task is to evaluate existing IPsec monitoring scripts:
http://exchange.nagios.org/directory/Plugins/Security/check_ipsec/details
http://wiki.itadmins.net/doku.php?id=icinga-nagios:check_ipsec2
http://lists.pfsense.org/pipermail/list/2014-June/006468.html

An initial review shows that these scripts either simply count the number of established SAs or require a separate monitor to be defined for each SA. In both cases the plugin must be executed remotely via NRPE or similar. The worst case for this sort of monitor is the text caches in EQIAD, each of which will have two SAs for every text cache in ESAMS and ULSFO, for a total of 72 SAs. The 24 additional Varnish nodes due to be installed at ESAMS in T92514 will add 48 SAs, for a total of 120 SAs on each EQIAD text cache node.
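The SA arithmetic above can be checked with a short sketch. The figure of 36 existing remote text caches is not stated in the task; it is inferred here from 72 SAs at two SAs per peer. The function name `expected_sas` is made up for illustration.

```python
# Two SAs per remote peer, per the task description (IPv4 + IPv6).
SAS_PER_PEER = 2


def expected_sas(remote_peers: int, sas_per_peer: int = SAS_PER_PEER) -> int:
    """Return the number of SAs one node holds for the given number of remote peers."""
    return remote_peers * sas_per_peer


# Assumption: 36 remote text caches (ESAMS + ULSFO), derived from 72 SAs / 2.
current_peers = 72 // SAS_PER_PEER
additional_peers = 24  # new ESAMS Varnish nodes from T92514

print(expected_sas(current_peers))                     # 72
print(expected_sas(additional_peers))                  # 48
print(expected_sas(current_peers + additional_peers))  # 120
```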

Beyond the minimum of confirming established security associations via Icinga, we may wish to monitor for warning or error events from the daemons, and to store statistics such as traffic throughput or authentication request rate in Graphite. See the output of 'ipsec listcounters'.

Event Timeline

Gage raised the priority of this task from to Needs Triage.
Gage updated the task description.
Gage subscribed.
Gage triaged this task as Medium priority.
Gage set Security to None.

Patch is submitted: https://gerrit.wikimedia.org/r/#/c/199787/

Instead of simply counting established Security Associations or defining a monitor for each SA, this monitor parses the output of 'ipsec statusall' to report the number of connections that are established, connecting, and not connected. It is executed by NRPE.
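The approach described above might look roughly like the following sketch. It is not the code from the Gerrit patch. The connection names, the sample text, the assumed line shape (`name[N]: STATE ...`), and the mapping of states to Nagios exit codes are all assumptions made for illustration; a real plugin would read the text from the `ipsec statusall` command and exit with the returned code.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Assumed shape of an IKE SA line: "  name[1]: ESTABLISHED 5 minutes ago, ..."
# Child SA lines ("name{1}: INSTALLED ...") use braces and are not matched.
SA_LINE = re.compile(r"^\s*(?P<name>[^\s\[\]{}:]+)\[\d+\]:\s+(?P<state>[A-Z_]+)\b")


def classify(statusall_text, expected):
    """Sort each expected connection name into one of three buckets."""
    states = {}
    for line in statusall_text.splitlines():
        m = SA_LINE.match(line)
        # If a name appears more than once, an ESTABLISHED entry wins.
        if m and states.get(m["name"]) != "ESTABLISHED":
            states[m["name"]] = m["state"]
    result = {"established": [], "connecting": [], "not_connected": []}
    for name in expected:
        state = states.get(name)
        if state == "ESTABLISHED":
            result["established"].append(name)
        elif state == "CONNECTING":
            result["connecting"].append(name)
        else:
            # Missing, or in any other state: treated as not connected.
            result["not_connected"].append(name)
    return result


def nagios_status(result):
    """Return (exit_code, message). The severity mapping is an assumption."""
    counts = "%d established, %d connecting, %d not connected" % (
        len(result["established"]),
        len(result["connecting"]),
        len(result["not_connected"]),
    )
    if result["not_connected"]:
        return CRITICAL, "IPsec CRITICAL: " + counts
    if result["connecting"]:
        return WARNING, "IPsec WARNING: " + counts
    return OK, "IPsec OK: " + counts


# Hypothetical sample text and connection names, for demonstration only.
SAMPLE = """\
Security Associations (1 up, 1 connecting):
  cp1_v4[1]: ESTABLISHED 5 minutes ago, 10.0.0.1[10.0.0.1]...10.0.0.2[10.0.0.2]
  cp1_v4{1}:  INSTALLED, TRANSPORT, ESP SPIs: c0000001_i c0000002_o
  cp1_v6[2]: CONNECTING, 2001:db8::1[%any]...2001:db8::2[%any]
"""

code, message = nagios_status(classify(SAMPLE, ["cp1_v4", "cp1_v6", "cp2_v4"]))
print(code, message)
```

Comparing against a list of expected connection names, rather than counting SAs alone, lets the check notice a connection that is absent from the output entirely.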