
De-noise ipsec alerts (Reduce Icinga alert noise goal)
Open, Needs Triage, Public

Description

Today, when an ipsec tunnel goes down, a large number of host alerts fire. This happens because each host has a single Icinga check that covers multiple tunnels.

We should be able to move this to a Prometheus-based check. At a high level, we would need to:

  • Add ipsec tunnel status metrics to prometheus
  • Alert on the aggregate ipsec status metrics
  • Phase out the host based ipsec checks
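To show why alerting on an aggregate cuts the noise, here is a minimal Python model that compares the two behaviours. It is only an illustration, not the planned implementation: in Prometheus the aggregate would be an alerting rule over the exporter's metrics. The `TunnelSample` shape, the function names, the `threshold` parameter, and the host/tunnel names are all hypothetical. Assume one peer (cp9) is unreachable. Then every host with a tunnel to it fails its per-host check, while the aggregate produces a single alert that lists the affected tunnels.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TunnelSample:
    """One per-tunnel status sample (hypothetical shape, for illustration only)."""
    host: str    # host reporting the sample
    tunnel: str  # tunnel identifier, assumed identical on both ends
    up: bool     # True if the tunnel is established


def per_host_alerts(samples: Iterable[TunnelSample]) -> set[str]:
    """Model of today's checks: one alert per host with any tunnel down."""
    return {s.host for s in samples if not s.up}


def aggregate_alert(samples: Iterable[TunnelSample], threshold: int = 1) -> Optional[str]:
    """Model of the proposed check: one alert over all tunnels.

    Returns a single summary message if at least `threshold` distinct
    tunnels are down, otherwise None.
    """
    down = sorted({s.tunnel for s in samples if not s.up})
    if len(down) < threshold:
        return None
    return f"{len(down)} ipsec tunnel(s) down: {', '.join(down)}"


if __name__ == "__main__":
    # cp9 is unreachable, so each of its peers sees its tunnel to cp9 down.
    samples = [
        TunnelSample("cp1", "cp1<->cp9", up=False),
        TunnelSample("cp2", "cp2<->cp9", up=False),
        TunnelSample("cp3", "cp3<->cp9", up=False),
        TunnelSample("cp1", "cp1<->cp4", up=True),
    ]
    print(sorted(per_host_alerts(samples)))  # three separate host alerts
    print(aggregate_alert(samples))          # one summary alert
```

The `threshold` knob reflects a design choice you would likely also want in the real rule. It lets a single flapping tunnel be tolerated, or lets the alert page only when several tunnels fail together.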

Event Timeline

herron created this task. Fri, Aug 9, 7:59 PM
herron added a comment. Fri, Aug 9, 8:08 PM

I've drafted a prometheus-ipsec-exporter package on boron, based on https://github.com/dennisstritzke/ipsec_exporter.

Due to its build dependencies, it currently builds only for buster, but the resulting package installs successfully on stretch.

I don't currently have the privileges to create a new Gerrit project operations/debs/prometheus-ipsec-exporter, so for the time being it can only be browsed from my home directory on boron:

boron:/home/herron/prometheus-ipsec-exporter/prometheus-ipsec-exporter-0.3.1

And the built package can be fetched from:

boron:/var/cache/pbuilder/result/buster-amd64/prometheus-ipsec-exporter_0.3.1-1_amd64.deb

Light testing has been successful using a stretch VM running strongswan.
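For this kind of light testing, a quick way to read the exporter's output is to parse its Prometheus text-format page (typically served at `/metrics`) and list the tunnels that are not up. The sketch below is minimal. It assumes the exporter publishes a gauge named `ipsec_status` with a `tunnel` label, where 0 means established. Those names and semantics are assumptions to verify against the ipsec_exporter README. The helper names are hypothetical, and the parser ignores timestamps and label-value escaping.

```python
import re
from typing import Dict, List, Tuple

# Minimal matcher for an exposition line: name{labels} value [timestamp]
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
# Matches label pairs such as tunnel="a-b"
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str, metric: str) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) pairs for `metric` from Prometheus text output.

    Skips comment/blank lines and other metrics; label values are not unescaped.
    """
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = _SAMPLE_RE.match(line)
        if not m or m.group("name") != metric:
            continue
        labels = dict(_LABEL_RE.findall(m.group("labels") or ""))
        out.append((labels, float(m.group("value"))))
    return out


def unhealthy_tunnels(text: str, metric: str = "ipsec_status") -> List[str]:
    """Sorted tunnel names whose status is non-zero (0 assumed to mean 'up')."""
    return sorted(
        labels.get("tunnel", "?")
        for labels, value in parse_samples(text, metric)
        if value != 0
    )


if __name__ == "__main__":
    example = (
        '# TYPE ipsec_status gauge\n'
        'ipsec_status{tunnel="a-b"} 0\n'
        'ipsec_status{tunnel="a-c"} 2\n'
    )
    print(unhealthy_tunnels(example))
```

In practice you would feed it the body fetched from the exporter on the test VM, for example after bringing one strongswan connection down, and confirm that only that tunnel is reported.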

Change 530203 had a related patch set uploaded (by Herron; owner: Herron):
[operations/debs/prometheus-ipsec-exporter@master] prometheus-ipsec-exporter: initial commit of version 0.3.1

https://gerrit.wikimedia.org/r/530203

Change 530616 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] prometheus: add prometheus ipsec exporter service & config

https://gerrit.wikimedia.org/r/530616

Change 530203 merged by Herron:
[operations/debs/prometheus-ipsec-exporter@master] prometheus-ipsec-exporter: initial commit of version 0.3.1

https://gerrit.wikimedia.org/r/530203