While looking at how to get Smokeping data into Prometheus/Grafana, I found that the Prometheus blackbox_exporter could potentially replace Smokeping more efficiently:
- Better event correlation (e.g. network latency/loss and application errors can be shown on the same dashboard)
- Centralized data (removes the need for yet another tool)
- Time series database (instead of RRD files)
- Distributed (can run on any server in a P2P way, which is possible with Smokeping but more complex)
- Easier configuration
Points that need more research:
- Is it possible to reproduce Smokeping's alerts in Grafana/Prometheus? https://github.com/wikimedia/puppet/blob/production/modules/smokeping/files/config.d/Alerts
- At which frequency will the tests be run? E.g. Smokeping currently spreads 20 pings across 300s
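
To make the frequency question concrete, here is a rough sketch of the arithmetic, not a tested setup. It assumes the 20 pings are spread evenly over the 300s step, and that each blackbox_exporter scrape yields one `probe_success` sample (1 or 0), so that loss over a window can be derived from those samples. The function names are hypothetical, made up for illustration.

```python
def equivalent_interval(pings: int, step_seconds: float) -> float:
    """Seconds between probes if `pings` probes are spread evenly over one step."""
    if pings <= 0:
        raise ValueError("pings must be positive")
    return step_seconds / pings


def loss_percent(samples: list[int]) -> float:
    """Packet loss in percent for a window of 0/1 success samples."""
    if not samples:
        raise ValueError("need at least one sample")
    failures = len(samples) - sum(samples)
    return 100.0 * failures / len(samples)


# Smokeping today: 20 pings across 300s (assumed evenly spaced).
interval = equivalent_interval(20, 300)
print(interval)  # 15.0

# Hypothetical 300s window at that interval: 18 successful probes, 2 failed.
window = [1] * 18 + [0] * 2
print(loss_percent(window))  # 10.0
```

Under these assumptions, a 15s scrape interval would give the same number of probes per 300s as Smokeping does now. Whether that rate is acceptable for the number of targets still needs to be checked.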
Experimental/PoC dashboard: https://grafana.wikimedia.org/d/CbNAwAXnk/filippo-blackbox-smoke-icmp