
Add RIPE atlas data to Prometheus
Closed, Resolved · Public

Description

Filippo made me aware of this great tool.

As we are hosting RIPE Atlas anchors, RIPE runs ongoing measurements against our DCs.

They recently released a Prometheus exporter that allows us to import those measurements into Prometheus.

See https://labs.ripe.net/Members/daniel_czerwonk/using-ripe-atlas-measurement-results-in-prometheus-with-atlas_exporter
and https://github.com/czerwonk/atlas_exporter for more information.

Here are the measurement IDs we would need to query, to be configured under

static_configs:
  - targets:

CODFW

Type        v4       v6
Ping        1791210  1791212
Traceroute  1791209  1791211
HTTP        2841643  2841644

EQIAD

Type        v4       v6
Ping        1790945  1790947
Traceroute  1790944  1790946
HTTP        2841682  2841683

ULSFO

Type        v4       v6
Ping        1791307  1791309
Traceroute  1791306  1791308
HTTP        2841695  2841697

ESAMS and Singapore will need to be added once their anchors are set up.
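
As a rough illustration of how the tables above could be turned into the targets list, here is a minimal Python sketch. The `MEASUREMENTS` dict and both function names are hypothetical, the choice to default to ping measurements is arbitrary, and quoting each ID as a string target is an assumption about how the scrape config would be written.

```python
# Measurement IDs copied from the tables above: site -> type -> (v4, v6).
MEASUREMENTS = {
    "codfw": {
        "ping": (1791210, 1791212),
        "traceroute": (1791209, 1791211),
        "http": (2841643, 2841644),
    },
    "eqiad": {
        "ping": (1790945, 1790947),
        "traceroute": (1790944, 1790946),
        "http": (2841682, 2841683),
    },
    "ulsfo": {
        "ping": (1791307, 1791309),
        "traceroute": (1791306, 1791308),
        "http": (2841695, 2841697),
    },
}


def measurement_targets(kinds=("ping",)):
    """Return the sorted measurement IDs for the requested measurement types."""
    ids = []
    for site in MEASUREMENTS.values():
        for kind in kinds:
            ids.extend(site[kind])
    return sorted(ids)


def render_static_configs(ids, indent="  "):
    """Render the IDs as a static_configs fragment (hypothetical layout)."""
    lines = ["static_configs:", f"{indent}- targets:"]
    # Each measurement ID becomes one quoted target entry.
    lines += [f"{indent * 3}- '{measurement_id}'" for measurement_id in ids]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_static_configs(measurement_targets()))
```

Passing `("ping", "traceroute", "http")` to `measurement_targets` would include all 18 IDs listed above.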

Event Timeline

@ayounsi awesome!

I'll outline here what needs to happen next:

  • Create a Debian package for atlas_exporter
  • Create the respective puppet module, similarly to existing exporters
  • Install the package and try the configuration on e.g. prometheus-beta, and decide on things like how often to poll for data
  • Add the puppet role/profile/module to our Prometheus global instances (i.e. codfw/eqiad). I'd say running one atlas_exporter per Prometheus machine is more than enough; each Prometheus machine simply polls from localhost

@fgiunchedi: This task has no active projects associated. Should this be under User-fgiunchedi? And/or SRE? Or closed? Thanks in advance!

The steps outlined in Filippo's comment happened, with the difference that I chose to use the netmon* machines for this role.

The dashboard needs a bit more work, and we could look at also ingesting measurements other than ICMP pings, but this is done-ish.
https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas

Thanks, this is really nice!

It could be useful to add https://grafana.com/grafana/plugins/grafana-worldmap-panel and show the data on a map as well.

I'm interested in adding the traceroute hop count, for example to detect major routing changes. I looked a bit at the code, but would need to be walked through it to understand it fully.
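
For reference, a hop count could be derived from a RIPE Atlas traceroute result roughly as sketched below. This is not taken from atlas_exporter; it is a standalone Python sketch that assumes the result JSON has a top-level `result` list of hops, each with a `hop` number and a `result` list of replies, where a reply that came back carries a `from` field. The function name and the sample data are hypothetical.

```python
def hop_count(measurement_result):
    """Return the highest hop number that got at least one reply, or 0 if none did."""
    responding = [
        hop["hop"]
        for hop in measurement_result.get("result", [])
        # A hop counts only if at least one of its probes was answered.
        if any("from" in reply for reply in hop.get("result", []))
    ]
    return max(responding, default=0)


# Hypothetical sample: hop 2 timed out, hops 1 and 3 replied.
sample = {
    "type": "traceroute",
    "result": [
        {"hop": 1, "result": [{"from": "192.0.2.1", "rtt": 0.5}]},
        {"hop": 2, "result": [{"x": "*"}, {"x": "*"}, {"x": "*"}]},
        {"hop": 3, "result": [{"from": "198.51.100.7", "rtt": 12.3}]},
    ],
}

if __name__ == "__main__":
    print(hop_count(sample))
```

Exposing this number as a gauge per measurement and probe would make a sudden jump or drop visible, which is the kind of signal a major routing change would likely produce.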

Broke out some pending nice-to-haves into child tasks; resolving this one.