
scrape ripe atlas data for a few anchors at other large networks
Closed, Declined · Public

Description

To provide a baseline of comparison when the internet at large seems to be suffering.

Large transit providers seem like the ideal choice?

Event Timeline

Most transit providers don't participate in RIPE Atlas. Here are the ones that do, in order of CAIDA AS rank:

That's all the anchors for the complete set of networks listed at https://en.wikipedia.org/wiki/Tier_1_network#List_of_Tier_1_networks

I think we could scrape a few of those anchors, distributed across networks and geographically, and maybe also a few from large content networks (Amazon and Google each have 15+ anchors, for instance).

RLazarus triaged this task as Medium priority. May 19 2020, 5:34 PM

Good idea! What's the limit?
I'd suggest:

We could add more regions depending on the "granularity" we want

Which measurements do you plan to scrape?

  • all measurements the anchors are performing outbound?
  • the anchoring measurements directed at these anchors?

If it's the latter, then I think picking the Google and Amazon ones is a good choice, as we could create a dashboard similar to https://grafana-next.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 which presents Atlas's view of Google and Amazon.

If it's the outbound tests that we want to scrape, then the goal is to pick a stable set of hosts and a common set of destinations that all anchors measure. For this I would pick one of the built-in measurements backing DNSMON. Specifically, I would probably pick the ping tests going to:

  • a.root-servers.net, j.root-servers.net - Verisign
  • c.root-servers.net - Comcast
  • f.root-servers.net - ISC, with much of the hosting provided by Cloudflare
  • k.root-servers.net - RIPE
  • l.root-servers.net - ICANN

On the anchor side of things, we can filter the anchors based on the system tags system-ipv6-stable-90d and system-ipv4-stable-90d. One additional benefit of using the root servers is that they are measured by all probes in the network (which can also be filtered on those tags), so there are many more data points. These measurements power https://atlas.ripe.net/results/maps/
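The tag filtering described above could be sketched as follows, assuming probe/anchor records shaped like those returned by the RIPE Atlas v2 API (each with a "tags" list of {"slug": ...} dicts); the sample records here are made up, not real anchors:

```python
# Sketch: keep only anchors carrying both 90-day stability system tags.
STABLE_TAGS = {"system-ipv4-stable-90d", "system-ipv6-stable-90d"}

def is_stable(probe: dict) -> bool:
    """True if the probe record carries both stability system tags."""
    slugs = {t["slug"] for t in probe.get("tags", [])}
    return STABLE_TAGS <= slugs

# Illustrative records only:
anchors = [
    {"id": 6001, "tags": [{"slug": "system-ipv4-stable-90d"},
                          {"slug": "system-ipv6-stable-90d"}]},
    {"id": 6002, "tags": [{"slug": "system-ipv4-stable-90d"}]},
]

stable = [a["id"] for a in anchors if is_stable(a)]
print(stable)  # → [6001]
```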

Of course, these are all densely anycast addresses, which gives a good overview of the internet. However, it is a slightly different measurement than the ones going to our unicast addresses, so I'm not sure if it's the right pick.

Something else we could do is try to get all stable anchors in Singapore, Amsterdam, Ashburn, Dallas and San Francisco.

Good idea! What's the limit?

While I'm not very concerned about cardinality limits in Prometheus here, I think we'd start to hit scalability limits of atlas_exporter after even just several more measurements. There are two reasons:

  • Unless atlas_exporter is configured to use RIPE's streaming API, every request to /metrics on the exporter triggers a query against the Atlas API for the most recent state of all probes in every requested measurement. That is a lot of data (see next bullet). However, last I tried, atlas_exporter's streaming client seemed experimental; I observed both intervals with missing data and frequent crashes due to race conditions in its code.
  • For the 10 measurements we currently scrape, it takes 2-3 seconds for our Prometheus servers to fetch atlas_exporter's /metrics, and the response is ~10 MB of Prometheus-formatted metrics. This would be true regardless of whether we use the streaming API, but if we really needed to, we could shard metrics across multiple Prometheus scrapes (although it would be kind of gross).

So unless we can get the streaming client working reliably*, I'd like to only add about 5-10 new measurements here.

(*: I don't see any major changes upstream since I last tried this, so this seems like non-trivial work)
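If we ever did shard measurements across Prometheus scrape jobs, it might look roughly like this. This is a sketch only, assuming atlas_exporter accepts a measurement_id URL parameter via blackbox-exporter-style relabelling; the job names, measurement IDs, hostname, and port are all placeholders, not our real configuration:

```yaml
scrape_configs:
  - job_name: 'atlas_shard_a'
    static_configs:
      - targets: ['10001', '10002', '10003']   # first batch of measurement IDs (placeholders)
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_measurement_id   # pass the ID as a URL parameter
      - target_label: __address__
        replacement: atlas-exporter.example.org:9400   # placeholder exporter address
  - job_name: 'atlas_shard_b'
    static_configs:
      - targets: ['10004', '10005']            # second batch of measurement IDs
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_measurement_id
      - target_label: __address__
        replacement: atlas-exporter.example.org:9400
```

Each job would then produce a smaller /metrics response, at the cost of more moving parts in the scrape config.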

Which measurements do you plan to scrape?

  • all measurements the anchors are performing outbound?
  • the anchoring measurements directed at these anchors?

Right now we only scrape our own anchoring measurements*, and that's what I was imagining here, at least for the other content networks.

(*: In fact, our current scraping isn't even all our anchoring measurements -- we only scrape our own anchoring ICMP ping measurements, and should scrape at least our anchoring traceroutes as well, and plot things like median/mode/p95 num-hops -- see T251156)
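The num-hops aggregation itself is simple; a rough sketch over made-up Atlas traceroute results (assuming each result's "result" field is a list of per-hop entries, as in Atlas traceroute JSON; the data and the crude p95 index are illustrative):

```python
import statistics

def hop_count(traceroute: dict) -> int:
    """Number of hops reported in one Atlas traceroute result."""
    return len(traceroute["result"])

# Made-up results from several probes toward one target:
results = [{"result": [{}] * n} for n in (9, 11, 11, 12, 30)]

hops = sorted(hop_count(r) for r in results)
median = statistics.median(hops)                        # 11
mode = statistics.mode(hops)                            # 11
p95 = hops[min(len(hops) - 1, int(0.95 * len(hops)))]   # crude p95: 30
print(median, mode, p95)
```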

But your points about DNSMON are great, hadn't known about any of that -- thanks!

There's a lot to think about here, and I'm not sure what to do.

  • Do want: to gather some sort of 'baseline' that helps point to issues within large backbone networks
  • Don't want: to greatly increase the data set size (so only add 5-10 measurements, and probably don't add any measurements in which every probe participates)
  • Would be nice: to have a baseline that reflects "internet usability" more directly from users' points of view, but only if not a huge volume of data & not too noisy
    • High number of measurements is both a scalability problem, and could make it difficult to eyeball a comparison between metrics (or we figure out some good way of aggregating many measurements?)

I think I'm leaning towards a few stable anchors in similar geographic locations to our PoPs. Maybe also a few root servers as well even though they're less apples-to-apples. Other thoughts very appreciated :)
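On the aggregation question above, one option would be a single PromQL rollup per region rather than eyeballing many per-measurement series. The metric and label names here are purely illustrative, not atlas_exporter's actual ones:

```promql
# Median RTT across all scraped baseline measurements, grouped by region
# (hypothetical metric/label names):
quantile by (region) (0.5, atlas_ping_rtt_ms{job="atlas_baseline"})
```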

I think I'm leaning towards a few stable anchors in similar geographic locations to our PoPs. Maybe also a few root servers as well even though they're less apples-to-apples.

I think I agree with this, and would prioritize the former, as we can always use DNSMON* to get a feel for the latter.

*Just a note on DNSMON: the default limits are very high, to the point that you would only really notice a colour change if a large portion of the internet was down. I tend to set the low value at about 3% and the top value at about 10%; doing so shows an IPv6 event earlier today.

image.png (716×1 px, 135 KB)

@CDanis Is that still needed now that we have NEL?

@CDanis Is that still needed now that we have NEL?

It would be interesting to do, but only as more of a curiosity or a research project. NEL serves the actual need here much better :)