This is a tracking task for RPKI Validation. The description will be updated as the project evolves.
Short description
RPKI + Origin Validation uses cryptography to ensure that a prefix received from our BGP peers is advertised by its legitimate owner.
So far about 13% of the Internet's {prefix/ASN} pairs are signed and valid, and 0.76% are invalid; see https://rpki-monitor.antd.nist.gov/
It reduces the risk of routing traffic to an AS that accidentally or maliciously advertises a prefix it doesn't own.
Combined with short AS paths (e.g. peering), it also helps prevent BGP MitM hijacks.
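To make the valid / invalid / unsigned distinction concrete, here is a minimal Python sketch of the origin validation decision as described in RFC 6811. The `ROA` class and `validate` function are illustrative names, not part of any validator's API, and the prefixes and AS numbers are documentation values.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Sequence


@dataclass(frozen=True)
class ROA:
    """One validated ROA payload: prefix, maximum length, authorized origin."""
    prefix: str
    max_length: int
    asn: int


def validate(prefix: str, origin_asn: int, roas: Sequence[ROA]) -> str:
    """Return "valid", "invalid" or "not-found" for a {prefix, ASN} pair."""
    route = ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so check first.
        if route.version != roa_net.version:
            continue
        if not route.subnet_of(roa_net):
            continue
        covered = True  # at least one ROA covers this route
        # AS 0 in a ROA never matches any route (RFC 6811).
        if roa.asn != 0 and roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by a ROA but no match: invalid. Not covered at all: not-found.
    return "invalid" if covered else "not-found"


roas = [ROA("192.0.2.0/24", 24, 64496)]
print(validate("192.0.2.0/24", 64496, roas))     # matching origin and length
print(validate("192.0.2.0/24", 64511, roas))     # wrong origin AS
print(validate("192.0.2.0/25", 64496, roas))     # more specific than max_length
print(validate("198.51.100.0/24", 64496, roas))  # no covering ROA
```

In a real deployment the routers perform this check themselves, using the data received from the validator.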
The system comes in 2 parts:
- Validator, software that runs on a normal server, downloads the ROAs from the RIRs and verifies them
- Router, uses the RPKI-to-Router protocol to get the validated data from the validator to the routers
It's also possible that a dedicated daemon implements RPKI-to-Router (eg. GoRTR)
Validator software selection
Looking at the landscape, my shortlist comes down to:
- OctoRPKI + gortr from Cloudflare
- Routinator from NLnetLabs
So far (and after initially dismissing it due to a misunderstanding) my vote goes to Routinator for the following reasons:
- RTR daemon embedded, no need to package/run two tools
- More active development, explicit roadmap. There is a risk that Cloudflare's tool only gets updates for their own needs
- Whitelist support
Location to run the validator from
Routers support multiple validators.
Routinator requirements: "Running it on a system with 1GB of available RAM and 1GB of available disk space will give the global RPKI data set enough room to grow for the foreseeable future."
To get the ROAs, Routinator uses rsync, which should be able to use the Squid proxies (to be tested).
To be researched: I don't know yet if the dedicated protocol RRDP used by OctoRPKI, and ideally by Routinator down the road, supports proxies.
With that in mind and as of today, the best location is a private VM on each of the Ganeti clusters in eqiad and codfw.
If dedicating two VMs to this is too much, another option would be to run the validator on the netmon hosts. This would also solve the proxy concerns, as those hosts have direct Internet access.
To be researched: I'm not sure yet if bringing the validator closer to the routers (eg. in POPs) brings significant improvements. If so and once T96852 is solved, we could bring them closer to the POP routers.
Monitoring, still TBD
- Both Routinator and OctoRPKI provide a Prometheus endpoint.
To be researched: it would be useful to get alerts if:
- A router can't reach its configured Validators
- The data provided to the routers gets stale
eg. Juniper doesn't seem to implement this MIB
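On the validator side, a staleness alert could be derived from the Prometheus endpoint mentioned above. The sketch below parses Prometheus text exposition output and compares a "last update" timestamp against a maximum age. The metric name `validator_last_update_timestamp_seconds` is hypothetical; the names actually exported by Routinator or OctoRPKI need to be checked. In practice this would more likely be a Prometheus alerting rule than a script.

```python
import time
from typing import Optional

# Hypothetical metric name, not taken from Routinator or OctoRPKI.
METRIC = "validator_last_update_timestamp_seconds"


def metric_value(exposition_text: str, name: str) -> Optional[float]:
    """Return the first sample value for `name`, or None if it is absent."""
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # name{labels} value [timestamp]
            metric = line.split("{", 1)[0]
            rest = line.rsplit("}", 1)[1]
        else:
            # name value [timestamp]
            metric, _, rest = line.partition(" ")
        fields = rest.split()
        if metric == name and fields:
            return float(fields[0])
    return None


def is_stale(exposition_text: str, now: float, max_age_seconds: float) -> bool:
    """True if the metric is missing or older than max_age_seconds."""
    last_update = metric_value(exposition_text, METRIC)
    return last_update is None or (now - last_update) > max_age_seconds


sample = """# HELP validator_last_update_timestamp_seconds hypothetical metric
# TYPE validator_last_update_timestamp_seconds gauge
validator_last_update_timestamp_seconds 1000
"""
print(is_stale(sample, now=time.time(), max_age_seconds=3600))
```

Treating a missing metric as stale is a deliberate choice here, so that a validator that stops exporting data also raises the alert.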
Enforcement
The big question is: what to do once the routers know the validation status of a {ASN,prefix} pair?
The first (easy) step is to adjust the BGP local preference so that, for two identical prefixes, our routers prefer the one originating from a valid AS# (or avoid the invalid one).
This would not protect against a more specific prefix originating from an invalid ASN.
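The limitation can be illustrated with a simplified route selection sketch: longest-prefix match is applied before local preference is compared, so a more specific invalid prefix still wins. The local preference values and the `best_route` helper are assumptions made for illustration, not our router configuration.

```python
from ipaddress import ip_address, ip_network

# Hypothetical local preference per validation state.
LOCAL_PREF = {"valid": 110, "not-found": 100, "invalid": 90}


def best_route(destination, routes):
    """Pick a route for `destination`.

    `routes` is a list of (prefix, origin_asn, rpki_state) tuples.
    Forwarding uses the longest matching prefix; local preference only
    breaks ties between routes for the same prefix.
    """
    dest = ip_address(destination)
    candidates = [r for r in routes if dest in ip_network(r[0])]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (ip_network(r[0]).prefixlen, LOCAL_PREF[r[2]]),
    )


# Same prefix from two origins: the valid origin is preferred.
same_prefix = [
    ("203.0.113.0/24", 64496, "valid"),
    ("203.0.113.0/24", 64511, "invalid"),
]
print(best_route("203.0.113.10", same_prefix))

# A more specific invalid prefix: it wins despite its lower local preference.
more_specific = [
    ("203.0.113.0/24", 64496, "valid"),
    ("203.0.113.0/25", 64511, "invalid"),
]
print(best_route("203.0.113.10", more_specific))
```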
The current state of RPKI is that there are many "RPKI Unreachable" subnets: subnets announced only through an invalid prefix and not covered by any other more or less specific prefix (valid or unsigned). This means that rejecting invalid prefixes would make those subnets (with no overlap) unable to reach our network (more precisely, it would make them invisible to us).
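As a rough sketch of how such a list could be derived from a routing table dump, the function below returns the invalid prefixes that do not overlap with any valid or unsigned prefix. The input format and the `rpki_unreachable` name are assumptions. Partial coverage by a more specific prefix is treated as "reachable" here, which is a simplification.

```python
from ipaddress import ip_network


def rpki_unreachable(routes):
    """Return invalid prefixes with no overlapping valid/unsigned prefix.

    `routes` is an iterable of (prefix, rpki_state) tuples, where
    rpki_state is "valid", "invalid" or "not-found".
    """
    nets = [(ip_network(prefix), state) for prefix, state in routes]
    usable = [net for net, state in nets if state != "invalid"]
    unreachable = []
    for net, state in nets:
        if state != "invalid":
            continue
        # overlaps() is true for both less specific and more specific prefixes.
        has_alternative = any(
            net.version == other.version and net.overlaps(other)
            for other in usable
        )
        if not has_alternative:
            unreachable.append(str(net))
    return unreachable


table = [
    ("198.51.100.0/22", "not-found"),  # less specific, covers the /24 below
    ("198.51.100.0/24", "invalid"),    # still reachable via the /22
    ("203.0.113.0/24", "invalid"),     # nothing else covers it
]
print(rpki_unreachable(table))
```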
To help decide whether the security benefit is worth discarding those prefixes, I started talking to Analytics and opened T220639 to be able to cross-reference a list of RPKI unreachable subnets (how to get it is still TBD) with our webrequests.
Similarly, once we have infrastructure-wide NetFlow, a recent version of pmacct would allow us to do the same (at lower layers).
Enforcing RPKI on peering links is also a good low-hanging fruit, as traffic to any unreachable prefix would be routed over transit links instead.