Long-overdue follow-up to T186835#4089227 (paraphrasing most of it).
When investigating routing/peering in eqsin, I noticed that some prefixes were taking sub-optimal paths, for example:
126.96.36.199 - 119.0ms - 6279
  * 9498 23752 23752 23752 23752 23752 23752 23752 23752 ? (local-pref 250)
    3491 9498 23752 23752 23752 23752 23752 23752 23752 23752 ? (local-pref 250)
    6453 23752 ? (local-pref 100)
188.8.131.52 - 186.0ms - 32819
  * 3491 41095 59103 59105 59105 59105 59105 ? (local-pref 250)
    6453 2518 59105 ? (local-pref 100)
The shorter AS path is not selected because it has a lower local-pref, which BGP evaluates before AS path length.
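To make the ordering concrete, here is a minimal Python sketch of the first two steps of BGP best-path selection, applied to the paths of the first prefix above. The `Path` class and `best_path` function are hypothetical names used only for illustration, and every later tie-breaker (origin, MED, and so on) is left out, so this is not a full model of what the routers do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical, simplified view of one BGP path."""
    as_path: tuple
    local_pref: int


def best_path(paths):
    # Highest local-pref wins; AS path length only breaks ties.
    # Later BGP tie-breakers (origin, MED, ...) are deliberately omitted.
    return max(paths, key=lambda p: (p.local_pref, -len(p.as_path)))


# Paths for the first prefix in the example above.
paths = [
    Path((9498,) + (23752,) * 8, 250),
    Path((3491, 9498) + (23752,) * 8, 250),
    Path((6453, 23752), 100),
]

# Current policy: local-pref as received in the example.
current = best_path(paths)

# Same paths with every local-pref set to the default of 100.
flat = best_path([Path(p.as_path, 100) for p in paths])

print(current.as_path[0], len(current.as_path))  # 9498 9
print(flat.as_path[0], len(flat.as_path))        # 6453 2
```

With the local-pref values from the example, the 9-hop path via 9498 is selected. With an equal local-pref on all paths, the decision falls through to AS path length and the 2-hop path via 6453 is selected, which honors the destination's prepending.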
This is due to our current policy of prioritizing peering over transit, which might no longer be relevant:
- If the prefix is learned from a peer, its AS path will most often be shorter anyway (because there are fewer intermediaries)
- Prioritizing peers overrides the destination network's traffic engineering policies, as we can see in the example above (we ignore the AS-prepending), and could hit bottlenecks or sub-optimal routing
- Requires custom tuning to work around those sub-optimal routes (when we notice them)
- The cost savings of sending traffic through free links (vs. paid transit) are nil (we are far from our commits) as long as there is no massive traffic change
- Increases the configuration and routing complexity (various rules and routing decisions)
I think we should stop prioritizing peers (especially ones operating at a large geographical scope) over transit in terms of local-pref: they would get the default value of 100 instead of the current 250.
A test was previously done in eqsin (see T186835#4121297) and I'd like to run the same test on a larger scale: at least esams, at best globally.
The 3 aspects to monitor are:
- Link capacity (e.g. traffic shifts and saturates a transit link)
- Transit commits (billing)
- Performance (are we seeing any improvement or degradation?)
The first has been verified and we have plenty of capacity; the last would require the help of the performance team.
The test would be successful if neither performance degradation nor a massive traffic shift occurs (none is expected).