Our LVS servers announce /32 and /128 IPs for services via BGP to our core routers, using PyBal.
At this point this is a well understood and robust setup. However, our core routers still have static routes for the aggregate blocks we use for LVS services, as in the Eqiad example below:
cmooney@re0.cr1-eqiad> show configuration routing-options static | display set | match "10.64.1.13|10.64.17.14|10.64.33.15"
set routing-options static route 208.80.154.224/28 next-hop 10.64.1.13
set routing-options static route 208.80.154.240/28 next-hop 10.64.17.14
set routing-options static route 10.2.2.0/24 next-hop 10.64.33.15

cmooney@re0.cr1-eqiad> show configuration routing-options rib inet6.0 static | display set | match "10.64.1.13|10.64.17.14|10.64.33.15"
set routing-options rib inet6.0 static route 2620:0:861:ed1a::0:0/111 next-hop 2620:0:861:101:10:64:1:13
set routing-options rib inet6.0 static route 2620:0:861:ed1a::2:0/111 next-hop 2620:0:861:102:10:64:17:14
To my knowledge these have existed since PyBal was initially deployed, and are intended to act as "backup" routes should BGP die on all the available LVS machines.
Since that time we have had a lot of experience with PyBal and know it to be robust and the overall LVS setup to work well. With that in mind there is an open question as to whether these static routes are needed at all.
@ayounsi and I spoke about it and are of the opinion they can be safely removed. We see that as a good move because:
- Their presence in the config adds unnecessary complication.
- They have made automating the static route configuration difficult.
- There is a risk that, as we move / migrate LVS servers, these are not updated and some unexpected edge case occurs.
- There is no compelling story about what type of scenario these protect against, and it does not seem like they have ever "saved" us in an incident.
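
To illustrate the stale next-hop risk from the third point, here is a minimal Python sketch that parses `display set` output like the lines above and flags static routes whose next-hop is not in a given set of current LVS addresses. The function names and the `live` set are hypothetical, made up for illustration only; the set does not reflect the real state of any LVS host, and this is not an existing tool.

```python
import re

# Matches Junos "display set" static routes, for both the default inet table
# and the "rib inet6.0" form shown in the output above.
STATIC_ROUTE = re.compile(
    r"set routing-options (?:rib \S+ )?static route (\S+) next-hop (\S+)"
)


def parse_static_routes(config_text):
    """Return a list of (prefix, next_hop) pairs found in the text."""
    return STATIC_ROUTE.findall(config_text)


def stale_routes(config_text, live_next_hops):
    """Return the routes whose next-hop is not in live_next_hops."""
    return [
        (prefix, next_hop)
        for prefix, next_hop in parse_static_routes(config_text)
        if next_hop not in live_next_hops
    ]


# Sample input copied from the router output in this task.
config = """\
set routing-options static route 208.80.154.224/28 next-hop 10.64.1.13
set routing-options static route 208.80.154.240/28 next-hop 10.64.17.14
set routing-options static route 10.2.2.0/24 next-hop 10.64.33.15
set routing-options rib inet6.0 static route 2620:0:861:ed1a::0:0/111 next-hop 2620:0:861:101:10:64:1:13
"""

# Hypothetical set of addresses still served by LVS hosts (assumption).
live = {"10.64.1.13", "10.64.17.14", "2620:0:861:101:10:64:1:13"}

for prefix, next_hop in stale_routes(config, live):
    print(f"stale static route: {prefix} via {next_hop}")
```

With that made-up `live` set, only the 10.2.2.0/24 route would be reported. Removing the static routes altogether makes this kind of check unnecessary.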
@BBlack interested to hear your thoughts on this (or anyone else who may feel they are a good idea to keep). Thanks!
- eqiad:
- codfw: removed
- esams: nothing to do
- ulsfo: removed
- eqsin: removed
- drmrs: nothing to do
- magru: nothing to do