Summary
This task will track the work to connect the new Nokia switches in eqiad row C to the existing Juniper devices in that row, and bridge the existing vlans on the old switches through to the Nokias. As we will re-use the ports currently connecting asw2-c-eqiad to our core routers for this (we have no others free), it will also involve moving the VRRP GW on the CRs from the current port (connected to asw2-c-eqiad) to et-1/0/5 (connected to Nokia Spines).
Traffic to end hosts should not be disrupted during the move, though it is a delicate operation and we need to move carefully and check at all times that things are ok.
We will follow the same process for all vlans. There are three vlans we need to consider:
1019 - private1-c-eqiad (vrrp)
1003 - public1-c-eqiad (vrrp)
1022 - analytics1-c-eqiad (vrrp)
The main things to watch while doing it are our alerts dashboard, the eqiad throughput dashboard, and the #operations and #sre channels on irc.
Phase 1 - Migrate asw2-c2-eqiad et-2/0/53 -> ssw1-d1-eqiad ethernet-1/28
Step 1 - Verify VRRP status on the CRs
The Netbox configuration for all three VRRP groups is set so that cr2-eqiad is primary in VRRP for all three subnets. To be sure, we want to connect to both CRs and verify with:
show vrrp summary | match ae3
Step 2 - Shut down et-1/1/0 on cr1-eqiad
Next we want to shut down the port on cr1-eqiad which connects to asw2-c2-eqiad. This will mean the 'direct' / 'connected' route to that vlan will disappear on cr1-eqiad, and instead it should install the route to those destinations it learns in OSPF from cr2-eqiad.
deactivate interface et-1/1/0
Once done we want to verify that routes to all destinations are still there, going to cr2:
show route terse table inet.0 exact 10.64.32.0/22
show route terse table inet.0 exact 208.80.154.64/26
show route terse table inet.0 exact 10.64.36.0/24
show route table inet6.0 exact 2620:0:861:3::/64
show route table inet6.0 exact 2620:0:861:103::/64
show route table inet6.0 exact 2620:0:861:106::/64
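As a convenience, the route checks above can be generated from a single list of the row C prefixes, which makes it harder to miss a subnet when the step is repeated on the other router. This is only a minimal Python sketch: the function name `route_check_commands` is made up for illustration, and the command strings simply reproduce the forms already used in this task.

```python
# Row C prefixes, copied from the route checks in this task.
ROW_C_V4 = ["10.64.32.0/22", "208.80.154.64/26", "10.64.36.0/24"]
ROW_C_V6 = ["2620:0:861:3::/64", "2620:0:861:103::/64", "2620:0:861:106::/64"]


def route_check_commands(v4=ROW_C_V4, v6=ROW_C_V6):
    """Return the 'show route' commands in the same form as written in the task.

    Hypothetical helper: it only builds strings, it does not talk to a router.
    """
    cmds = [f"show route terse table inet.0 exact {prefix}" for prefix in v4]
    cmds += [f"show route table inet6.0 exact {prefix}" for prefix in v6]
    return cmds


if __name__ == "__main__":
    for cmd in route_check_commands():
        print(cmd)
```

The same list can be reused in Phase 2, where the identical checks are run on cr2-eqiad.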
We also want to check the graphs to confirm connectivity looks ok, and run some traceroutes from hosts in row A (whose VRRP GW is cr1-eqiad, so they will use the routes shown by the above commands). Some hosts in row A we can test from include:
wikikube-worker1240 db1151 pki1001 maps1005 cp1102 dns1004 idp1004 apt1002 gitlab1003 an-conf1004 an-worker1118
Step 3: adjust netbox connection for asw2-c2-eqiad et-2/0/53 and run homer
Now that the CR port connected to asw2-c2-eqiad et-2/0/53 is down we can reconfigure it. In Netbox we need to:
- Adjust the cable so it shows it connected to ssw1-d1-eqiad ethernet-1/28 instead
- Make it a member of ae0 instead of ae1
- Enable the ae0 interface
- Delete the ae1 interface
After which we can run homer against asw2-c2-eqiad to update the port's AE membership and description.
Step 4: re-cable asw2-c2-eqiad et-2/0/53 to ssw1-d1-eqiad ethernet-1/28
Now that the CR port is disabled we can move the optic from the CR to the Nokia spine port, and re-terminate the fibre link on it.
Step 5: validate we see MAC addresses on the lag1 interface of ssw1-d1-eqiad
We should see MAC addresses learnt on the various vlans:
show network-instance vlan-1019 bridge-table mac-table all
show network-instance vlan-1003 bridge-table mac-table all
show network-instance vlan-1022 bridge-table mac-table all
We should then repeat these commands on ssw1-d8-eqiad, verifying the MAC addresses are being distributed in BGP EVPN within the Nokia cluster.
Step 6: Move cr1-eqiad ae3 sub-interfaces to et-1/0/5 in Netbox and run homer
At this point we can move the ae3.X sub-interfaces in Netbox from the ae3 LAG to port et-1/0/5 (connected to ssw1-d1-eqiad). We should double check that the VRRP groups update as expected when the interfaces are renamed.
When done we can run Homer against cr1-eqiad to enable the new sub-interfaces.
Step 7: verify L3 connectivity from row C hosts to cr1-eqiad
We should now be able to ping the various IPs configured on the moved sub-interfaces on cr1-eqiad. Some suggestions for hosts to test from are:
| Vlan | IPs to ping | Hosts to source pings |
|---|---|---|
| 1019 - private1-c-eqiad | 10.64.32.2 & 2620:0:861:103:fe00::1 | es1045, wikikube-worker1063, db1242 |
| 1003 - public1-c-eqiad | 208.80.154.66 & 2620:0:861:3:fe00::1 | dns1006, alert1002, lists1004 |
| 1022 - analytics1-c-eqiad | 10.64.36.2 & 2620:0:861:106:fe00::1 | an-conf1006, an-worker1131, stat1011 |
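Before pinging, it can be worth confirming that each target address really belongs to the subnet of the vlan it is listed under. The following is a minimal Python sketch using the standard `ipaddress` module. The pairing of vlan names to prefixes is inferred from the addresses in the table and the prefixes in the route checks, and the function name `targets_outside_vlan` is hypothetical.

```python
import ipaddress

# Prefixes from the route checks; the vlan pairing is inferred, not stated.
VLAN_PREFIXES = {
    "private1-c-eqiad": ["10.64.32.0/22", "2620:0:861:103::/64"],
    "public1-c-eqiad": ["208.80.154.64/26", "2620:0:861:3::/64"],
    "analytics1-c-eqiad": ["10.64.36.0/24", "2620:0:861:106::/64"],
}

# cr1-eqiad addresses copied from the table.
PING_TARGETS = {
    "private1-c-eqiad": ["10.64.32.2", "2620:0:861:103:fe00::1"],
    "public1-c-eqiad": ["208.80.154.66", "2620:0:861:3:fe00::1"],
    "analytics1-c-eqiad": ["10.64.36.2", "2620:0:861:106:fe00::1"],
}


def targets_outside_vlan(prefixes, targets):
    """Return (vlan, ip) pairs where ip is not inside any prefix of its vlan."""
    bad = []
    for vlan, ips in targets.items():
        nets = [ipaddress.ip_network(p) for p in prefixes[vlan]]
        for ip in ips:
            addr = ipaddress.ip_address(ip)
            # Only compare addresses and networks of the same IP version.
            if not any(addr.version == n.version and addr in n for n in nets):
                bad.append((vlan, ip))
    return bad


if __name__ == "__main__":
    print(targets_outside_vlan(VLAN_PREFIXES, PING_TARGETS))
```

An empty result means every address in the table sits inside a prefix of its own vlan.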
Step 8: Flip VRRP on the CRs to make cr1-eqiad the active GW
At this point we have connectivity to both CR routers on all the vlans again: to cr2-eqiad as before, directly from asw2-c-eqiad, and to cr1-eqiad via asw2-c-eqiad -> ssw1-d1-eqiad -> cr1-eqiad.
In this step we will change the VRRP priority for all three vlans so they take this new path via the Nokia spine switch. The VRRP groups below should be modified, changing the priority for cr1-eqiad to 200:
1019 - private1-c-eqiad (vrrp)
1003 - public1-c-eqiad (vrrp)
1022 - analytics1-c-eqiad (vrrp)
With it changed in Netbox we can run Homer against cr1-eqiad to promote it to master. Once in place we can validate on both CRs:
show vrrp summary | match "ae3|et-1/0/5"
Provided it is master we can look at this graph to validate that the traffic has flipped from one device to the other, and that it is the same order of magnitude as before.
We should check from the same hosts as in the last step that comms are ok to devices outside the current vlan (some are listed in step 2, and on the public vlan we can ping internet destinations).
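The priority change in this step relies on the basic VRRP rule that the router with the highest priority becomes master (assuming preemption is enabled on the group). The short Python sketch below illustrates that rule only. The value 200 for cr1-eqiad is from this step, while the value 100 for cr2-eqiad is an assumed figure for illustration and should be checked against Netbox. The function name `expected_master` is hypothetical.

```python
def expected_master(priorities):
    """Return the router with the highest VRRP priority.

    Ties are not resolved here (VRRP breaks them on the primary IP address),
    so a tie raises an error instead of guessing.
    """
    top = max(priorities.values())
    winners = [name for name, prio in priorities.items() if prio == top]
    if len(winners) != 1:
        raise ValueError(f"priority tie between {winners}")
    return winners[0]


if __name__ == "__main__":
    # cr2-eqiad's priority of 100 is an ASSUMPTION, not taken from the task.
    print(expected_master({"cr1-eqiad": 200, "cr2-eqiad": 100}))
```

If the master shown by `show vrrp summary` does not match what the priorities predict, that is a sign the change has not been applied on one of the routers.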
Phase 2 - Migrate asw2-c7-eqiad et-7/0/49 -> ssw1-d8-eqiad ethernet-1/28
The status at this point is we have one of the links moved, and outbound traffic is flowing through the Nokia spine and out to cr1-eqiad. Next we need to move the other uplink from asw2-c-eqiad, effectively repeating the process for that link.
Step 1: Shut down et-1/1/0 on cr2-eqiad
As before we want to deactivate the interface, then confirm it still knows a route to the various subnets in OSPF from cr1:
deactivate interface et-1/1/0
show route terse table inet.0 exact 10.64.32.0/22
show route terse table inet.0 exact 208.80.154.64/26
show route terse table inet.0 exact 10.64.36.0/24
show route table inet6.0 exact 2620:0:861:3::/64
show route table inet6.0 exact 2620:0:861:103::/64
show route table inet6.0 exact 2620:0:861:106::/64
We should check from a variety of hosts in row D (which use cr2-eqiad as VRRP master) that they can reach hosts in row C, example hosts to source pings are:
es1052 restbase1042 wikikube-worker1034 aqs1019 wikikube-worker1163
Step 3: adjust netbox connection for asw2-c7-eqiad et-7/0/49 and run homer
Now that the CR port connected to asw2-c7-eqiad et-7/0/49 is down we can reconfigure it. In Netbox we need to:
- Adjust the cable so it shows it connected to ssw1-d8-eqiad ethernet-1/28 instead
- Make it a member of ae0 instead of ae2
- Enable the ae0 interface
- Delete the ae2 interface
After which we can run homer against asw2-c2-eqiad to update the port's AE membership and description.
Step 4: re-cable asw2-c7-eqiad et-7/0/49 to ssw1-d8-eqiad ethernet-1/28
Now that the CR port is disabled we can move the optic from the CR to the Nokia spine port, and re-terminate the fibre link on it.
Step 5: validate we see MAC addresses on the lag1 interface of ssw1-d8-eqiad
Firstly verify the LAG looks healthy on both Nokia spines:
show system network-instance ethernet-segments LAG1
We should see MAC addresses learnt on the various vlans:
show network-instance vlan-1019 bridge-table mac-table all
show network-instance vlan-1003 bridge-table mac-table all
show network-instance vlan-1022 bridge-table mac-table all
Check that the Ethernet Segment related routes (EVPN types 1 and 4) and the MAC addresses (type 2) learnt on the LAG port on ssw1-d8-eqiad are being announced in BGP and received on ssw1-d1-eqiad:
show network-instance default protocols bgp routes evpn route-type 1 summary
show network-instance default protocols bgp routes evpn route-type 4 summary
show network-instance default protocols bgp routes evpn route-type 2 summary
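For reference, the three route types checked above can be paired with their names from RFC 7432, which helps when reading the summaries. This is a small Python sketch that builds the same commands alongside those names. The helper name `evpn_check_commands` is hypothetical, and the command text is copied from this task.

```python
# EVPN route type names as defined in RFC 7432.
EVPN_ROUTE_TYPES = {
    1: "Ethernet Auto-Discovery",
    2: "MAC/IP Advertisement",
    4: "Ethernet Segment",
}


def evpn_check_commands(route_types=(1, 4, 2)):
    """Return (route type name, command) pairs in the order used in the task."""
    template = (
        "show network-instance default protocols bgp routes evpn "
        "route-type {rt} summary"
    )
    return [(EVPN_ROUTE_TYPES[rt], template.format(rt=rt)) for rt in route_types]


if __name__ == "__main__":
    for name, cmd in evpn_check_commands():
        print(f"{name}: {cmd}")
```

Types 1 and 4 relate to the shared Ethernet segment of the LAG, while type 2 carries the learnt MAC addresses.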
Step 6: Move cr2-eqiad ae3 sub-interfaces to et-1/0/5 in Netbox and run homer
At this point we can move the ae3.X sub-interfaces in Netbox from the ae3 LAG to port et-1/0/5 (connected to ssw1-d1-eqiad). We should double check that the VRRP groups update as expected when the interfaces are renamed.
When done we can run Homer against cr2-eqiad to enable the new sub-interfaces.
Step 7: verify L3 connectivity from row C hosts to cr2-eqiad
We should now be able to ping the various IPs configured on the moved sub-interfaces on cr2-eqiad. Some suggestions for hosts to test from are:
| Vlan | IPs to ping | Hosts to source pings |
|---|---|---|
| 1019 - private1-c-eqiad | 10.64.32.3 & 2620:0:861:103:fe00::2 | es1045, wikikube-worker1063, db1242 |
| 1003 - public1-c-eqiad | 208.80.154.67 & 2620:0:861:3:fe00::2 | dns1006, alert1002, lists1004 |
| 1022 - analytics1-c-eqiad | 10.64.36.3 & 2620:0:861:106:fe00::2 | an-conf1006, an-worker1131, stat1011 |
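The table above expands to 18 individual ping tests (three vlans, three hosts each, two addresses each). A minimal Python sketch to print them as ready-to-paste commands follows. It assumes the short hostnames resolve over ssh from wherever it is run and that `ping` on the hosts handles IPv6 addresses directly. The function name `ping_commands` is hypothetical. It only builds the command strings and does not execute anything.

```python
import shlex

# (addresses on cr2-eqiad, source hosts), copied from the table.
CHECKS = [
    (["10.64.32.3", "2620:0:861:103:fe00::2"],
     ["es1045", "wikikube-worker1063", "db1242"]),
    (["208.80.154.67", "2620:0:861:3:fe00::2"],
     ["dns1006", "alert1002", "lists1004"]),
    (["10.64.36.3", "2620:0:861:106:fe00::2"],
     ["an-conf1006", "an-worker1131", "stat1011"]),
]


def ping_commands(checks, count=3):
    """Return one 'ssh <host> ping -c <count> <ip>' string per host and address."""
    cmds = []
    for targets, hosts in checks:
        for host in hosts:
            for ip in targets:
                cmds.append(shlex.join(["ssh", host, "ping", "-c", str(count), ip]))
    return cmds


if __name__ == "__main__":
    for cmd in ping_commands(CHECKS):
        print(cmd)
```

Swapping in the cr1-eqiad addresses from Phase 1 gives the equivalent list for that step.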
Step 8: Flip VRRP on the CRs to make cr2-eqiad the active GW again
At this point we have connectivity to both CR routers on all the vlans again, in both cases via the Nokia spine switches rather than directly from asw2-c-eqiad.
In this step we will change the VRRP priority for all three vlans so that cr2-eqiad becomes the active GW again. The VRRP groups below should be modified, changing the priority for cr1-eqiad back to 90:
1019 - private1-c-eqiad (vrrp)
1003 - public1-c-eqiad (vrrp)
1022 - analytics1-c-eqiad (vrrp)
With it changed in Netbox we can run Homer against cr2-eqiad to promote it to master. Once in place we can validate on both CRs:
show vrrp summary | match "et-1/0/5"
Provided it is master we can look at this graph to validate that the traffic has flipped from one device to the other, and it is the same order of magnitude as before.
We should check from the same hosts as in the last step that comms are ok to devices outside the current vlan (some are listed in step 2 of Phase 1 and step 1 of this phase, and on the public vlan we can ping internet destinations).
Phase 3 - Cleanup
- Delete ae3 and sub-interfaces from cr1-eqiad and disable port et-1/1/0
- Delete ae3 and sub-interfaces from cr2-eqiad and disable port et-1/1/0