
Migrate IP gateway for private1-b-codfw to spine switches
Closed, Resolved · Public

Description

To progress the wider migration from the old row-wide ASWs in codfw to the new EVPN-based devices, we need to move the IP gateway for existing vlans from CR routers to Spine switches.

Unfortunately, while interruption can be minimised, experience from moving the public vlans has shown that a minor interruption to comms for some servers is to be expected. Hosts will not experience an interruption on IPv6, but it's likely that some will see IPv4 traffic interrupted for between 10 and 60 seconds.

Codfw is currently our primary site, but we can't wait until we switch back to complete the change. It probably does make sense to depool the site in DNS for front-end connections, however.

UPDATE: In the end we waited until after the DC switchover to complete this, as doing it beforehand was deemed too risky. It was completed without disruption, however; see the wikitech page below, which describes the approach taken:

https://wikitech.wikimedia.org/wiki/Migrate_from_VC_switch_stack_to_EVPN

Event Timeline

cmooney renamed this task from Migrate IP gateway for public1-b-codfw to spine switches to Migrate IP gateway for private1-b-codfw to spine switches. Nov 23 2023, 2:35 PM

Going to delay this for now. We have enough disruptive changes planned not to burden wider SRE with this one in the next few weeks.

We do have some SPINE->LEAF->SPINE traffic right now which is *not* good, however it's all on 100G links via empty or almost-empty LEAF devices. As we move servers from asw to lsw the traffic pattern disappears also, as the LEAF will select the correct SPINE (connected to VRRP active CR for the vlan).

At that point the only downside is traffic going LEAF->SPINE->CR->SPINE->LEAF for intra-subnet traffic (similar to how it has always been on ASW with CR as gateway). Moving the gateway to anycast on the LEAF devices would turn this into LEAF->SPINE->LEAF, or a turn-around within a leaf if local. So much better. But once servers are moved we can review whether it's worth it, based on how long moving hosts from row-wide to rack-specific vlans is taking. If we manage to re-ip some of the more "fragile" hosts, the gateway move might be easier to execute, given the interruption will only be short.
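To make the comparison above concrete, here is a rough Python sketch of the two paths described. It is only an illustration: the function names and the `"cr"` / `"leaf-anycast"` labels are made up for this example and do not correspond to any real tooling or device config.

```python
def gateway_path(gateway: str, same_leaf: bool = False) -> list[str]:
    """Return the device sequence for traffic that has to pass through the vlan gateway.

    `gateway` is a hypothetical label: "cr" (gateway on the core routers)
    or "leaf-anycast" (anycast gateway on every LEAF).
    """
    if gateway == "cr":
        # Traffic has to climb to the CR and come back down.
        return ["LEAF", "SPINE", "CR", "SPINE", "LEAF"]
    if gateway == "leaf-anycast":
        # The first LEAF routes the packet itself; if the destination is
        # on the same LEAF the traffic turns around locally.
        return ["LEAF"] if same_leaf else ["LEAF", "SPINE", "LEAF"]
    raise ValueError(f"unknown gateway placement: {gateway!r}")


def link_count(path: list[str]) -> int:
    """Number of inter-device links a packet crosses along `path`."""
    return len(path) - 1


if __name__ == "__main__":
    for placement, local in [("cr", False), ("leaf-anycast", False), ("leaf-anycast", True)]:
        path = gateway_path(placement, same_leaf=local)
        print(f"{placement:13} same_leaf={local!s:5} {'->'.join(path)} ({link_count(path)} links)")
```

Under these assumptions the CR gateway costs four inter-device links, the anycast gateway two, and the local turn-around none.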

Mentioned in SAL (#wikimedia-operations) [2024-03-21T18:54:59Z] <topranks> removing IPv6 VRRP config on codfw core routers for vlan 2018 private1-b-codfw T351534

Mentioned in SAL (#wikimedia-operations) [2024-03-21T19:09:56Z] <topranks> adding routes to codfw row b hosts towards spine switch IPs on private1-b-codfw T351534

Mentioned in SAL (#wikimedia-operations) [2024-03-21T19:17:44Z] <topranks> remove VRRP GW IP for vlan 2018 from codfw core routers and add to EVPN switches irb.2018 interface T351534

Mentioned in SAL (#wikimedia-operations) [2024-03-21T20:14:15Z] <topranks> deleting irb.2018 interfaces from codfw spine switches T351534

cmooney updated the task description.