We now have a new QFX5120-48Y switch in codfw rack B1, which arrived recently with the replacement switches for the row A/B hardware refresh (see T312138). It is installed in the rack and reachable via the OpenGear console server.
The intention is to configure this switch, unlike asw-b1-codfw which it was purchased to replace, as a stand-alone "cloudsw", mirroring the dedicated switches for WMCS in eqiad.
Creating this task to track the steps to configure this device, and begin migrating cloud hosts from the existing one over to it.
At a high level I would suggest we proceed as follows, open to discussion of course:
Add Physical Connections
- Connect the cloudsw to cr1-codfw or cr2-codfw for the routed uplink from the cloudsw to the core routers.
- I expect 10G is sufficient bandwidth for this link
- 1 connection is probably sufficient. This does mean a lack of redundancy, but codfw is the WMCS test/staging site.
- Connect the cloudsw to asw-b1-codfw, as a vlan trunk port
- We only trunk Vlan 2118 - cloud-hosts1-codfw - over this link
- 10G is probably sufficient bandwidth. Could be a 2x10G LAG if we think it is needed?
Enable the routed CR -> Cloudsw logical links and BGP
Once the physical connections are in place we proceed like this to make the cloudsw <-> cr link live:
- Configure the cloudsw similarly to those in eqiad, using the same templates and VRF setup
- Configure the routed uplinks on the CR and cloudsw, and apply the labs-in and cloud-in filters to sub-ints
- Configure the cloudsw with a currently unused IP from the cloud-hosts1-codfw subnet
- Validate that the CR receives the BGP announcement of the cloud-hosts1-codfw subnet from the cloudsw
- It will still prefer its direct connection to it on ae2.2118
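The reason the CR keeps using ae2.2118 at this stage is route preference: on Junos a directly connected route has a better (lower) default preference than a BGP-learned one (0 versus 170). The following is a minimal Python sketch of that selection, not router configuration. The `Route` type and `active_route` helper are hypothetical, and the /24 prefix length for cloud-hosts1-codfw is an assumption, since only the gateway address appears in this task.

```python
from dataclasses import dataclass
from ipaddress import ip_network

# Junos default route preferences; the lowest value wins.
JUNOS_DEFAULT_PREFERENCE = {"direct": 0, "static": 5, "bgp": 170}


@dataclass(frozen=True)
class Route:
    """Hypothetical, simplified view of one routing-table candidate."""
    prefix: str
    protocol: str
    next_hop: str

    @property
    def preference(self) -> int:
        return JUNOS_DEFAULT_PREFERENCE[self.protocol]


def active_route(candidates):
    """Pick the candidate a router would install for a single prefix."""
    prefixes = {ip_network(r.prefix) for r in candidates}
    if len(prefixes) != 1:
        raise ValueError("all candidates must describe the same prefix")
    return min(candidates, key=lambda r: r.preference)


# Assumption: the prefix length is not stated in this task.
SUBNET = "10.192.20.0/24"

# While ae2.2118 is still up, the CR has both routes.
before = [
    Route(SUBNET, "direct", "ae2.2118"),
    Route(SUBNET, "bgp", "cloudsw"),
]
# With ae2.2118 shut down, only the BGP route remains.
after = [Route(SUBNET, "bgp", "cloudsw")]

print(active_route(before).next_hop)
print(active_route(after).next_hop)
```

In other words, receiving the announcement is harmless while the direct interface is up, and the BGP route only takes over once that interface is disabled.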
Move cloud vlan gateway IPs from CRs to cloudsw
For cloud-hosts1-codfw subnet:
- Change GW IP on cloudsw irb.2118 interface to 10.192.20.1
- Shut down ae2.2118 on cr1-codfw and cr2-codfw
- This will halt traffic, as hosts have the VRRP MAC cached in their ARP table
- We need to manually clear the ARP cache on servers connected to cloud-hosts1-codfw for the GW IP
- Validate things are working as before, all services etc., and traffic flowing via the cloudsw<-->cr link
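The manual ARP clearing step could be prepared in advance along the following lines. This is only a sketch: it builds the per-host command but does not run it, the host names are placeholders, and how the command is delivered to each server (and with which privileges) is left open. It assumes the servers use iproute2, where `ip neigh flush to <address>` removes the cached neighbour entry for that address.

```python
import shlex
from ipaddress import ip_address

# Gateway address taken from the step above.
GATEWAY_IP = "10.192.20.1"


def arp_clear_command(gateway_ip=GATEWAY_IP):
    """Return the iproute2 command that drops the cached entry for the GW."""
    ip_address(gateway_ip)  # raises ValueError if the address is malformed
    return ["ip", "neigh", "flush", "to", gateway_ip]


def per_host_commands(hosts, gateway_ip=GATEWAY_IP):
    """Map each host name to the shell command to run on it (needs root)."""
    command = shlex.join(arp_clear_command(gateway_ip))
    return {host: command for host in hosts}


# Placeholder host names, not real inventory.
example_hosts = ["cloud-host-a.example", "cloud-host-b.example"]

for host, command in per_host_commands(example_hosts).items():
    print(f"{host}: {command}")
```

After the entry is flushed, the next packet to the gateway triggers a fresh ARP request, which the cloudsw answers with its own MAC on irb.2118.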
TODO - Add section on moving Vlan 2120 (cloud-instance-transport1-b-codfw) to the cloudsw using similar process.
Begin physical host moves, and CloudLB POC
With the gateway for hosts on cloud Vlans now moved over to the new switch, we can begin to migrate host physical connections to it. We can also add the cloud-private Vlan as discussed here, which is needed to begin work on the CloudLB POC (see T324992).
Public Vlan
Vlan 2002 (public1-b-codfw) will not be trunked to the cloudsw as part of this move, so hosts connected to it (for instance cloudservice) should be left connected to asw-b1-codfw for now.
Ultimately the plan would be to validate the design for the CloudLB, and then migrate these hosts to that new model, moving them to the new switch in the process. Until then they stay connected to the old switch for as long as they have to be on public1-b-codfw.