
Test using trunked interfaces for cloudvirts
Closed, Resolved · Public

Description

To fit everything into the available 10G ports for the Ceph buildout and for racking the cloudvirt replacements (T242133 and T243471), we would like to test using VLAN tagging to carry control and hypervisor traffic over a single 10G interface, instead of splitting that traffic across two interfaces (and therefore two ports on the switches).

This task is to try this out in CODFW as a proof of concept.

Today the cloudvirts are configured with 2 network interfaces. These interfaces have the following configuration in codfw.

Host port   Config type   VLAN                               Traffic type
eno1        access port   2118 (cloud-hosts1-b-codfw)        server management and monitoring, disk image migrations
eno2        trunk port    2105 (cloud-instances2-b-codfw)    Virtual machine instance traffic

To reduce the number of 10G ports and better utilize the existing infrastructure, we'd like to consolidate these 2 interfaces into 1.

Initial testing requests (we're totally open to discuss/change this!)

  • On the physical switch side, we'd like to reconfigure eno1 as a trunk port with VLAN 2118 as the default (untagged) VLAN and VLAN 2105 in the member list.
  • On the host side, we'll leverage the existing 8021q module and configuration to create a new tagged interface, eno1.2105. Once that is complete, the new interface will be mapped in neutron (on the local host) with physical_interface_mappings = cloudinstances2b:eno1.2105
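
The host-side step above can be sketched as follows. This is only an illustration, not the configuration that was deployed: the helper name vlan_subinterface_plan is hypothetical, the iproute2 commands are an assumption about how the tagged interface could be created by hand (the hosts themselves rely on existing 8021q configuration), and nothing is executed, the commands are only printed.

```python
def vlan_subinterface_plan(parent, vlan_id, physnet):
    """Return (commands, mapping) for a tagged sub-interface.

    Hypothetical helper: it only builds the command lists and the neutron
    mapping line described in the task; it does not run anything.
    """
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    name = f"{parent}.{vlan_id}"
    if len(name) > 15:
        # Linux interface names are limited to 15 visible characters.
        raise ValueError(f"interface name too long: {name}")
    commands = [
        # Assumption: manual iproute2 equivalent of the 8021q setup.
        ["modprobe", "8021q"],
        ["ip", "link", "add", "link", parent, "name", name,
         "type", "vlan", "id", str(vlan_id)],
        ["ip", "link", "set", "dev", name, "up"],
    ]
    # Mapping line quoted from the task description.
    mapping = f"physical_interface_mappings = {physnet}:{name}"
    return commands, mapping


commands, mapping = vlan_subinterface_plan("eno1", 2105, "cloudinstances2b")
for cmd in commands:
    print(" ".join(cmd))
print(mapping)
```

The untagged VLAN 2118 traffic stays on eno1 itself, so only the tagged sub-interface needs to be handed to neutron.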

Event Timeline

Bstorm triaged this task as Medium priority. · Mar 24 2020, 9:36 PM
Bstorm created this task.
Restricted Application added a subscriber: Aklapper. · Mar 24 2020, 9:36 PM
JHedden renamed this task from Test the use of trunked interfaces for cloudvirts to Test the using trunked interfaces for cloudvirts. · Mar 24 2020, 10:24 PM
JHedden renamed this task from Test the using trunked interfaces for cloudvirts to Test using trunked interfaces for cloudvirts.
JHedden updated the task description.

Let me know when you want the switch ports to be re-configured.

If done in two steps (first add 2105 to eno1, then disable eno2), I will need your confirmation that there is no risk of the two VLANs/interfaces being bridged, so that they cannot cause a forwarding loop.

> Let me know when you want the switch ports to be re-configured.

We're ready to have the ports for cloudvirt2001-dev.codfw.wmnet reconfigured anytime.

> If done in two steps (first add 2105 to eno1, then disable eno2), I will need your confirmation that there is no risk of the two VLANs/interfaces being bridged, so that they cannot cause a forwarding loop.

I can confirm there's nothing in the configuration that I'm doing that will create a bridge between these 2 networks.
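
One way to back that confirmation with a check is to parse `brctl show` output and look for any Linux bridge that has the new tagged interface and the old uplink as members at the same time. This is a minimal sketch under stated assumptions: the function names are hypothetical, the old-path interface names (eno2, eno2.2105) are guesses at what would still be active during a two-step change, and it only sees Linux bridges, so anything attached to Open vSwitch would need a separate check.

```python
def parse_brctl_show(text):
    """Map bridge name -> list of member interfaces from `brctl show` output."""
    bridges = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("bridge name"):
            continue  # skip blank lines and the header row
        fields = line.split()
        if line[0].isspace():
            # Continuation row: a single additional member interface.
            if current is not None:
                bridges[current].append(fields[0])
        else:
            # Bridge row: name, bridge id, STP flag, optional first member.
            current = fields[0]
            bridges[current] = fields[3:4]
    return bridges


def shared_bridges(bridges, group_a, group_b):
    """Return bridges that contain at least one member from each group."""
    return [
        name for name, members in bridges.items()
        if set(members) & set(group_a) and set(members) & set(group_b)
    ]


SAMPLE = """\
bridge name     bridge id               STP enabled     interfaces
brq05a5494a-18          8000.d09466939777       no              eno1.2105
                                                        tap13742842-1a
                                                        tap79339fe5-ef
                                                        tap9969edcf-fd
"""

bridges = parse_brctl_show(SAMPLE)
# Assumption: eno2 / eno2.2105 is the old path that must not share a bridge.
loops = shared_bridges(bridges, {"eno1.2105"}, {"eno2", "eno2.2105"})
print(loops)
```

An empty list means no single Linux bridge joins the two paths on this host.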

[edit interfaces interface-range vlan-private1-b-codfw]
-    member ge-3/0/23;
[edit interfaces interface-range vlan-cloud-hosts1-b-codfw]
-    member ge-3/0/22;
[edit interfaces interface-range cloud-instance2-ports]
-    member ge-3/0/23;
[edit interfaces interface-range disabled]
     member xe-7/0/5 { ... }
+    member ge-3/0/23;
[edit interfaces]
    interface-range cloud-net-trunk { ... }
+   interface-range cloudvirt-trunk {
+       member ge-3/0/22;
+       native-vlan-id 2118;
+       mtu 9192;
+       unit 0 {
+           family ethernet-switching {
+               interface-mode trunk;
+               vlan {
+                   members [ cloud-instances2-b-codfw cloud-hosts1-b-codfw ];
+               }
+           }
+       }
+   }

Mentioned in SAL (#wikimedia-operations) [2020-04-02T14:56:52Z] <XioNoX> push new test switch config for cloudvirt2001 - T248425

Could you double check the interface has access to cloud-instances2-b-codfw? I'm not able to communicate on VLAN 2105 using the eno1 interface.

cloudvirt2001-dev
$ brctl show
bridge name     bridge id               STP enabled     interfaces
brq05a5494a-18          8000.d09466939777       no              eno1.2105
                                                        tap13742842-1a
                                                        tap79339fe5-ef
                                                        tap9969edcf-fd

$ ip l show dev eno1.2105
57: eno1.2105@eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master brq05a5494a-18 state UP mode DEFAULT group default qlen 1000
    link/ether d0:94:66:93:97:77 brd ff:ff:ff:ff:ff:ff

VM on cloudvirt2001-dev
$  ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc htb state UP group default qlen 1000
    inet 172.16.128.14/24 scope global eth0
       valid_lft forever preferred_lft forever

$ ping 172.16.128.1
PING 172.16.128.1 (172.16.128.1) 56(84) bytes of data.
From 172.16.128.14 icmp_seq=1 Destination Host Unreachable
From 172.16.128.14 icmp_seq=2 Destination Host Unreachable
From 172.16.128.14 icmp_seq=3 Destination Host Unreachable

I can see packets from the virtual machine on the bridge and on the correct device, but nothing coming back from upstream:

$ tcpdump -i eno1.2105
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eno1.2105, link-type EN10MB (Ethernet), capture size 262144 bytes
16:48:24.513940 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:f9:0b:cc (oui Unknown), length 302
16:48:27.293782 IP6 fe80::f816:3eff:fe81:56dc > ip6-allrouters: ICMP6, router solicitation, length 16
16:48:31.834590 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:f9:0b:cc (oui Unknown), length 302
16:48:46.347533 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:f9:0b:cc (oui Unknown), length 302
16:49:06.955946 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:f9:0b:cc (oui Unknown), length 302
16:49:11.447649 IP6 fe80::f816:3eff:fef0:334a > ip6-allrouters: ICMP6, router solicitation, length 16
16:49:19.466121 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:f9:0b:cc (oui Unknown), length 302
16:49:31.299519 ARP, Request who-has 172.16.128.1 tell 172.16.128.14, length 28
16:49:32.322075 ARP, Request who-has 172.16.128.1 tell 172.16.128.14, length 28
16:49:33.346226 ARP, Request who-has 172.16.128.1 tell 172.16.128.14, length 28
16:49:34.376011 ARP, Request who-has 172.16.128.1 tell 172.16.128.14, length 28
16:49:35.394212 ARP, Request who-has 172.16.128.1 tell 172.16.128.14, length 28
16:49:36.418280 ARP, Request who-has 172.16.128.1 tell 172.16.128.14, length 28

Scratch that ^, I was able to verify I can see the traffic on the other hypervisors over the 2105 VLAN.

This updated interface is working as expected, thanks!

@aborrero I think I'm running into some of the VXLAN configuration changes you're testing.

cloudnet2002-dev
# brctl show
bridge name     bridge id               STP enabled     interfaces
br-external             8000.30e17155a241       no              eno2.2120
                                                        tap1290224c-b1
br-internal             8000.26e102df76f5       no              tap21e10025-d4
                                                        tapc7e2dc7c-96
brqd967e056-ef          8000.328e3eca04ab       no              tap5dc9c3b7-24
                                                        vxlan-1

$ ip l show dev eno2.2105
11: eno2.2105@eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 30:e1:71:55:a2:41 brd ff:ff:ff:ff:ff:ff

Note that eno2.2105 is being used by OVS, so the br-internal bridge cannot use the cloud-instances2-b-codfw interface. This blocks the VMs from communicating with the router (no gateway, dhcp, or metadata).
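
The giveaway in the output above is the `master ovs-system` field: an interface can only have one master, so while OVS holds eno2.2105 it cannot also be a member of br-internal. A small sketch of pulling that field out of `ip link show` output (the function name link_master is hypothetical; the sample line is copied from this task):

```python
import re


def link_master(ip_link_output):
    """Return the master device named in `ip link show` output, or None."""
    match = re.search(r"\bmaster (\S+)", ip_link_output)
    return match.group(1) if match else None


SAMPLE = (
    "11: eno2.2105@eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 "
    "qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000"
)

master = link_master(SAMPLE)
# ovs-system is the kernel datapath device used by Open vSwitch.
held_by_ovs = master == "ovs-system"
print(master, held_by_ovs)
```

For the cloudvirt2001-dev interface shown earlier, the same function would return the Linux bridge name (brq05a5494a-18) instead.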

Changes made on cloudnet2002-dev to enable basic networking using our existing configuration

Remove port from OVS
cloudnet2002-dev:~# ovs-vsctl list-ports br-provider
eno2.2105
phy-br-provider
cloudnet2002-dev:~# ovs-vsctl del-port br-provider eno2.2105
cloudnet2002-dev:~# ovs-vsctl list-ports br-provider
phy-br-provider
cloudnet2002-dev:~# systemctl stop neutron-openvswitch-agent.service
cloudnet2002-dev:~# systemctl stop openvswitch-switch.service
Add port to linux bridge
cloudnet2002-dev:~# brctl show br-internal
bridge name     bridge id               STP enabled     interfaces
br-internal     8000.26e102df76f5       no              tap21e10025-d4
                                                        tapc7e2dc7c-96
cloudnet2002-dev:~# brctl addif br-internal eno2.2105
cloudnet2002-dev:~# brctl show br-internal
bridge name     bridge id               STP enabled     interfaces
br-internal     8000.1e5401e129f0       no              eno2.2105
                                                        tap072f796b-94
                                                        tap21e10025-d4
cloudnet2002-dev:~# systemctl restart neutron-l3-agent
cloudnet2002-dev:~# systemctl restart neutron-dhcp-agent

When I cleaned up the VXLAN setup I forgot about the bridges. Sorry about that, and glad you figured it out!

Good to see the port trunking actually works!

Change 585757 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] openstack: update cloudvirt2001-dev flat interface

https://gerrit.wikimedia.org/r/585757

Change 585757 merged by Jhedden:
[operations/puppet@production] openstack: update cloudvirt2001-dev flat interface

https://gerrit.wikimedia.org/r/585757

JHedden closed this task as Resolved. · Apr 13 2020, 3:29 PM
JHedden claimed this task.

Mentioned in SAL (#wikimedia-cloud) [2020-05-12T19:09:06Z] <jeh> Shutdown the unused eno2 network interface on cloudvirt2001-dev.codfw to clear up monitoring errors T248425