
openstack: instrument VXLAN-based flat network
Closed, Resolved · Public

Description

This ticket tracks the work to instrument the VXLAN-based flat networks as designed in T373869: Cloud VPS: design target vxlan setup.

Also, at the same time:

  • We need to double check that the VXLAN tunnels actually run over cloud-private.
  • We need to double check the MTU settings on the affected interfaces and subnets.

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
codfw1dev: rename cloud-flat to cloud-flat-codfw1dev | repos/cloud/cloud-vps/tofu-infra!37 | aborrero | arturo-320-codfw1dev-rename-cl | main
ports: remove device owner data | repos/cloud/cloud-vps/tofu-infra!36 | aborrero | arturo-920-ports-remove-device | main
codfw1dev: temporary removal of cloud-flat router interface | repos/cloud/cloud-vps/tofu-infra!35 | aborrero | arturo-215-codfw1dev-temporary | main
imports: add import for renamed neutron router port | repos/cloud/cloud-vps/tofu-infra!33 | aborrero | arturo-297-imports-add-import | main
router_interfaces: account for subnetid and portid being mutually exclusive | repos/cloud/cloud-vps/tofu-infra!32 | aborrero | arturo-115-router_interfaces-a | main
imports: drop network-related imports | repos/cloud/cloud-vps/tofu-infra!31 | aborrero | arturo-862-imports-drop-networ | main
codfw1dev: instrument VXLAN-based flat network | repos/cloud/cloud-vps/tofu-infra!30 | aborrero | arturo-249-codfw1dev-instrumen | main

Event Timeline

Restricted Application removed a subscriber: taavi. · Sep 4 2024, 3:13 PM
aborrero renamed this task from openstack: double check VXLAN-based flat network implementation to openstack: instrument VXLAN-based flat network. · Sep 4 2024, 3:15 PM
aborrero updated the task description.
aborrero changed the task status from Open to In Progress. · Sep 4 2024, 3:52 PM
aborrero moved this task from Next to Doing on the User-aborrero board.

aborrero opened https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/32

router_interfaces: account for subnetid and portid being mutually exclusive

aborrero merged https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/32

router_interfaces: account for subnetid and portid being mutually exclusive

With the above patches, I was able to create a VM attached to the new VXLAN-based subnet:

aborrero@cloudcontrol2004-dev:~$ sudo wmcs-openstack --os-project-id cloudinfra-codfw1dev server create --flavor g4.cores1.ram1.disk20 --image debian-12.0-bookworm --network cloud-flat arturo-test-vm
+--------------------------------------+--------------------------------------------------------------+
| Field                                | Value                                                        |
+--------------------------------------+--------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                       |
| OS-EXT-AZ:availability_zone          |                                                              |
| OS-EXT-SRV-ATTR:host                 | None                                                         |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | None                                                         |
| OS-EXT-SRV-ATTR:instance_name        |                                                              |
| OS-EXT-STS:power_state               | NOSTATE                                                      |
| OS-EXT-STS:task_state                | scheduling                                                   |
| OS-EXT-STS:vm_state                  | building                                                     |
| OS-SRV-USG:launched_at               | None                                                         |
| OS-SRV-USG:terminated_at             | None                                                         |
| accessIPv4                           |                                                              |
| accessIPv6                           |                                                              |
| addresses                            |                                                              |
| adminPass                            | p9XrCjS7bH88                                                 |
| config_drive                         |                                                              |
| created                              | 2024-09-05T12:30:24Z                                         |
| flavor                               | g4.cores1.ram1.disk20 (b1c8399c-87da-4262-86af-dfb6552e550e) |
| hostId                               |                                                              |
| id                                   | bcc9bcdd-ee0b-4a83-b982-bea119e499cd                         |
| image                                | debian-12.0-bookworm (9ea158ee-ca2b-41ea-9100-0c85b2c26466)  |
| key_name                             | None                                                         |
| name                                 | arturo-test-vm                                               |
| os-extended-volumes:volumes_attached | []                                                           |
| progress                             | 0                                                            |
| project_id                           | cloudinfra-codfw1dev                                         |
| properties                           |                                                              |
| security_groups                      | name='default'                                               |
| status                               | BUILD                                                        |
| updated                              | 2024-09-05T12:30:25Z                                         |
| user_id                              | novaadmin                                                    |
+--------------------------------------+--------------------------------------------------------------+
aborrero@cloudcontrol2004-dev:~$ sudo wmcs-openstack server show bcc9bcdd-ee0b-4a83-b982-bea119e499cd
+-------------------------------------+-----------------------------------------------------------------------------------------------------+
| Field                               | Value                                                                                               |
+-------------------------------------+-----------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                                              |
| OS-EXT-AZ:availability_zone         | nova                                                                                                |
| OS-EXT-SRV-ATTR:host                | cloudvirt2005-dev                                                                                   |
| OS-EXT-SRV-ATTR:hostname            | arturo-test-vm                                                                                      |
| OS-EXT-SRV-ATTR:hypervisor_hostname | cloudvirt2005-dev.codfw.wmnet                                                                       |
| OS-EXT-SRV-ATTR:instance_name       | i-000389e0                                                                                          |
| OS-EXT-SRV-ATTR:kernel_id           |                                                                                                     |
| OS-EXT-SRV-ATTR:launch_index        | 0                                                                                                   |
| OS-EXT-SRV-ATTR:ramdisk_id          |                                                                                                     |
| OS-EXT-SRV-ATTR:reservation_id      | r-mmpnnf21                                                                                          |
| OS-EXT-SRV-ATTR:root_device_name    | /dev/sda                                                                                            |
| OS-EXT-SRV-ATTR:user_data           | None                                                                                                |
| OS-EXT-STS:power_state              | Running                                                                                             |
| OS-EXT-STS:task_state               | None                                                                                                |
| OS-EXT-STS:vm_state                 | active                                                                                              |
| OS-SRV-USG:launched_at              | 2024-09-05T12:30:32.000000                                                                          |
| OS-SRV-USG:terminated_at            | None                                                                                                |
| accessIPv4                          |                                                                                                     |
| accessIPv6                          |                                                                                                     |
| addresses                           | cloud-flat=172.16.129.103                                                                           |
| config_drive                        |                                                                                                     |
| created                             | 2024-09-05T12:30:24Z                                                                                |
| description                         | arturo-test-vm                                                                                      |
| flavor                              | description=, disk='20', ephemeral='0', extra_specs.aggregate_instance_extra_specs:ceph='true',     |
|                                     | extra_specs.aggregate_instance_extra_specs:network-agent='ovs',                                     |
|                                     | extra_specs.quota:disk_read_iops_sec='5000', extra_specs.quota:disk_total_bytes_sec='200000000',    |
|                                     | extra_specs.quota:disk_write_iops_sec='500', id='g4.cores1.ram1.disk20', is_disabled=,              |
|                                     | is_public='True', location=, name='g4.cores1.ram1.disk20', original_name='g4.cores1.ram1.disk20',   |
|                                     | ram='1024', rxtx_factor=, swap='0', vcpus='1'                                                       |
| hostId                              | c34125d4d28d51f555659a9fe9551487b3d395dd6c34d091dbfbfe16                                            |
| host_status                         | UP                                                                                                  |
| id                                  | bcc9bcdd-ee0b-4a83-b982-bea119e499cd                                                                |
| image                               | debian-12.0-bookworm (9ea158ee-ca2b-41ea-9100-0c85b2c26466)                                         |
| key_name                            | None                                                                                                |
| locked                              | False                                                                                               |
| locked_reason                       | None                                                                                                |
| name                                | arturo-test-vm                                                                                      |
| progress                            | 0                                                                                                   |
| project_id                          | cloudinfra-codfw1dev                                                                                |
| properties                          |                                                                                                     |
| security_groups                     | name='default'                                                                                      |
| server_groups                       | []                                                                                                  |
| status                              | ACTIVE                                                                                              |
| tags                                |                                                                                                     |
| trusted_image_certificates          | None                                                                                                |
| updated                             | 2024-09-05T12:30:32Z                                                                                |
| user_id                             | novaadmin                                                                                           |
| volumes_attached                    |                                                                                                     |
+-------------------------------------+-----------------------------------------------------------------------------------------------------+

As of this writing the VM doesn't have connectivity, most likely because of firewalling or similar. It is also not reachable via SSH, so it can only be accessed via the hypervisor console, like this:

user@laptop:~$ ssh cloudvirt2005-dev.codfw.wmnet
aborrero@cloudvirt2005-dev:~$ sudo virsh console i-000389e0
root@arturo-test-vm:~# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
ens3             UP             172.16.129.103/24 metric 100 fe80::f816:3eff:feee:eb12/64 

Some ping tests:

root@arturo-test-vm:~# ping -c1 172.16.129.1
PING 172.16.129.1 (172.16.129.1) 56(84) bytes of data.
64 bytes from 172.16.129.1: icmp_seq=1 ttl=64 time=1.11 ms

--- 172.16.129.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.111/1.111/1.111/0.000 ms

root@arturo-test-vm:~# ping -c1 172.16.128.58
PING 172.16.128.58 (172.16.128.58) 56(84) bytes of data.
64 bytes from 172.16.128.58: icmp_seq=1 ttl=63 time=1.89 ms

--- 172.16.128.58 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.889/1.889/1.889/0.000 ms

aborrero@bastion-codfw1dev-03:~$ ping -c1 172.16.129.103
PING 172.16.129.103 (172.16.129.103) 56(84) bytes of data.
64 bytes from 172.16.129.103: icmp_seq=1 ttl=63 time=3.60 ms

--- 172.16.129.103 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 3.601/3.601/3.601/0.000 ms

Neutron router:

aborrero@cloudnet2006-dev:~$ sudo ip netns exec qrouter-5712e22e-134a-40d3-a75a-1c9b441717ad bash
root@cloudnet2006-dev:~# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
ha-245fa166-c9   UNKNOWN        169.254.193.35/18 169.254.0.1/24 
qr-21e10025-d4   UNKNOWN        172.16.128.1/24 fe80::f816:3eff:fe3c:1101/64 
qg-1290224c-b1   UNKNOWN        185.15.57.10/29 185.15.57.2/32 185.15.57.21/32 185.15.57.4/32 185.15.57.5/32 185.15.57.6/32 fe80::f816:3eff:fe35:9f97/64 
qr-db4d1c30-20   UNKNOWN        172.16.129.1/24 fe80::f816:3eff:fe4a:f6e2/64 
root@cloudnet2006-dev:~# tcpdump -i qr-db4d1c30-20 icmp
[..]
12:34:13.144857 IP 172.16.128.58 > 172.16.129.103: ICMP echo request, id 63451, seq 14, length 64
12:34:13.145640 IP 172.16.129.103 > 172.16.128.58: ICMP echo reply, id 63451, seq 14, length 64
[..]

Nice work! I'll log on when I have some time to familiarise myself.

I deleted the previous VM and created 2 new ones, on different hypervisors:

  • arturo-test-vm -- 469d4e3a-f222-45ab-a442-3d84ec7043a9 -- 172.16.129.232

aborrero@cloudvirt2006-dev:~ $ sudo virsh console i-00038a07

  • arturo-test-vm2 -- 777e9409-514b-4e96-ae48-dcc9a5bf5348 -- 172.16.129.96

aborrero@cloudvirt2005-dev:~ $ sudo virsh console i-00038a13

Change #1071189 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudgw: introduce support for multiple flat networks

https://gerrit.wikimedia.org/r/1071189

Change #1071230 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] keystone: hooks: create security group rule for additional instance CIDRs

https://gerrit.wikimedia.org/r/1071230

Change #1071189 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudgw: introduce support for multiple flat networks

https://gerrit.wikimedia.org/r/1071189

Change #1071230 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] keystone: hooks: create security group rule for additional instance CIDRs

https://gerrit.wikimedia.org/r/1071230

Change #1072513 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] openstack: keystone: eqiad1: fix instances_ip_ranges parameter

https://gerrit.wikimedia.org/r/1072513

Change #1072513 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] openstack: keystone: eqiad1: fix instances_ip_ranges parameter

https://gerrit.wikimedia.org/r/1072513

Change #1072538 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] codfw1dev: enable new vxlan-based subnet CIDR in cloudgw and keystone

https://gerrit.wikimedia.org/r/1072538

Change #1072538 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] codfw1dev: enable new vxlan-based subnet CIDR in cloudgw and keystone

https://gerrit.wikimedia.org/r/1072538

The network is in better shape now: VMs have connectivity by default.

I have recreated the VMs. To test:

  • ssh arturo-test-vm3.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud (dfc59caa-3300-4dae-9f41-86e23e44caeb)
  • ssh arturo-test-vm4.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud (aad43ff9-3be9-4c70-a92c-976f0dc8a31e)

They can also be accessed via the hypervisor.

Good stuff!

Looking at the VMs and the setup, things seem to be working well. I'll need to dig more into the OVS stuff on the Linux side to familiarise myself with the internal traffic path on the hypervisor, but things look good.

VXLAN encap on the wire looks as expected - example ping from one VM to the other here.

OpenStack seems to be taking into account the hypervisor<->hypervisor max MTU of 1500 bytes, and the overhead required for the VXLAN wire encapsulation, and is setting the VM ethernet interfaces to a lower MTU of 1450 as a result:

root@arturo-test-vm4:~# ip link show dev ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP mode DEFAULT group default qlen 1000

This is really good, as it means the VMs will not try to send a bigger packet than can be transmitted across the network encapsulated with VXLAN. Testing this with ping, all seems to work well, and IP fragmentation is kicking in when needed (see here).
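
As a rough check of the numbers above, the 1450 figure matches the usual VXLAN overhead arithmetic. This is only a sketch: it assumes an IPv4 underlay without IP options and no extra VLAN tag on the outer frame, and the function names are made up for illustration, not taken from OpenStack.

```python
# Header sizes in bytes. IPv4 without options is an assumption about the underlay.
IPV4_HEADER = 20
UDP_HEADER = 8
VXLAN_HEADER = 8
ETHERNET_HEADER = 14  # inner Ethernet header carried inside the tunnel
ICMP_HEADER = 8


def vxlan_inner_mtu(underlay_mtu: int) -> int:
    """Largest inner IP packet that fits in one underlay packet once encapsulated."""
    overhead = IPV4_HEADER + UDP_HEADER + VXLAN_HEADER + ETHERNET_HEADER
    return underlay_mtu - overhead


def max_unfragmented_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


vm_mtu = vxlan_inner_mtu(1500)
print(vm_mtu)                                  # 1450
print(max_unfragmented_ping_payload(vm_mtu))   # 1422
```

Under these assumptions, a ping from the VM with a payload of up to 1422 bytes should fit in a single packet, and anything larger is where fragmentation would be expected to start.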

Downloads from the internet work fine with the lower MTU. The VMs derive the MSS they advertise during the TCP handshake from what fits within their MTU, and servers on the internet will thus not send any TCP packets larger than 1450 bytes. The only remaining place you may get problems is with large packets being sent using UDP or other non-TCP protocols, from these VMs to destinations where ICMP gets filtered (and thus Path MTU discovery will fail). These really should be vanishingly rare on today's internet (almost non-existent).
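
To put a number on the MSS point: a minimal sketch, assuming TCP and IP headers without options (the function name is hypothetical, and real stacks may advertise slightly different values when options are in use).

```python
TCP_HEADER = 20   # bytes, without TCP options (assumption)
IPV4_HEADER = 20  # bytes, without IP options (assumption)
IPV6_HEADER = 40  # bytes, fixed IPv6 header


def tcp_mss(mtu: int, ip_header: int = IPV4_HEADER) -> int:
    """MSS a host would advertise for a given interface MTU."""
    return mtu - ip_header - TCP_HEADER


print(tcp_mss(1450))               # 1410 over IPv4
print(tcp_mss(1450, IPV6_HEADER))  # 1390 over IPv6
```

With an advertised MSS of 1410, a full-sized TCP segment from a remote server comes to 1410 + 20 + 20 = 1450 bytes on the wire, which is exactly the VM's MTU.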

So I think we are well set up here! Nice work <3