
openstack: initial IPv6 support in neutron
Closed, Resolved · Public

Description

This ticket is to track the work to add initial IPv6 support in neutron.

Per T374712: netbox: create IPv6 entries for Cloud VPS in codfw1dev we will use the CIDR 2a02:ec80:a100::/56

docs: https://docs.openstack.org/neutron/latest/admin/config-ipv6.html

Details

Related Changes in GitLab:
Title (reference, author, source branch → dest branch):
  • codfw1dev: fix external IPv6 router setting (repos/cloud/cloud-vps/tofu-infra!90, aborrero, arturo-182-codfw1dev-fix-exter → main)
  • codfw1dev: add external IPv6 to main router (repos/cloud/cloud-vps/tofu-infra!89, aborrero, arturo-126-codfw1dev-add-exter → main)
  • codfw1dev: router gw IPv6 port: fix address (repos/cloud/cloud-vps/tofu-infra!80, aborrero, arturo-240-codfw1dev-router-gw → main)
  • codfw1dev: refresh cloud-flat IPv6 definition (repos/cloud/cloud-vps/tofu-infra!79, aborrero, arturo-287-codfw1dev-refresh-c → main)
  • Revert "codfw1dev: enable slaac mode for IPv6 network" (repos/cloud/cloud-vps/tofu-infra!68, aborrero, arturo-201-revert-codfw1dev-en → main)
  • codfw1dev: enable slaac mode for IPv6 network (repos/cloud/cloud-vps/tofu-infra!67, aborrero, arturo-246-codfw1dev-enable-sl → main)
  • subnets: cloud-flat-codfw1dev-v6: summarize addresses (repos/cloud/cloud-vps/tofu-infra!66, aborrero, arturo-136-subnets-cloud-flat → main)
  • codfw1dev: default secgroup: allow egress IPv6 (repos/cloud/cloud-vps/tofu-infra!65, aborrero, arturo-163-codfw1dev-default-s → main)
  • codfw1dev: subnet: specify ip_version as 6 (repos/cloud/cloud-vps/tofu-infra!64, aborrero, arturo-259-codfw1dev-subnet-sp → main)
  • codfw1dev: enable IPv6 in the vxlan flat network (repos/cloud/cloud-vps/tofu-infra!63, aborrero, arturo-186-codfw1dev-enable-ip → main)

Event Timeline

Restricted Application removed a subscriber: taavi. · Sep 27 2024, 8:20 AM
aborrero changed the task status from Open to In Progress.Sep 27 2024, 8:21 AM
aborrero triaged this task as Medium priority.
aborrero moved this task from Backlog to Doing on the User-aborrero board.
aborrero updated the task description.

New instance creation will allocate an IPv6 address by default for a VM:

image.png (189×1 px, 45 KB)

Mentioned in SAL (#wikimedia-cloud) [2024-09-27T10:04:30Z] <arturo> [codfw1dev] enable IPv6 on the neutron virtual router T375847

However, instance creation itself failed:

2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [None req-5a10ab78-9313-4979-944a-02f50b6aa7b1 aborrero cloudinfra-codfw1dev - - default default] [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77] Failed to allocate network(s): nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77] Traceback (most recent call last):
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8006, in _create_guest_with_network
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]     with self.virtapi.wait_for_instance_event(
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]   File "/usr/lib/python3.11/contextlib.py", line 144, in __exit__
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]     next(self.gen)
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 559, in wait_for_instance_event
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]     self._wait_for_instance_events(
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 471, in _wait_for_instance_events
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]     actual_event = event.wait()
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]                    ^^^^^^^^^^^^
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 436, in wait
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]     instance_event = self.event.wait()
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]                      ^^^^^^^^^^^^^^^^^
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]   File "/usr/lib/python3/dist-packages/eventlet/event.py", line 124, in wait
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]     result = hub.switch()
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]              ^^^^^^^^^^^^
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 310, in switch
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]     return self.greenlet.switch()
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]            ^^^^^^^^^^^^^^^^^^^^^^
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77] eventlet.timeout.Timeout: 300 seconds
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77] 
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77] During handling of the above exception, another exception occurred:
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77] 
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77] Traceback (most recent call last):
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2632, in _build_and_run_instance
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]     self.driver.spawn(context, instance, image_meta,
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4646, in spawn
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]     self._create_guest_with_network(
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]   File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8032, in _create_guest_with_network
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77]     raise exception.VirtualInterfaceCreateException()
2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [instance: c24c4c07-45cf-451f-b086-cdfddf3bfb77] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed

The neutron virtual router has the right IPv6 address:

image.png (153×925 px, 41 KB)

The VM did not get the IPv6 address assigned on the interface via DHCPv6 :-(

aborrero@ipv6:~$ ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
ens3             UP             172.16.129.50/24 metric 100 fe80::f816:3eff:fe3e:4b38/64 

We got DNS integration half working:

aborrero@ipv6:~$ host ipv6.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud
ipv6.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud has address 172.16.129.50
ipv6.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud has IPv6 address 2a02:ec80:a100::1:320
aborrero@ipv6:~$ host 2a02:ec80:a100::1:320
Host 0.2.3.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.a.0.8.c.e.2.0.a.2.ip6.arpa not found: 3(NXDOMAIN)
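
The NXDOMAIN is expected until the reverse zone is populated: the ip6.arpa name in the lookup above is just the address's hex nibbles reversed. As a quick sanity check (not part of the neutron setup itself), Python's stdlib ipaddress module can reproduce that name:

```python
import ipaddress

# Build the ip6.arpa name for the VM address from the forward lookup above.
addr = ipaddress.ip_address("2a02:ec80:a100::1:320")
print(addr.reverse_pointer)
# 0.2.3.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.a.0.8.c.e.2.0.a.2.ip6.arpa
```

That matches the name queried above, so the lookup itself is well-formed; only the PTR record is missing.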

I see the dhcp6 packets from my test VM arriving into neutron:

11:20:29.156995 IP6 fe80::f816:3eff:fe3e:4b38.546 > ff02::1:2.547: dhcp6 solicit

But nobody replies?

tcpdump captures packets before they hit netfilter rules, so make sure the local system is allowing these packets in nftables etc.

We can discuss at greater length, but without looking at it deeply, DHCPv6 seems like the way to go for address assignment here. We want to be able to log which IPs are assigned to which VMs in case they somehow misbehave on the internet. Additionally, it would be nice if Neutron dropped packets from unicast IPs that it had not assigned via DHCP.

@aborrero the network assignment is incorrect also.
2a02:ec80:a100::/56 is the entire public IPv6 allocation for WMCS in codfw. It contains 256 /64 networks which can be used.

So I'd recommend we assign 2a02:ec80:a100:1::/64 for the first VM network in Netbox and use that. We should probably name it 'cloud-flat1-codfw1dev', so we can have 'cloud-flat2-codfw1dev' if needed down the road.

All Ethernet segments in IPv6 get a /64 subnet assigned, no more and no less.
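
The arithmetic behind that comment can be checked with Python's stdlib ipaddress module (a quick sanity check, not part of the neutron setup):

```python
import ipaddress

# The whole codfw1dev public allocation, per the comment above.
allocation = ipaddress.ip_network("2a02:ec80:a100::/56")

# Enumerate the /64 networks it contains.
nets = list(allocation.subnets(new_prefix=64))
print(len(nets))   # 256
print(nets[1])     # 2a02:ec80:a100:1::/64, the suggested first VM network
```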


Fixed the subnet in the patches above, as requested.

I don't know why, but now there seems to be a working DHCPv6 setup and everyone seems to connect everywhere:

root@ipv6-test-1:~# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
ens3             UP             172.16.129.248/24 metric 100 2a02:ec80:a100:1::29c/128 fe80::f816:3eff:fe8e:18ac/64 

root@ipv6-test-1:~# ping6 -c1 2a02:ec80:a100:1::200
PING 2a02:ec80:a100:1::200(2a02:ec80:a100:1::200) 56 data bytes
64 bytes from 2a02:ec80:a100:1::200: icmp_seq=1 ttl=64 time=0.818 ms

--- 2a02:ec80:a100:1::200 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.818/0.818/0.818/0.000 ms

root@ipv6-test-1:~# ping6 -c1 2a02:ec80:a100:1::1
PING 2a02:ec80:a100:1::1(2a02:ec80:a100:1::1) 56 data bytes
64 bytes from 2a02:ec80:a100:1::1: icmp_seq=1 ttl=64 time=2.26 ms

--- 2a02:ec80:a100:1::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.262/2.262/2.262/0.000 ms

root@ipv6-test-2:~# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
ens3             UP             172.16.129.205/24 metric 100 2a02:ec80:a100:1::200/128 fe80::f816:3eff:fe79:9a3d/64 

root@ipv6-test-2:~# ping6 -c1 2a02:ec80:a100:1::29c
PING 2a02:ec80:a100:1::29c(2a02:ec80:a100:1::29c) 56 data bytes
64 bytes from 2a02:ec80:a100:1::29c: icmp_seq=1 ttl=64 time=0.694 ms

--- 2a02:ec80:a100:1::29c ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.694/0.694/0.694/0.000 ms

root@ipv6-test-2:~# ping6 -c1 2a02:ec80:a100:1::1
PING 2a02:ec80:a100:1::1(2a02:ec80:a100:1::1) 56 data bytes
64 bytes from 2a02:ec80:a100:1::1: icmp_seq=1 ttl=64 time=2.10 ms

--- 2a02:ec80:a100:1::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.098/2.098/2.098/0.000 ms

I guess the next bits to test with neutron would be enabling north-south traffic, meaning working on these two tickets:

Cool, great progress!!

I'll have a look at this next week and get the bits we need in place. We also need to do some work to announce the ranges upstream, specifically:

  • Create objects in the RIPE DB for 2a02:ec80:a100::/48
  • Create RPKI ROA object with RIPE for 2a02:ec80:a100::/48
  • Modify eqiad CR inbound filter to block traffic to 'private' range 2a02:ec80:a100:100::/56
  • Modify eqiad CR BGP policy to announce 2a02:ec80:a100::/48 to peers and transit

I'll create a sub-task to track those bits too.

root@ipv6-test-1:~# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
ens3             UP             172.16.129.248/24 metric 100 2a02:ec80:a100:1::29c/128 fe80::f816:3eff:fe8e:18ac/64 

One thing that strikes me as incorrect looking at the above is that the VM has a /128 netmask on its v6 address here. It should be 2a02:ec80:a100:1::29c/64 instead.

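
Worth noting the /128 doesn't by itself make the rest of the subnet unreachable: the address still sits inside the advertised /64, which the VM learns as an on-link route from RAs. A quick check with the stdlib ipaddress module, using the values above:

```python
import ipaddress

# The VM's configured interface address (effectively a host route)...
iface = ipaddress.ip_interface("2a02:ec80:a100:1::29c/128")
# ...versus the subnet that the router advertisement announces as on-link.
subnet = ipaddress.ip_network("2a02:ec80:a100:1::/64")

print(iface.network)       # 2a02:ec80:a100:1::29c/128
print(iface.ip in subnet)  # True: the address is inside the advertised /64
```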

I have researched this a bit. I don't know why this happens, but a few things to note:

  • I have read some comments online that mention this netmask is the result of using DHCPv6 instead of router advertisements, see here and here
  • we run this subnet with the settings ipv6_address_mode: dhcpv6-stateful and ipv6_ra_mode: dhcpv6-stateful (see tofu-infra repo) which supports the theory in the previous point.
  • neutron also supports other modes, e.g. dhcpv6-stateless and slaac. I think I tried earlier with slaac and I could not make it work? But I can definitely try again. See also the upstream docs for the different combination options.
  • the problem with updating the subnet parameter is that it may force us to re-create the subnet, which is no big deal now, but may be later when we have actual workloads
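
The core constraint from the upstream docs is that when both ipv6_ra_mode and ipv6_address_mode are set on a subnet, they must be equal. A small sketch of that rule (the helper function and its structure are mine, not neutron's; see the upstream mode-combination table for the full matrix):

```python
# Modes neutron accepts for ipv6_ra_mode / ipv6_address_mode.
MODES = {"slaac", "dhcpv6-stateful", "dhcpv6-stateless"}

def valid_combo(ra_mode, address_mode):
    """Return True if this (ra_mode, address_mode) pair is acceptable.

    Simplified: when both are set they must match; leaving one unset is
    allowed (e.g. RAs coming from an external, non-neutron router).
    """
    for mode in (ra_mode, address_mode):
        if mode is not None and mode not in MODES:
            return False
    if ra_mode is not None and address_mode is not None:
        return ra_mode == address_mode
    return True

print(valid_combo("dhcpv6-stateful", "dhcpv6-stateful"))  # True: our current setup
print(valid_combo("slaac", "dhcpv6-stateful"))            # False: mixed modes rejected
```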

We definitely want to use DHCPv6 (stateful) for address assignment. So OpenStack is in control of what IPs are used, can set the DNS for them, and even potentially filter traffic coming from anything it had not assigned itself.

The system is getting its default route from RAs, is that correct? The suggestion in the reddit thread is that the system should get its subnet information (including netmask) from the RAs it receives, and thus when the DHCP assignment comes in it already knows the subnet is a /64.

Looking at the RAs being generated, I _think_ they look ok, but my knowledge here is somewhat patchy. The prefix/subnet is there, and the managed flag is set, which is correct when using DHCP.

image.png (562×891 px, 114 KB)

Taking a pcap of traffic to/from the VM on a reboot, things look ok to me: the router advertisement (with prefix info) arrives just prior to the DHCPv6 exchange, so the system should be able to combine the two to set its IPv6 address with the correct netmask. I'll need to dig into the docs a bit more.

So... maybe this is normal for DHCPv6? Re-reading the reddit post and looking at the setup on the VM, it seems like stuff works.

Though it makes my OCD kick off :P

The system does see the entire /64 as reachable directly on the interface (no next-hop):

root@ipv6-test-1:~# ip  -6 route show 2a02:ec80:a100:1::/64
2a02:ec80:a100:1::/64 dev ens3 proto ra metric 100 expires 86315sec pref medium

And when it tries to communicate with another host on the subnet it properly sends neighbor-solicitation requests:

cmooney@cloudvirt2005-dev:~$ sudo tcpdump -l -p -nn -i tapc13bc28b-31 icmp6 
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on tapc13bc28b-31, link-type EN10MB (Ethernet), snapshot length 262144 bytes
13:57:03.378347 IP6 2a02:ec80:a100:1::29c > ff02::1:ff00:199: ICMP6, neighbor solicitation, who has 2a02:ec80:a100:1::199, length 32
13:57:04.402298 IP6 2a02:ec80:a100:1::29c > ff02::1:ff00:199: ICMP6, neighbor solicitation, who has 2a02:ec80:a100:1::199, length 32

So maybe all this is fine ¯\_(ツ)_/¯

I think we can consider this to be completed. We may reopen if required.