
CloudVPS: we may need DNS records for neutron port VIP addresses
Closed, ResolvedPublic

Description

The really strange thing: you CAN ping the IP, and any VM that has a public IP of its own can easily curl it. Our test case is:
curl https://hub.paws.wmcloud.org/hub/metrics since that's where I found this.

The response times out like a firewall issue, but we so far haven't found a firewall. Stranger still, it seems as though a connection is made (SYN/ACK), but things don't quite end up right.

Seems interesting!

Event Timeline

Bstorm renamed this task from CloudVPS: VMs cannot seem to curl public IPs unless they have them with an open security group to CloudVPS: VMs cannot seem to curl public IPs unless they also have public IPs, even with an open security group.Jul 21 2021, 5:31 PM
Bstorm created this task.
aborrero triaged this task as Medium priority.Jul 22 2021, 9:28 AM
aborrero moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

I see at least one thing I can explain.

Data:

  • tools-prometheus-03 ip addr 172.16.1.8
  • hub.paws.wmcloud.org ip addr 185.15.56.57 (floating IP)
  • haproxy neutron port VIP 172.16.1.171 (mapped to the floating IP)
  • paws-haproxy-1 ip addr 172.16.0.191 (the VM currently holding the neutron port VIP, and thus receiving the floating IP NAT)

Flow:

  • From tools-prometheus-03 we send a TCP SYN to hub.paws.wmcloud.org. Packet1: src: 172.16.1.8 dst: 185.15.56.57.
  • This packet is DNAT'ed by the neutron l3 agent into the neutron port VIP. Packet1: src: 172.16.1.8 dst: 172.16.1.171.
  • paws-haproxy-1 receives the TCP SYN, and replies with SYN+ACK from the same source address it originally received the SYN on. Packet2: src: 172.16.1.171 dst: 172.16.1.8.

This is a typical case of asymmetric routing.
Moreover, the cloudvirts have stateful firewalling, and Packet2 won't pass a stateful firewall check.
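The stateful-firewall failure above can be sketched with a toy model (illustrative only; real conntrack tracks full 5-tuples, ports, and TCP state, but the address mismatch is the same):

```python
# Toy model of the stateful check on the cloudvirt. The client sent its
# SYN to the floating IP, so only a reply *from* the floating IP matches
# the tracked connection; the SYN+ACK actually arrives from the VIP.
FLOATING_IP = "185.15.56.57"
VIP = "172.16.1.171"
CLIENT = "172.16.1.8"

conntrack = set()

def outbound(src, dst):
    """Record the reply tuple we expect when a SYN leaves the client."""
    conntrack.add((dst, src))

def inbound_allowed(src, dst):
    """A reply passes only if it matches a recorded connection."""
    return (src, dst) in conntrack

outbound(CLIENT, FLOATING_IP)                # Packet1 as sent: SYN to the floating IP
print(inbound_allowed(FLOATING_IP, CLIENT))  # True: a reply from the floating IP would pass
print(inbound_allowed(VIP, CLIENT))          # False: the SYN+ACK from the VIP is dropped
```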

I found this nice diagram on the internet, and I think it's right:

image.png (566×800 px, 90 KB)

So a simple solution here is to connect to the neutron port VIP directly, instead of the floating IP address. Didn't we have a trick in the resolver to work around this? Or did we drop it?

> So a simple solution here is to connect to the neutron port VIP directly, instead of the floating IP address. Didn't we have a trick in the resolver to work around this? Or did we drop it?

We do (labsaliaser), but this address is not directly assigned to a VM (it's managed with keepalived), so it does not find the address when looping through all instances and their addresses from Nova.
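Why the aliaser misses the VIP can be shown with a minimal sketch (illustrative data and structure, not the real labsaliaser code): it only sees addresses that Nova reports per instance, and a keepalived-managed neutron port VIP belongs to no instance.

```python
# Hypothetical view of what an instance-only scan sees. paws-haproxy-1
# holds the VIP via keepalived, but Nova only reports the VM's fixed IP.
nova_instances = {
    "paws-haproxy-1": ["172.16.0.191"],  # fixed IP only; the VIP lives on a separate neutron port
}
VIP = "172.16.1.171"

known_addresses = {addr for addrs in nova_instances.values() for addr in addrs}
print(VIP in known_addresses)  # False: the VIP is invisible to an instance-only scan
```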

In T287107#7229504, @Majavah wrote:

> > So a simple solution here is to connect to the neutron port VIP directly, instead of the floating IP address. Didn't we have a trick in the resolver to work around this? Or did we drop it?
>
> We do (labsaliaser), but this address is not directly assigned to a VM (it's managed with keepalived), so it does not find the address when looping through all instances and their addresses from Nova.

so perhaps the simple solution is to add the VIP to the DNS.

aborrero renamed this task from CloudVPS: VMs cannot seem to curl public IPs unless they also have public IPs, even with an open security group to CloudVPS: we may need DNS records for neutron port VIP addresses.Jul 22 2021, 10:51 AM

> so perhaps the simple solution is to add the VIP to the DNS.

Unfortunately, this is host-based routing and TLS. We know we can fake that out by adding a Host header, but we are limited by what we can do in prometheus. The VIP is in DNS, but I may not be able to scrape it at that address.

We could teach the ip-aliaser to read the connection to a port. Is that what you mean? I can go mess with that :)

[bstorm@cloudcontrol1003]:~ $ sudo wmcs-openstack --os-project-id paws floating ip list --project paws
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+---------+
| ID                                   | Floating IP Address | Fixed IP Address | Port                                 | Floating Network                     | Project |
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+---------+
| 1b287d46-acf5-4978-8b4b-4da8cff0ea0b | 185.15.56.57        | 172.16.1.171     | 9c0a9a13-e409-49de-9ba3-bc8ec4801dbf | 5c9ee953-3a19-4e84-be0f-069b5da75123 | paws    |
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+---------+
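The `floating ip list` output above already carries the mapping the aliaser needs: each floating IP record names its fixed (VIP) address. A hedged sketch of the fix direction, with data mirroring that table (the record shape here is illustrative, not the actual OpenStack client API):

```python
# Walk the project's floating IP records rather than only per-server
# addresses, and alias each floating IP to its fixed (VIP) address.
floating_ips = [
    {"floating_ip_address": "185.15.56.57", "fixed_ip_address": "172.16.1.171"},
]

aliases = {fip["floating_ip_address"]: fip["fixed_ip_address"] for fip in floating_ips}
print(aliases["185.15.56.57"])  # 172.16.1.171: inside the cloud, the name resolves to the VIP
```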

Just imagine this situation:

user@tools-prometheus-05:~$ dig +short hub.paws.wmcloud.org
172.16.1.171

instead of the current:

user@tools-prometheus-05:~$ dig +short hub.paws.wmcloud.org
185.15.56.57

TLS stuff won't even notice.
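TLS is indifferent to the DNS swap because both the SNI value in the ClientHello and the HTTP Host header are derived from the URL's hostname, not from whatever address DNS returns; only the socket destination changes. A minimal illustration:

```python
from urllib.parse import urlsplit

# Both SNI and the Host header come from the URL's hostname; the resolved
# address (185.15.56.57 vs 172.16.1.171) never enters the TLS handshake.
url = "https://hub.paws.wmcloud.org/hub/metrics"
hostname = urlsplit(url).hostname

sni = hostname          # sent in the TLS ClientHello
host_header = hostname  # sent in the HTTP request
print(sni, host_header)  # hub.paws.wmcloud.org hub.paws.wmcloud.org
```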

Change 706814 had a related patch set uploaded (by Bstorm; author: Bstorm):

[operations/puppet@production] cloud dns: alias all attached floating_ips, not just server ones

https://gerrit.wikimedia.org/r/706814

I realized the aliaser should totally handle this.

Change 706814 merged by Bstorm:

[operations/puppet@production] cloud dns: alias all attached floating_ips, not just server ones

https://gerrit.wikimedia.org/r/706814

That seems to have worked:

bstorm@tools-prometheus-03:~$ dig @208.80.154.143 hub.paws.wmcloud.org

; <<>> DiG 9.11.5-P4-5.1+deb10u5-Debian <<>> @208.80.154.143 hub.paws.wmcloud.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1609
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;hub.paws.wmcloud.org.		IN	A

;; ANSWER SECTION:
hub.paws.wmcloud.org.	1054	IN	CNAME	paws.wmcloud.org.
paws.wmcloud.org.	3274	IN	A	172.16.1.171

Aaand:

bstorm@tools-prometheus-03:~$ curl https://hub.paws.wmcloud.org/hub/metrics | wc -l
1199

Thanks for giving me the idea!