
Cloud VPS instance with floating (public) IP can not ping that IP directly
Closed, Resolved · Public

Description

Hi, I associated a floating IP with the "gerrit-test4" instance in the "gerrit" project. Pinging that IP shows 100% packet loss from inside the instance, but only 33% packet loss from my Mac.

IP being 185.15.56.55.

I assigned it using https://horizon.wikimedia.org/project/floating_ips/ and the pool is "wan-transport-eqiad".

Event Timeline

Paladox created this task. · Mar 5 2019, 5:14 PM
Restricted Application added a subscriber: Aklapper. · Mar 5 2019, 5:14 PM

Update: my Mac no longer shows packet loss, but gerrit-test4 still cannot ping the IP (100% packet loss).

Krenair added a subscriber: Krenair. · Mar 5 2019, 5:20 PM

Are instances able to route to their own floating IPs in general?

Krenair edited projects, added Cloud-VPS; removed Cloud-Services. · Mar 5 2019, 5:21 PM
Paladox added a comment. · Edited · Mar 5 2019, 5:28 PM

@Krenair Yup, as I can on gerrit-test3:

ping gerrit.git.wmflabs.org
PING gerrit.git.wmflabs.org (172.16.1.184) 56(84) bytes of data.
64 bytes from gerrit-test3.git.eqiad.wmflabs (172.16.1.184): icmp_seq=1 ttl=64 time=0.031 ms
64 bytes from gerrit-test3.git.eqiad.wmflabs (172.16.1.184): icmp_seq=2 ttl=64 time=0.054 ms

--- gerrit.git.wmflabs.org ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1019ms
rtt min/avg/max/mdev = 0.031/0.042/0.054/0.013 ms

But the strange thing is that pinging gerrit.gerrit.wmflabs.org works from gerrit-test3:

ping gerrit.gerrit.wmflabs.org
PING gerrit.gerrit.wmflabs.org (185.15.56.55) 56(84) bytes of data.
64 bytes from gerrit-test4.gerrit.eqiad.wmflabs (172.16.0.148): icmp_seq=1 ttl=64 time=2.56 ms
64 bytes from gerrit-test4.gerrit.eqiad.wmflabs (172.16.0.148): icmp_seq=2 ttl=64 time=0.589 ms
^C
--- gerrit.gerrit.wmflabs.org ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.589/1.575/2.562/0.987 ms

Mentioned in SAL (#wikimedia-cloud) [2019-03-05T18:27:08Z] <bd808> Added BryanDavis (self) as projectadmin to investigate T217681

bd808 added a subscriber: bd808. · Mar 5 2019, 6:51 PM
$ ssh root@gerrit-test4.gerrit.eqiad.wmflabs
$ host 185.15.56.55
55.56.15.185.in-addr.arpa domain name pointer instance-gerrit-test4.gerrit.wmflabs.org.
55.56.15.185.in-addr.arpa domain name pointer gerrit.gerrit.wmflabs.org.
$ ping -c 10 -w 10 185.15.56.55
PING 185.15.56.55 (185.15.56.55) 56(84) bytes of data.

--- 185.15.56.55 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 9210ms
$ host 185.15.56.52
52.56.15.185.in-addr.arpa domain name pointer instance-tools-docker-registry-03.tools.wmflabs.org.
52.56.15.185.in-addr.arpa domain name pointer docker-registry.tools.wmflabs.org.
$ ping -c 10 -w 10 185.15.56.52
PING 185.15.56.52 (185.15.56.52) 56(84) bytes of data.
64 bytes from 172.16.7.216: icmp_seq=1 ttl=64 time=1.80 ms
64 bytes from 172.16.7.216: icmp_seq=2 ttl=64 time=0.502 ms
64 bytes from 172.16.7.216: icmp_seq=3 ttl=64 time=5.13 ms
64 bytes from 172.16.7.216: icmp_seq=4 ttl=64 time=0.504 ms
64 bytes from 172.16.7.216: icmp_seq=5 ttl=64 time=0.795 ms
64 bytes from 172.16.7.216: icmp_seq=6 ttl=64 time=0.437 ms
64 bytes from 172.16.7.216: icmp_seq=7 ttl=64 time=0.613 ms
64 bytes from 172.16.7.216: icmp_seq=8 ttl=64 time=0.543 ms
64 bytes from 172.16.7.216: icmp_seq=9 ttl=64 time=2.22 ms
64 bytes from 172.16.7.216: icmp_seq=10 ttl=64 time=0.684 ms

--- 185.15.56.52 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9097ms
rtt min/avg/max/mdev = 0.437/1.324/5.132/1.396 ms

Pings are broadly working from gerrit-test4.gerrit.eqiad.wmflabs to hosts in the 185.15.56.0/25 subnet, but not to the 185.15.56.55 address, which is the NAT masquerade for the instance itself. My first thought was to check the security group rules for the gerrit project to make sure that ICMP ping packets are allowed in. That does seem to be the case via the Ingress IPv4 ICMP Any 0.0.0.0/0 rule.

Out of curiosity I ran the same set of pings from tools-docker-registry-03.tools.eqiad.wmflabs (the "owner" of the 185.15.56.52 floating IP that was pingable from gerrit-test4.gerrit.eqiad.wmflabs) and found symmetric behavior. That is to say, tools-docker-registry-03.tools.eqiad.wmflabs can ping the floating IP for gerrit-test4.gerrit.eqiad.wmflabs, but not its own floating IP.

The test of this nature that @Paladox did in T217681#5002279 used the hostname "gerrit.git.wmflabs.org" rather than the floating IP address (185.15.56.28). The hostname resolves to the fixed IP of the instance rather than the floating IP when the lookup is done from inside the Cloud VPS address space. This is due to the split-horizon DNS that we deploy, which presents different IPs based on the IP of the requesting client.
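As a rough illustration of the split-horizon behavior described above, here is a minimal Python sketch. It is not how the Cloud VPS DNS is implemented: the `RECORDS` table and the `resolve` function are hypothetical, and the 172.16.0.0/21 range is assumed from a later comment in this task. The two addresses for gerrit.git.wmflabs.org are the ones shown earlier in this task.

```python
import ipaddress

# Assumption: instance address range, as mentioned in a later comment on this task.
CLOUD_VPS_NET = ipaddress.ip_network("172.16.0.0/21")

# Hypothetical record table: name -> (fixed IP, floating IP).
RECORDS = {
    "gerrit.git.wmflabs.org": ("172.16.1.184", "185.15.56.28"),
}


def resolve(name, client_ip):
    """Return the fixed IP to clients inside Cloud VPS, the floating IP otherwise."""
    fixed_ip, floating_ip = RECORDS[name]
    if ipaddress.ip_address(client_ip) in CLOUD_VPS_NET:
        return fixed_ip
    return floating_ip


# A lookup from an instance gets the fixed IP, so the ping never touches the floating IP.
print(resolve("gerrit.git.wmflabs.org", "172.16.1.184"))
# A lookup from outside (a documentation-range address here) gets the floating IP.
print(resolve("gerrit.git.wmflabs.org", "198.51.100.7"))
```

This is why a ping by hostname from inside the instance does not exercise the floating IP path at all.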

So I think we may be back to @Krenair's question from T217681#5002239: are instances with a floating IP attached supposed to be able to ping or otherwise communicate with their own floating IP?

bd808 renamed this task from "Packet loss to floating ip after assigning it to a instance in the gerrit project" to "Cloud VPS instance with floating (public) IP can not ping that IP directly". · Mar 5 2019, 6:53 PM
aborrero claimed this task. · Mar 8 2019, 10:34 AM
aborrero triaged this task as Normal priority.

There are some special NAT behaviors in Cloud VPS that may not be obvious at first sight. I would need to re-check and re-read my own documentation (https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Neutron) before I can provide a specific answer to this.

aborrero closed this task as Resolved. · Mar 19 2019, 12:19 PM
aborrero moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

I was able to investigate this a bit more to provide some answers.

The VM sends a packet like this:

172.16.0.148 > 185.15.56.55 ICMP echo request.

This packet reaches the neutron virtual router, and I can see it in tcpdump:

root@cloudnet1004:~# tcpdump -n -i qr-defc9d1d-40 icmp and host 172.16.0.148
11:51:48.652815 IP 172.16.0.148 > 185.15.56.55: ICMP echo request, id 32318, seq 1, length 64

Then the PREROUTING NAT rule applies, translating the destination 185.15.56.55 into 172.16.0.148. The corresponding event from the conntrack NAT engine:

root@cloudnet1004:~ 3s # conntrack -E -p icmp --src 172.16.0.148
    [NEW] icmp     1 30 src=172.16.0.148 dst=185.15.56.55 type=8 code=0 id=32395 [UNREPLIED] src=172.16.0.148 dst=172.16.0.148 type=0 code=0 id=32395

When this happens, the packet is put back on the wire, and I can see it again in tcpdump on the neutron router. You can see the two packets: the first without NAT, the second with NAT applied:

root@cloudnet1004:~# tcpdump -n -i qr-defc9d1d-40 icmp and host 172.16.0.148
11:51:48.652815 IP 172.16.0.148 > 185.15.56.55: ICMP echo request, id 32318, seq 1, length 64
11:51:48.652842 IP 172.16.0.148 > 172.16.0.148: ICMP echo request, id 32318, seq 1, length 64
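The translation seen in the capture above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the actual neutron or netfilter logic: `Packet`, `FLOATING_TO_FIXED` and `dnat` are hypothetical names, and the model rewrites only the destination, which matches what tcpdump shows here.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str


# Floating IP -> fixed IP, using the addresses from this task.
FLOATING_TO_FIXED = {"185.15.56.55": "172.16.0.148"}


def dnat(packet):
    """Rewrite the destination if it is a known floating IP; the source is left untouched."""
    return packet._replace(dst=FLOATING_TO_FIXED.get(packet.dst, packet.dst))


# The VM pings its own floating IP: after DNAT, source and destination are identical.
request = Packet(src="172.16.0.148", dst="185.15.56.55")
translated = dnat(request)
print(translated)  # Packet(src='172.16.0.148', dst='172.16.0.148')
```

A packet from a different instance keeps a distinct source after the same translation, which is consistent with other instances being able to ping this floating IP.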

The neutron virtual router routes this packet back to the original VM, and you can see the NATed packet reaching the interface. Note how I selected only incoming packets in tcpdump using -Q in:

root@gerrit-test4:~# tcpdump -n -i eth0 -Q in icmp
11:51:48.650504 IP 172.16.0.148 > 172.16.0.148: ICMP echo request, id 32318, seq 1, length 64

And here is the thing. That packet can't be routed by the VM:

root@gerrit-test4:~# ip route get 172.16.0.148 from 172.16.0.148 iif eth0
RTNETLINK answers: Invalid argument

This is known as a martian packet (https://en.wikipedia.org/wiki/Martian_packet), and you can actually see the kernel complaining if you turn on martian packet logging:

root@gerrit-test4:~# sysctl net.ipv4.conf.all.log_martians=1
root@gerrit-test4:~# dmesg -T | tail -2
[Tue Mar 19 12:16:26 2019] IPv4: martian source 172.16.0.148 from 172.16.0.148, on dev eth0
[Tue Mar 19 12:16:26 2019] ll header: 00000000: fa 16 3e d9 29 75 fa 16 3e ae f5 88 08 00        ..>.)u..>.....
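The "ll header" line in the log above is the 14-byte Ethernet header of the offending frame. As a small sketch (the `parse_ll_header` helper is hypothetical and assumes a plain Ethernet II header with no VLAN tag), it can be split into destination MAC, source MAC and EtherType:

```python
def parse_ll_header(hex_bytes):
    """Split a 14-byte Ethernet II header into (destination MAC, source MAC, EtherType)."""
    octets = hex_bytes.split()
    if len(octets) != 14:
        raise ValueError("expected 14 bytes of Ethernet header")
    dst_mac = ":".join(octets[0:6])    # first 6 bytes: destination MAC
    src_mac = ":".join(octets[6:12])   # next 6 bytes: source MAC
    ethertype = int("".join(octets[12:14]), 16)  # last 2 bytes: EtherType
    return dst_mac, src_mac, ethertype


dst_mac, src_mac, ethertype = parse_ll_header("fa 16 3e d9 29 75 fa 16 3e ae f5 88 08 00")
print(dst_mac, src_mac, hex(ethertype))  # fa:16:3e:d9:29:75 fa:16:3e:ae:f5:88 0x800
```

The two MAC addresses differ even though the IPv4 source and destination are the same, and EtherType 0x0800 marks the payload as IPv4.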

I will update the docs at https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Neutron to document this.

Krenair added a comment. · Edited · Mar 19 2019, 12:29 PM

The neutron virtual router routes this packet back to the original VM, and you can see the NATed packet reaching the interface. Note how I selected only incoming packets in tcpdump using -Q in

root@gerrit-test4:~# tcpdump -n -i eth0 -Q in icmp
11:51:48.650504 IP 172.16.0.148 > 172.16.0.148: ICMP echo request, id 32318, seq 1, length 64

And here is the thing. That packet can't be routed by the VM:

root@gerrit-test4:~# ip route get 172.16.0.148 from 172.16.0.148 iif eth0
RTNETLINK answers: Invalid argument

This is known as a martian packet (https://en.wikipedia.org/wiki/Martian_packet)

Should our instances really be treating these as martian packets? They're not coming in from the internet. Our instances would usually expect 172.16.0.0/21 traffic on eth0 surely? Are they just upset that they're getting a packet in with one of their own IPs as the source?

What if instances had their floating IP (if any) configured locally on their eth0 interface alongside their 172?

aborrero added a comment. · Edited · Mar 19 2019, 12:45 PM

Should our instances really be treating these as martian packets? They're not coming in from the internet. Our instances would usually expect 172.16.0.0/21 traffic on eth0 surely? Are they just upset that they're getting a packet in with one of their own IPs as the source?

The problem is that the instance receives a packet whose source and destination IPv4 addresses are both its own local address, while the source and destination MAC addresses differ. That makes no sense to the network stack unless it is configured to accept it.

Allowing this snowflake case is actually pretty simple. I'm not sure it is worth addressing cloud-wide.

The fix is:

root@gerrit-test4:~# sysctl net.ipv4.conf.all.accept_local=1

root@gerrit-test4:~# ping 185.15.56.55
PING 185.15.56.55 (185.15.56.55) 56(84) bytes of data.
64 bytes from 172.16.0.148: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 172.16.0.148: icmp_seq=2 ttl=64 time=0.228 ms
^C
--- 185.15.56.55 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1011ms
rtt min/avg/max/mdev = 0.202/0.215/0.228/0.013 ms

root@gerrit-test4:~# ip route get 172.16.0.148 from 172.16.0.148 iif eth0
local 172.16.0.148 from 172.16.0.148 dev lo 
    cache <local>  iif eth0

docs: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

accept_local - BOOLEAN
	Accept packets with local source addresses. In combination with
	suitable routing, this can be used to direct packets between two
	local interfaces over the wire and have them accepted properly.
	default FALSE
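To make the effect of the setting concrete, here is a deliberately simplified Python model of the decision. It is an assumption-laden sketch, not the kernel's source validation code: `accepts_source` is a hypothetical function, and it ignores rp_filter and the "suitable routing" mentioned in the kernel documentation.

```python
def accepts_source(src_ip, local_ips, accept_local=False):
    """Toy model: a received packet whose source is one of our own addresses is
    rejected as a martian unless accept_local is enabled."""
    if src_ip in local_ips and not accept_local:
        return False  # would be logged as "martian source" with log_martians=1
    return True


local_ips = {"172.16.0.148"}

# Default behavior: the NATed echo request from ourselves is dropped.
print(accepts_source("172.16.0.148", local_ips))                     # False
# With accept_local=1 the same packet is accepted and the ping succeeds.
print(accepts_source("172.16.0.148", local_ips, accept_local=True))  # True
# Packets from other instances are unaffected either way.
print(accepts_source("172.16.7.216", local_ips))                     # True
```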