Routing RFC1918 private IP addresses to/from WMCS floating IPs
Closed, Resolved (Public)

Description

In T41785 a question was raised about whether it is possible, within WMCS, to route traffic between RFC1918 private address space (e.g. 10.0.0.0/8, 172.16.0.0/12) and public floating IP addresses.

After speaking a bit with @Andrew he suggested creating a subtask and tagging @aborrero to discuss/explore this in more detail, so here it is!

Event Timeline

herron triaged this task as Medium priority. Oct 4 2018, 8:55 PM
herron created this task.

We have a mechanism called dmz_cidr which we can use to exclude NAT between certain IP ranges.
See a more detailed explanation here: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Neutron#dmz_cidr

Do you think that would help us here? If not, could you please elaborate a bit more on the context?
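
For reference, a dmz_cidr src:dst entry is, as far as I understand the setup, ultimately rendered as a NAT-exclusion rule on the active cloudnet host. A rough sketch of the idea (simplified chain names, not the exact ones Neutron manages):

# Skip source NAT for traffic matching src:dst by accepting it in the nat table
# before the catch-all SNAT rule is reached (illustrative chain names).
iptables -t nat -I POSTROUTING -s 172.16.0.0/12 -d 185.15.56.18/31 -j ACCEPT
# Anything not matched above is still source-NATed to the shared egress address:
# iptables -t nat -A POSTROUTING -s 172.16.0.0/12 -j SNAT --to-source 185.15.56.1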

If I'm understanding this correctly, yes. From the wikitech article...

https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Neutron#dmz_cidr
"The dmz_cidr mechanisms allows us to define certain IP ranges to which VMs can talk to directly without NAT being involved. This allows us to offer services to VMs easily, implementing access control in those services, etc.
One classic example is NFS stores, which would like to see actual VM IP addresses rather than generic NAT addresses."
<snip>
"You can read these config as: do not apply NAT to connections src:dst"

Does this mean we could route private network traffic to the mail smarthost public floating IPs without NAT using a config similar to the one below? (The floating IPs for mx-outNN are 185.15.56.18 and 185.15.56.19.)

profile::openstack::main::nova::dmz_cidr: '172.16.0.0/12:185.15.56.18/31'
profile::openstack::main::nova::dmz_cidr: '10.0.0.0/8:185.15.56.18/31'
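
Note that a YAML hash can't repeat the same key, so both exclusions would presumably have to be folded into a single value; the examples on the wikitech page suggest a comma-separated list, along the lines of:

profile::openstack::main::nova::dmz_cidr: '172.16.0.0/12:185.15.56.18/31,10.0.0.0/8:185.15.56.18/31'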

For T41785 the context is outbound mail. The new outbound smarthosts mx-outNN.cloudinfra.eqiad.wmflabs have public floating IP addresses associated with them, and these floating IPs have been given corresponding DNS names mx-outNN.wmflabs.org. When mail clients in WMCS use a smarthost (for example mx-out01.wmflabs.org) the connection traverses NAT, and from the perspective of mx-outNN.cloudinfra.eqiad.wmflabs the source IP address of the connection is the NAT address (e.g. internal-server-nat.wmflabs.org, 185.15.56.1). This is sub-optimal since many WMCS instances appear to the smarthosts as the same IP, making it more difficult to troubleshoot abuse, apply rate limits, etc.

On a related note, it looks like the labs aliaser was enabled in eqiad1 just this morning (T41785#4644690), which afaict provides similar functionality. From my perspective routing without NAT is preferable, but of course it's up to WMCS how best to approach this.

Wait, the NAT is only applied to connections in the egress path (from VM to the internet), and connections internal to CloudVPS are not affected by this egress NAT, i.e., VMs in the 172.16.0.0/12 range can contact 185.15.56.18/31 without source NAT being applied.

VM (172.16.0.X) ---> Neutron router DNAT (185.15.56.18) ---> mx-out01.cloudinfra.eqiad.wmflabs (172.16.1.239)

So, the only thing that NAT affects here is that the random VM on the left doesn't know the actual IP address of the mx-out VM, only the DNAT address (floating IP).
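
One way to double-check which NAT (if any) is actually applied would be to look at conntrack inside the router namespace on the active cloudnet host, assuming conntrack-tools is installed; <router-uuid> is a placeholder:

# Inspect tracked connections towards the floating IP from inside the qrouter namespace
sudo ip netns exec qrouter-<router-uuid> conntrack -L -d 185.15.56.18
# If source NAT is applied, the reply-direction tuple shows the shared egress
# address (185.15.56.1) instead of the client VM's own 172.16.x.x address.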

Hmm, here's what I'm seeing:

cloudinfra-puppetmaster-01:~# nc -vz 185.15.56.18 25
mx-out01.wmflabs.org [185.15.56.18] 25 (smtp) open
mx-out01:~# tcpdump -ni any port 25
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
18:01:00.956781 IP 185.15.56.1.34330 > 172.16.1.239.25: Flags [S], seq 2492930878, win 29200, options [mss 1460,sackOK,TS val 301736818 ecr 0,nop,wscale 9], length 0
18:01:00.956811 IP 185.15.56.1.34330 > 172.16.1.239.25: Flags [S], seq 2492930878, win 29200, options [mss 1460,sackOK,TS val 301736818 ecr 0,nop,wscale 9], length 0
18:01:00.956984 IP 172.16.1.239.25 > 185.15.56.1.34330: Flags [S.], seq 2201494697, ack 2492930879, win 28960, options [mss 1460,sackOK,TS val 278276502 ecr 301736818,nop,wscale 9], length 0

Ok, it seems I'm wrong, will have to review my own docs lol

So I will have to investigate how to better configure this, either with what you suggested:

profile::openstack::eqiad1::nova::dmz_cidr: '172.16.0.0/12:185.15.56.18/31'

or even a more generic approach:

profile::openstack::eqiad1::nova::dmz_cidr: '172.16.0.0/12:172.16.0.0/12'
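
Either way, the effective exclusions could be verified by listing the NAT rules rendered in the router namespace on the active cloudnet (exact chain names depend on the agent):

# List rendered NAT rules and pick out the exclusion/SNAT entries
sudo ip netns exec qrouter-<router-uuid> iptables -t nat -S | grep -E 'ACCEPT|SNAT'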

Change 468546 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: eqiad1: exclude SNAT between VMs

https://gerrit.wikimedia.org/r/468546

Change 468546 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: eqiad1: exclude SNAT between VMs

https://gerrit.wikimedia.org/r/468546

Mentioned in SAL (#wikimedia-operations) [2018-10-19T10:53:07Z] <arturo> icinga downtime for 2h for clounet1003/1004 to deploy patch related to T206261

hey @herron it should be working now. This was my test:

aborrero@cloudinfra-puppetmaster-01:~$ ping -c1 185.15.56.18
PING 185.15.56.18 (185.15.56.18) 56(84) bytes of data.
64 bytes from 172.16.1.239: icmp_seq=1 ttl=64 time=0.661 ms

--- 185.15.56.18 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.661/0.661/0.661/0.000 ms

and then:

aborrero@mx-out01:~$ sudo tcpdump -n -i eth0 icmp
10:54:46.747986 IP 172.16.1.230 > 172.16.1.239: ICMP echo request, id 10013, seq 1, length 64
10:54:46.748087 IP 172.16.1.239 > 172.16.1.230: ICMP echo reply, id 10013, seq 1, length 64

Closing task, feel free to reopen if necessary :-)

Mentioned in SAL (#wikimedia-cloud) [2018-10-19T11:16:27Z] <arturo> change in dmz_cidr in eqiad1: VMs will connect between them without NAT even when using floating IPs (T206261)

Reopening and reverting the patch. I can confirm it is causing at least 2 issues:

  1. ssh issues connecting to eqiad1.bastion.wmflabs.org
  2. icinga2-wm: PROBLEM - ping4 on phab.wmflabs.org is WARNING: PING WARNING - DUPLICATES FOUND!

There is some complex routing involved here and I would need to review why the patch affects our current use cases.

Mentioned in SAL (#wikimedia-cloud) [2018-10-19T12:02:47Z] <arturo> revert change in dmz_cidr in eqiad1 for now (T206261)

Change 468940 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: eqiad1: exclude SNAT between VMs contacting floating IPs

https://gerrit.wikimedia.org/r/468940

Mentioned in SAL (#wikimedia-operations) [2018-10-22T10:03:57Z] <arturo> icinga downtime for cloudnet1003/4 for T206261

Change 468940 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: eqiad1: exclude SNAT between VMs contacting floating IPs

https://gerrit.wikimedia.org/r/468940

Mentioned in SAL (#wikimedia-cloud) [2018-10-22T10:26:11Z] <arturo> change again in dmz_cidr in eqiad1: VMs will connect between them without NAT even when using floating IPs (T206261)

I just discovered that applying the patch (and running puppet on both cloudnets at the same time) resulted in both neutron l3 agents being active at the same time:

root@cloudcontrol1003:~# neutron l3-agent-list-hosting-router cloudinstances2b-gw
+--------------------------------------+--------------+----------------+-------+----------+
| id                                   | host         | admin_state_up | alive | ha_state |
+--------------------------------------+--------------+----------------+-------+----------+
| 8af5d8a1-2e29-40e6-baf0-3cd79a7ac77b | cloudnet1003 | True           | :-)   | active   |
| 970df1d1-505d-47a4-8d35-1b13c0dfe098 | cloudnet1004 | True           | :-)   | active   |
+--------------------------------------+--------------+----------------+-------+----------+

So, duplicated IP addresses in the network, which was probably the cause of the networking issues discovered in T206261#4680107.

Right now, the applied patch is adding 172.16.0.0/21:185.15.56.0/25; not sure if it's worth trying the original idea of 172.16.0.0/21:172.16.0.0/21.
Anyway, I will wait a few hours to see if more issues arise or if the problem was just the duplicated active neutron router.
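
For the dual-active router: since Neutron L3 HA elects the master via keepalived, restarting the l3 agent on one of the two cloudnets should be enough to force a clean re-election. A rough sketch, assuming the agent runs as the neutron-l3-agent systemd unit:

# On one of the two cloudnet hosts (e.g. cloudnet1004)
sudo systemctl restart neutron-l3-agent
# Then confirm only one agent reports ha_state=active
neutron l3-agent-list-hosting-router cloudinstances2b-gw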

Thanks @aborrero!

After a quick ping test using the cloudinfra project I'm still seeing traffic originate from 185.15.56.1. Is that expected to be the case now?

$ host mx-out01.wmflabs.org 1.1.1.1
mx-out01.wmflabs.org has address 185.15.56.18
cloudinfra-puppetmaster-01:~# ping -c1 185.15.56.18
PING 185.15.56.18 (185.15.56.18) 56(84) bytes of data.
64 bytes from 185.15.56.18: icmp_seq=1 ttl=63 time=0.799 ms
mx-out01:~$ sudo tcpdump -ni any icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
13:43:50.313182 IP 185.15.56.1 > 172.16.1.239: ICMP echo request, id 22048, seq 1, length 64
13:43:50.313215 IP 185.15.56.1 > 172.16.1.239: ICMP echo request, id 22048, seq 1, length 64
13:43:50.313304 IP 172.16.1.239 > 185.15.56.1: ICMP echo reply, id 22048, seq 1, length 64

Seeing duplicate ICMP echo requests in the tcpdump above as well. Is that expected to still be happening?

Sorry, I didn't do enough tests, I just focused on the duplicated-packets issue.
It seems that my patch isn't enough to cover our case; I will cook another patch soon.

Be aware of the ifb0 interface present in some instances. That can show duplicate packets. If you filter by eth0, they won't appear.
By default tcpdump -i any prints the same packets again if they re-enter the stack, so they look duplicated (but they are not).
The duplicates I was seeing were in the ping output and were actual duplicated packets on the wire. But that should be fixed now.
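
In other words, capturing on the physical interface only avoids the pseudo-duplicates, e.g.:

# Capture on eth0 only; copies re-injected via ifb0 won't show up here
sudo tcpdump -n -i eth0 icmp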

Change 469019 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: eqiad1: fix dmz_cidr adressing for inter-VM connections

https://gerrit.wikimedia.org/r/469019

Mentioned in SAL (#wikimedia-operations) [2018-10-22T16:24:22Z] <arturo> T206261 2h icinga downtime cloudnet1003/4 for another patch

Mentioned in SAL (#wikimedia-cloud) [2018-10-22T16:24:45Z] <arturo> T206261 another update to dmz_cidr in eqiad1

Change 469019 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: eqiad1: fix dmz_cidr adressing for inter-VM connections

https://gerrit.wikimedia.org/r/469019

Please @herron try now. This is my test:

aborrero@cloudinfra-puppetmaster-01:~$ ping -c1 185.15.56.18
PING 185.15.56.18 (185.15.56.18) 56(84) bytes of data.
64 bytes from 172.16.1.239: icmp_seq=1 ttl=64 time=1.07 ms

--- 185.15.56.18 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.075/1.075/1.075/0.000 ms
aborrero@mx-out01:~$ sudo tcpdump -n -i eth0 icmp
16:30:51.492015 IP 172.16.1.230 > 172.16.1.239: ICMP echo request, id 26922, seq 1, length 64
16:30:51.492144 IP 172.16.1.239 > 172.16.1.230: ICMP echo reply, id 26922, seq 1, length 64

I haven't detected any issues so far. Closing the task now :-)

Does this mean that we no longer need the IP aliaser in eqiad1-r?

Heads up, I'm reverting the changes introduced in this ticket, see T257534: CloudVPS: a VM is unable to contact floating IPs of other VMs for reference. I'm pretty sure the changes weren't working as expected anyway, and nobody noticed.