allow routing between labs instances and public labs ips (done, document)
Closed, ResolvedPublic

Description

Labs floating IPs can't be reached from labs instances.

For example, my instance util-abogott has a public IP at 208.80.155.192. It also has a firewall policy that should allow ping from anywhere. I can ping 208.80.155.192 from my laptop and from iron. I cannot ping it from within labs, though: not from a labs bastion, and not from another instance in the same project (testlabs).

This is old news -- we have a few hacks in place with dnsmasq so that some .wmflabs.org lookups resolve to internal IPs in order to route around the embargo on contact between labs instances and floating IPs.

But -- I'm not clear on why this isn't possible. I can't think of security reasons to block such traffic -- maybe it's blocked by accident, or maybe such routing is hard due to the labs network topology. Supporting routing to floating IPs would allow us to rip out some hacks and also promote adoption of the new labs DNS (which lacks the equivalent hacks at the moment.)

Andrew created this task. Apr 22 2015, 9:47 PM
Andrew updated the task description.
Andrew raised the priority of this task from to Needs Triage.
Andrew assigned this task to faidon.
Andrew added a subscriber: Andrew.
Restricted Application added a subscriber: Aklapper. Apr 22 2015, 9:47 PM
yuvipanda added a subscriber: yuvipanda.
hashar added a subscriber: hashar. Edited Apr 22 2015, 10:08 PM

I originally reported this as T39985: Public IP not acccessible from labs instance; in short, it is due to NAT. Some more details at https://rt.wikimedia.org/Ticket/Display.html?id=4824#txn-147723

I originally fixed it using iptables rules to rewrite the destination public IP to the private IP. That required us to apply the rules on each instance. Then Brandon Black came up with the dnsmasq alias.

I haven't done networking in a while and have no idea about the labs network setup. The idea is that packets received from labs instances should be translated and routed on the internal interface. I guess right now they end up being routed to the internet and get lost (because their source IP is a private one).

Might be worth asking an OpenStack user group or digging into the Networking component (per Andrew, it is nova-network, not Neutron).

@faidon @akosiaris @mark this would make labs users' lives a lot simpler if we can fix it. Anyone want to take a stab / know who can take a stab?

Hello,

So I started having a look at this yesterday. labnet1001 does see the ICMP echo request packets on the wire, but for some reason, which I am still researching, it never responds to them. I'll post updates here as soon as I have them.

@akosiaris from my past comment: could it be that NAT is not applied on the inbound interface, so that the packets are routed to the internet (since they have the public IP as a destination)?

The public IP is assigned on the outbound interface of labnet1001, so the ICMP echo requests should never be routed to the internet (and they actually are not, as a tcpdump on the outbound interface proved). DNAT is being applied on the inbound interface; still figuring out the rules.

Here's an update on this.

When a labs VM wants to contact a public IP, it uses its local routing table to figure out where to send the packet. The routing table has two entries:

  • default via 10.68.16.1 dev eth0 metric 100
  • 10.68.16.0/21 dev eth0 proto kernel scope link src 10.68.16.19

Given that the public IPs are not in 10.68.16.0/21, the default route is used and the packet is sent to the gateway, namely labnet1001. All this is well and good and expected; I'm just giving some general background information.
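That route selection is a plain longest-prefix match, which can be sketched in a few lines (a toy simulation, not production code; the addresses are the ones from the routing table above):

```python
import ipaddress

# The two routes from the VM's table, as (prefix, description) pairs.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "default via 10.68.16.1 (labnet1001)"),
    (ipaddress.ip_network("10.68.16.0/21"), "on-link, delivered directly"),
]

def pick_route(dst):
    """Longest-prefix match: the most specific matching route wins."""
    dst = ipaddress.ip_address(dst)
    matches = [(net, desc) for net, desc in routes if dst in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Another VM's private IP matches the /21, so it never touches labnet1001.
print(pick_route("10.68.17.5"))      # on-link, delivered directly
# A floating public IP matches only the default route, so it goes to the gateway.
print(pick_route("208.80.155.192"))  # default via 10.68.16.1 (labnet1001)
```

This is why VM-to-VM traffic by private IP works without the gateway, while traffic to a floating IP always lands on labnet1001 first.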

On to labnet1001, which receives the packet on its br1102 interface, a bridge with a single slave, eth1.1102 (curious why a bridge with a single slave interface exists in the first place). Various netfilter rules apply there. One rule chain, nova-network-PREROUTING, is responsible for the DNAT (Destination Network Address Translation) of public IPs to the labs VMs' private IPs. Together with its counterpart nova-network-POSTROUTING, which does the reverse SNAT (private to public, but only if a DNAT has already happened for the connection), these are the two building blocks of the Elastic IPs feature. Do note that we are talking about rewriting the destination IP only in the PREROUTING case, and the source IP only in the POSTROUTING case.

In the VM to public IP case, what happens is, assuming IP1 the initiating VM's private IP, IP2 the destination VM's private IP, and IPP the public IP:

  • The packet leaves the VM with Source IP: IP1, Destination IP: IPP.
  • The packet enters labnet1001 with Source IP: IP1, Destination IP: IPP.
  • The packet has its header rewritten to Source IP: IP1, Destination IP: IP2 (the DNAT happens).
  • The packet is delivered to the destination VM with Source IP: IP1, Destination IP: IP2.
  • The destination VM answers that packet with a newly generated one with Source IP: IP2, Destination IP: IP1, sent directly to the initiating VM without involving labnet1001 at all; that is normal inter-VM communication. But the process on the initiating VM will never expect that packet, because it has nothing to do with the connection it started (it expects a reply from IPP). And... fail.
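The failure mode in that walkthrough can be reproduced in a toy model (all IPs hypothetical, mirroring IP1/IP2/IPP above): the initiator remembers a connection to IPP, but the reply arrives from IP2, so it is dropped as unrelated.

```python
# Toy model of the packet walk above. IP1/IP2/IPP as in the walkthrough.
IP1, IP2, IPP = "10.68.16.19", "10.68.16.68", "208.80.155.192"

# 1. The initiating VM sends a packet and remembers whom it is talking to.
packet = {"src": IP1, "dst": IPP}
expected_reply_from = packet["dst"]          # the VM expects a reply from IPP

# 2. labnet1001's nova-network-PREROUTING rewrites the destination only (DNAT);
#    the source address is left untouched.
packet["dst"] = IP2

# 3. The destination VM replies directly to IP1 (same subnet, no labnet hop),
#    so no reverse NAT ever gets a chance to restore the source to IPP.
reply = {"src": packet["dst"], "dst": packet["src"]}

# 4. The initiator compares the reply's source with what it expects: mismatch,
#    so the reply is discarded as unrelated. This is the observed blackhole.
print(reply["src"] == expected_reply_from)   # False
```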

Before somebody jumps the gun and says "hey, you mentioned SNAT above, how about we SNAT IP1 as well?": there is the simple question of what to SNAT to, and, even better, how to differentiate between the two different SNATs a VM may need. And why create this difficult-to-maintain-and-understand mess, which anyway is not controlled by us but by nova? That would mean changing nova, which, last time I checked, has its iptables rules hardcoded.

Do note that I did find some irregularities on those rules.

  • For example, the nova-network-PREROUTING chain has a rule to rewrite 169.254.169.254 tcp dpt:80 to:10.64.20.13:8775, which is nova-api. That IP is used by VMs to get the EC2 ID. It is unclear to me whether it is used for anything else, and why it is in that ruleset rather than nova-api-PREROUTING, which sounds like a better fit anyway and would not mix with the VM rules.
  • The PREROUTING DNAT happens unconditionally, regardless of which interface the packet arrives on. That creates the blackholing effect. My take would be to apply it only on the internet-facing interface of labnet1001. That way the blackholing effect would be prevented, and labnet1001 would answer connection requests/ICMP packets for the public IPs itself. This would not give the desired functionality, but it would at least mitigate the blackhole effect. That being said, I am afraid this requires nova modifications as well.

My recommendation at this point is to just redo the split-horizon DNS trick for the new labs DNS as well. It will solve the problem in a cleaner, albeit less complete, way.

Thank you for investigating, Alex!

Well done @akosiaris, you have been granted a coupon for your favorite drink. To be redeemed next time we see each other; just point to this task.


Regarding the use of a bridge, the upstream doc at "Networking options" mentions that only bridges are supported; no hint as to why that is a requirement, though.

About the metadata rule 169.254.169.254 tcp dpt:80 to:10.64.20.13:8775: on "Private and public IP addresses" there is a section "Traffic between VMs using floating IPs" which mentions dmz_cidr. The question "What is the purpose of dmz_cidr?" further elaborates on it (emphasis mine):

Outgoing traffic from the vms is SNATTED to the ip of the network host (old mode) or the compute host (HA Networking --multi_host mode). This is to allow for them to communicate with the rest of the internet.
It may be that there are some services that the hosts need to communicate with that are on an internal network where you want the source ip to remain the private ip of the host. The accept rule stops the normal SNAT.
The most common use case is to allow the metadata api to use the private ip to look up data for the instance, so generally you can just set it to the /32 of your metadata server if you have just one. It is a cidr in case there are multiple services that you want to keep using the internal private ips.

So in theory modules/openstack/templates/icehouse/nova/nova.conf.erb should have:

dmz_cidr=169.254.169.254/32
metadata_host=169.254.169.254  # probably needed as well

When the destination is rewritten to the public IP, I would expect the packet to be routed outbound with a port translation, using whatever public IP pool instances use when they want to reach the internet. This way the packet would re-enter with public IPs as both source and destination, be sent to the other VM with a public source, and the reply would come back through labnet's port-translation table, which would deliver it to the original VM.

I guess that is not how it works :/

Andrew added a comment. May 5 2015, 7:38 PM

Someone on IRC just claimed that there was a setting to fix this in new nova-network implementations. The closest thing I can find is force_snat_range: https://review.openstack.org/#/c/29795/

I guess we would set that to everything /but/ our floating ip range... does that seem right, or am I misunderstanding?

Andrew added a comment. May 5 2015, 7:43 PM

For the record, our current floating ip range is 208.80.155.128/25

I've looked into the specified change. It is about limiting the SNAT feature to a specified set of destination ranges. The behavior I described above is the result of the DNAT rules and has nothing to do with the SNAT rules. So it is irrelevant :-(

Andrew added a comment. May 6 2015, 9:04 PM

I've referred this to the openstack mailing list. I predict that a thousand voices cry out as one and tell me to install Neutron.

Antonio Messina on that thread gives the exact same explanation of the problem. However, he also provides a solution to the "SNAT to what and when" question I posed before. It seems icehouse does provide the rules he proposes, and we do have them in the nova-network-POSTROUTING chain, which is called from POSTROUTING. However, those rules are never reached, because we have these two rules embedded before them, and they terminate processing (ACCEPT is a terminating target, meaning the rules after it in the chain are ignored):

ACCEPT     all  --  *      *       10.68.16.0/21        208.80.152.0/22     
ACCEPT     all  --  *      *       10.68.16.0/21        10.0.0.0/8

Anyway, I tried inserting one of the rules before them, but it did not trigger anything. Investigating why.
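The "ACCEPT terminates the chain" behaviour can be sketched as a first-match evaluation (a simplified toy, matching on source/destination ranges only; the real rules also carry protocol and conntrack conditions, which is one reason a real insertion has more moving parts than this):

```python
import ipaddress

def first_match(chain, src, dst):
    """iptables-style evaluation: walk the chain top-down and stop at the
    first rule whose source and destination both match."""
    src, dst = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for rule in chain:
        if src in rule["src"] and dst in rule["dst"]:
            return rule["target"]
    return "POLICY"

net = ipaddress.ip_network
# The two embedded dmz_cidr ACCEPT rules sit ahead of the SNAT rule.
chain = [
    {"src": net("10.68.16.0/21"), "dst": net("208.80.152.0/22"), "target": "ACCEPT"},
    {"src": net("10.68.16.0/21"), "dst": net("10.0.0.0/8"),      "target": "ACCEPT"},
    {"src": net("10.68.16.0/21"), "dst": net("0.0.0.0/0"),       "target": "SNAT"},
]

# VM-to-floating-IP traffic (208.80.155.x lies inside 208.80.152.0/22) hits
# the first ACCEPT, so the SNAT rule is never reached.
print(first_match(chain, "10.68.16.19", "208.80.155.192"))  # ACCEPT

# In the toy model, moving the SNAT-style rule ahead of the ACCEPTs changes
# which rule matches.
chain.insert(0, chain.pop())
print(first_match(chain, "10.68.16.19", "208.80.155.192"))  # SNAT
```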

akosiaris added a comment. Edited May 11 2015, 12:35 PM

Managed to get something working. Specifically I added

iptables -t nat -I POSTROUTING 1 -j SNAT -s 10.68.16.68 -m conntrack --ctstate DNAT --to-source 208.80.155.155

which allowed bastion-restricted1 to ping util-abogott on its public IP. For this to work, the following prerequisites had to hold:

  • Both VMs have to have a public floating IP. If either of the two VMs has no floating IP assigned, there is no answer.
  • Security rules need to allow the traffic. My tests were with ICMP because it is allowed.
  • labnet1001's bridge interface needs to be in promiscuous mode.

The first one is mentioned in https://github.com/openstack/nova/commit/b8c434630d31f49ae0e9686ddfac8f25acf117b1 and hence makes sense; the second one is expected; the third one is... troubling. I'm looking more into it, but at least we have had some success.
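Extending the earlier toy model shows why that conntrack-conditioned SNAT helps: once the source is also rewritten to a floating IP, the reply is forced back through labnet1001, where both translations can be undone (all IPs hypothetical, taken from the comment above):

```python
# Toy model of the fixed path. IPs taken from the comment above (hypothetical).
IP1, IPP1 = "10.68.16.68", "208.80.155.155"   # initiator: private / floating IP
IP2, IPP2 = "10.68.16.19", "208.80.155.192"   # target: private / floating IP

# The initiator opens a connection to the target's floating IP.
packet = {"src": IP1, "dst": IPP2}
expected_reply_src = packet["dst"]

# labnet1001: DNAT rewrites the destination, and the new SNAT rule
# (conditioned on --ctstate DNAT) also rewrites the source to a floating IP.
packet["dst"] = IP2
packet["src"] = IPP1

# The target now replies to a *public* address, so the reply is routed back
# through labnet1001 instead of going VM-to-VM directly.
reply = {"src": packet["dst"], "dst": packet["src"]}   # IP2 -> IPP1

# labnet1001 reverses both translations using its conntrack state.
reply["src"] = IPP2
reply["dst"] = IP1

# The initiator now sees the reply come from the address it contacted.
print(reply["src"] == expected_reply_src)   # True
```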

Alex, are there any obvious next steps here? Unfortunately pdns doesn't support split horizon, so the alternative to fixing this is ugly, requiring two different DNS servers.

After some conversation on IRC, we tried unsetting dmz_cidr. The two rules I mentioned above were indeed cleared, but various problems cropped up as VMs started using public IPs to access anything on the two aforementioned networks. Services like puppet, LDAP, NFS, and labsdbs stopped being accessible. Evaluating the best possible values for this setting.

https://gerrit.wikimedia.org/r/#/c/210720/ removes other labs instances (private IPs) from the dmz. That should break fairly few things.

Labs security rules that are exclusive to 10.0.0.0/8 will start blocking traffic from other labs hosts that have public IPs. I may be able to script an addition to all such security rules adding the labs public IP range.

I have a script which scans for every security group and adds an identical rule for the floating ip range 208.80.155.128/25 for anything that passes this:

if rule['Range'] in ("10.0.0.0/8", "10.4.0.0/21", "10.4.0.0/22", "10.4.0.0/23", "10.4.0.0/24"):

I've tested it on a test project but haven't run it everywhere yet. Seems safe to me...
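The matching logic of such a scan can be sketched as follows (the actual OpenStack API calls to list security groups and create rules are omitted, and the rule-dict shape here is hypothetical; only the filter-and-duplicate step is shown):

```python
# Sketch of the scan's core logic. The nova API calls to fetch and create
# security-group rules are omitted; the rule-dict shape is hypothetical.
LEGACY_INTERNAL_RANGES = {
    "10.0.0.0/8", "10.4.0.0/21", "10.4.0.0/22", "10.4.0.0/23", "10.4.0.0/24",
}
FLOATING_RANGE = "208.80.155.128/25"

def rules_to_add(rules):
    """For every rule scoped to a legacy internal range, emit an identical
    rule covering the labs floating-IP range (skipping duplicates)."""
    existing = {(r["Protocol"], r["Ports"], r["Range"]) for r in rules}
    new = []
    for r in rules:
        if r["Range"] in LEGACY_INTERNAL_RANGES:
            candidate = dict(r, Range=FLOATING_RANGE)
            key = (candidate["Protocol"], candidate["Ports"], candidate["Range"])
            if key not in existing:
                existing.add(key)
                new.append(candidate)
    return new

rules = [
    {"Protocol": "tcp", "Ports": "22", "Range": "10.0.0.0/8"},
    {"Protocol": "icmp", "Ports": "-1", "Range": "0.0.0.0/0"},  # left untouched
]
print(rules_to_add(rules))
# [{'Protocol': 'tcp', 'Ports': '22', 'Range': '208.80.155.128/25'}]
```

Rules already open to the world (0.0.0.0/0) need no companion rule, which keeps the change minimal.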

scfc added a subscriber: scfc. Edited May 13 2015, 9:33 PM

For Tools, in modules/dynamicproxy/manifests/init.pp and modules/toollabs/manifests/proxy.pp we limit some functionality to ferm's $INTERNAL hosts. Could modules/base/templates/firewall/defs.labs.erb (?) be amended so that Labs's $INTERNAL covers the public IPs as well, please?

Change 210853 had a related patch set uploaded (by Tim Landscheidt):
Labs: Include public IPs in ferm's $INTERNAL

https://gerrit.wikimedia.org/r/210853

Change 210853 abandoned by Tim Landscheidt:
Labs: Include public IPs in ferm's $INTERNAL

Reason:
AFAIUI, the scale tips to a split horizon DNS, so this is no longer necessary.

https://gerrit.wikimedia.org/r/210853

faidon removed faidon as the assignee of this task. Jun 10 2015, 1:19 PM
faidon removed a project: Patch-For-Review.
faidon set Security to None.
faidon added a subscriber: faidon.
Andrew renamed this task from allow routing between labs instances and public labs ips to allow routing between labs instances and public labs ips (done, document). Aug 3 2015, 5:44 PM
Andrew added a project: Labs-Sprint-108.
Andrew added a comment. Aug 5 2015, 5:14 PM

Currently, a subset of floating IPs is properly aliased to their internal IPs by the labs pdns recursor.

The list of IPs affected is hardcoded in puppet, as nova_floating_ip_aliases in role::labsdnsrecursor.

Needs automating!

Andrew closed this task as Resolved.
Andrew claimed this task.
Andrew moved this task from To Do to Done on the Labs-Sprint-108 board.