dmz_cidr only includes some wikimedia public IP ranges, leading to some very strange behaviour
Closed, ResolvedPublic

Assigned To

Authored By

	Krenair
	Aug 30 2017, 6:17 PM

Description

hieradata/codfw/profile/openstack/labtest/nova.yaml --- profile::openstack::labtest::nova::dmz_cidr
hieradata/codfw/profile/openstack/labtestn/neutron.yaml --- profile::openstack::labtestn::neutron::dmz_cidr
hieradata/eqiad/profile/openstack/eqiad1/neutron.yaml --- profile::openstack::eqiad1::neutron::dmz_cidr
hieradata/eqiad/profile/openstack/main/nova.yaml --- profile::openstack::main::nova::dmz_cidr

These settings each contain a list of destination ranges which will not have the normal labs NAT rules applied, i.e. hosts in these ranges will see the instances' internal IPs.

This does not cover everything in https://wikitech.wikimedia.org/wiki/IP_and_AS_allocations, leading to this:

krenair@bastion-01:~$ curl -skI https://text-lb.{esams,eqiad,ulsfo,codfw,eqsin}.wikimedia.org/wiki/Main_Page -H 'Host: en.wikipedia.org' | grep X-Client-IP
X-Client-IP: 208.80.155.129
X-Client-IP: 10.68.17.232
X-Client-IP: 208.80.155.129
X-Client-IP: 10.68.17.232
X-Client-IP: 208.80.155.129

The current dmz_cidr configuration for eqiad1 (the profile::openstack::eqiad1::neutron::dmz_cidr hiera key in hieradata/eqiad/profile/openstack/eqiad1/neutron.yaml) is listed below, as a checklist to track whether we are done with each setting.

172.16.0.0/21:91.198.174.0/24 (stuff in esams DC)
172.16.0.0/21:198.35.26.0/23 (stuff in ulsfo DC)
172.16.0.0/21:10.0.0.0/8 (all private addresses in eqiad DC)
172.16.0.0/21:208.80.152.0/22 (stuff in codfw DC)
172.16.0.0/21:103.102.166.0/24 (stuff in eqsin DC)
172.16.0.0/21:172.16.0.0/21 (just added in T206261: Routing RFC1918 private IP addresses to/from WMCS floating IPs)

Related Objects

		Status	Subtype	Assigned	Task
		Open		None	T209460 CloudVPS: network architecture
		Resolved		aborrero	T174596 dmz_cidr only includes some wikimedia public IP ranges, leading to some very strange behaviour

Event Timeline

Krenair created this task.Aug 30 2017, 6:17 PM

Restricted Application added a project: SRE. · View Herald TranscriptAug 30 2017, 6:17 PM

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

See also T167357, where this task will probably become obsolete. I just wanted to document the effect of this, really.

Ottomata triaged this task as Low priority.Jan 16 2018, 8:14 PM

Seems like even after T167357 I can reproduce the tests from the description.

My guess is that it's set up like that to make it easier to see where a query is coming from when it's purely internal,
and the current /22 includes eqiad and codfw, the two sites the cloud infra is likely to hit.

The future use of 172.16/12 space should make this "passthru" obsolete as the design is to have that space fully segregated from the "prod" infra.

In the meantime it might be worth making it consistent by adding missing public ranges, as they're very stable.

Pinging @aborrero and @Andrew as they might not be aware (not subscribed to the task)

Our plan is to keep using the dmz_cidr mechanism with the new 172.16 addressing space.

This is already in puppet:

git grep dmz_cidr hieradata
hieradata/codfw/profile/openstack/labtest/nova.yaml:profile::openstack::labtest::nova::dmz_cidr: '208.80.155.0/22,10.0.0.0/8'
hieradata/codfw/profile/openstack/labtestn/neutron.yaml:profile::openstack::labtestn::neutron::dmz_cidr: '172.16.128.0/24:10.0.0.0/8,172.16.128.0/24:208.80.155.0/22'
hieradata/eqiad/profile/openstack/eqiad1/neutron.yaml:profile::openstack::eqiad1::neutron::dmz_cidr: '172.16.0.0/21:91.198.174.0/24,172.16.0.0/21:198.35.26.0/23,172.16.0.0/21:10.0.0.0/8,172.16.0.0/21:208.80.152.0/22,172.16.0.0/21:103.102.166.0/24'
hieradata/eqiad/profile/openstack/main/nova.yaml:profile::openstack::main::nova::dmz_cidr: '208.80.155.0/22,10.0.0.0/8'

As you can see, the new eqiad1 dmz_cidr configuration has more ranges defined. Would this cover all of our relevant cases?

EDIT: you can read the config as:
Do not apply NAT to connections src:dst , src:dst, src:dst ....
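To make the "src:dst" reading concrete, here is a minimal Python sketch that parses the eqiad1 value quoted above and tests whether a given connection would skip NAT. The function names parse_dmz_cidr and nat_is_skipped are hypothetical helpers for illustration only, not part of puppet or neutron, and the sketch assumes IPv4-only entries.

```python
import ipaddress

def parse_dmz_cidr(value):
    """Split a neutron-style dmz_cidr string into (src, dst) network pairs."""
    pairs = []
    for entry in value.split(","):
        # Each entry is "src_cidr:dst_cidr" (IPv4 only in this sketch).
        src, dst = entry.strip().split(":")
        # strict=False tolerates entries whose host bits are set.
        pairs.append((ipaddress.ip_network(src, strict=False),
                      ipaddress.ip_network(dst, strict=False)))
    return pairs

def nat_is_skipped(pairs, src_ip, dst_ip):
    """Return True if some src:dst pair matches the connection."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    return any(src in s and dst in d for s, d in pairs)

# Value copied from the eqiad1 hiera key shown in the git grep output.
EQIAD1 = ("172.16.0.0/21:91.198.174.0/24,172.16.0.0/21:198.35.26.0/23,"
          "172.16.0.0/21:10.0.0.0/8,172.16.0.0/21:208.80.152.0/22,"
          "172.16.0.0/21:103.102.166.0/24")

pairs = parse_dmz_cidr(EQIAD1)
print(len(pairs))
print(nat_is_skipped(pairs, "172.16.4.82", "10.64.22.4"))
print(nat_is_skipped(pairs, "172.16.4.82", "185.15.56.1"))
```

The last call returns False because no destination in the list covers 185.15.56.0/22.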

So first, why maintain 4 different lists instead of 1 (or at least have the same subnets in each list)?
Then 185.15.56.0/22 is missing if we want to be exhaustive.

But my understanding (esp. after T167357) is that Cloud private IPs should not be seen outside the Cloud infrastructure, is that incorrect?

Yeah it would be best to have this list of prod networks to preserve 172 source IP for:

a) a fixed list of required end points which are NFS servers, dns recursors, and ?

b) come from an existing list, the way the allowances for pdns-recursor do in allow-from= (which, now that I am looking at it, makes me wonder why we need to allow from that big list here)

In T174596#4530414, @ayounsi wrote:

But my understanding (esp. after T167357) is that Cloud private IPs should not be seen outside the Cloud infrastructure, is that incorrect?

There are still things outside of the cloud network itself where source IP preservation is needed for sanity or functionality, such as NFS, which necessitates a dmz_cidr functionality. I think it's required for precious few things, and I believe we would rather move all of them into the cloud network itself down the line.

Krenair updated the task description. (Show Details)Aug 24 2018, 5:52 PM

bd808 added a project: cloud-services-team (Kanban).Oct 8 2018, 12:05 AM

aborrero moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.Oct 18 2018, 3:37 PM

Does this still affect us? If so, which concrete subnets are affected?

I can't seem to access eqiad1.bastion.wmflabs.org right now but:

krenair@bastion-01:~$ curl -skI https://text-lb.{esams,eqiad,ulsfo,codfw,eqsin}.wikimedia.org/wiki/Main_Page -H 'Host: en.wikipedia.org' | grep X-Client-IP
X-Client-IP: 208.80.155.129
X-Client-IP: 10.68.17.232
X-Client-IP: 208.80.155.129
X-Client-IP: 10.68.17.232
X-Client-IP: 208.80.155.129

This is also the reason we have to have the following static route on cr1/2-eqiad: "static route 172.16.0.0/21 next-hop 10.64.22.4". That way, when hosts in the 172.16/21 subnet reach hosts outside (without NAT), return traffic knows what path to take.

I think there are several action items here, the end goal being for OpenStack to be fully separated from the rest of the infrastructure.
1/ Standardize the list of target subnets to not use NAT as mentioned in T174596#4530414 (use 1 list instead of 4, the most complete one)
2/ add 185.15.56.0/22
3/ Add missing networks so the output of T174596#4680056 is coherent (probably fixed with 1/)
4/
a) Identify all the target hosts and flows that currently require NAT to be disabled (source IP preservation), as mentioned by @chasemp in T174596#4530767
b) Open tasks to track the required changes so they don't need this special case (hosts by host), eg. move them inside the Cloud infra
5/ Once all those hosts have been tackled, remove dmz_cidr, and the static route on the routers

In T174596#4681328, @ayounsi wrote:

This is also the reason we have to have the following static route on cr1/2-eqiad: "static route 172.16.0.0/21 next-hop 10.64.22.4". That way, when hosts in the 172.16/21 subnet reach hosts outside (without NAT), return traffic knows what path to take.

I think there are several action items here, the end goal being for OpenStack to be fully separated from the rest of the infrastructure.
1/ Standardize the list of target subnets to not use NAT as mentioned in T174596#4530414 (use 1 list instead of 4, the most complete one)

The 4 hiera keys you see correspond to the 4 OpenStack deployments we have right now. Each deployment has its own networking context that can't be shared with the others. They can even run in different datacenters, with completely different addressing, etc.

You can learn more about our different deployments here: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Deployments

So I think we have 4 different lists just because we have 4 different environments, and I'm not sure it is worth the effort to consolidate them (also note that the 3 deployments other than eqiad1 will be dropped once eqiad1 has completely replaced the main one).

2/ add 185.15.56.0/22

I'm trying to handle this in T206261

3/ Add missing networks so the output of T174596#4680056 is coherent (probably fixed with 1/)

Investigating now, I found that I have some doubts:

hieradata/common.yaml doesn't exist anymore.
I don't really understand what the curl query and its output mean. @Krenair could you please elaborate? I genuinely don't understand what's wrong with that, or what you would expect it to return, etc. Please advise.

4/
a) Identify all the target hosts and flows that currently require NAT to be disabled (source IP preservation), as mentioned by @chasemp in T174596#4530767
b) Open tasks to track the required changes so they don't need this special case (hosts by host), eg. move them inside the Cloud infra
5/ Once all those hosts have been tackled, remove dmz_cidr, and the static route on the routers

This is not actionable in this phab task, since it involves a major rebuild of some of our infra, including the way we do NFS.
It is not in our short-term roadmap and will have to wait. Not that I love it, but we have to retain dmz_cidr for a while.

In T174596#4685019, @aborrero wrote:

I don't really understand what the curl query and its output mean. @Krenair could you please elaborate? I genuinely don't understand what's wrong with that, or what you would expect it to return, etc. Please advise.

When labs instances connect to prod, I think logically either prod hosts should see labs private IPs, or they should see labs public IPs. Right now we appear to have a bizarre situation where it depends which prod DC you connect to.
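The inconsistency can be checked mechanically. The sketch below classifies each X-Client-IP value from the curl output in this task as private or public and reports whether all sites agree. The site-to-address mapping assumes the output lines appear in the same order as the brace expansion (esams, eqiad, ulsfo, codfw, eqsin); classify and is_consistent are hypothetical helper names.

```python
import ipaddress

# X-Client-IP values from the curl output in this task, keyed by site.
# Assumption: output order follows the brace expansion order.
observed = {
    "esams": "208.80.155.129",
    "eqiad": "10.68.17.232",
    "ulsfo": "208.80.155.129",
    "codfw": "10.68.17.232",
    "eqsin": "208.80.155.129",
}

def classify(ip):
    """Label an address as 'private' or 'public'."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"

def is_consistent(results):
    """True if every site saw the same kind of source address."""
    return len({classify(ip) for ip in results.values()}) == 1

for site, ip in observed.items():
    print(site, ip, classify(ip))
print("consistent:", is_consistent(observed))
```

With the values above, eqiad and codfw see a private address while the other three sites see a public one, so the check reports an inconsistent result.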

ayounsi mentioned this in T122406: Consider renumbering Labs to separate address spaces.Oct 22 2018, 2:47 PM

Talked to Arturo on IRC, replying to my own questions.
I thought dmz_cidr were only the DST ranges to not do NAT on, but they are also used as SRC.
Which means 1/ would still be possible but would make things confusing

3/ would be solved by adding: 91.198.174.0/24,198.35.26.0/23,103.102.166.0/24 to profile::openstack::main::nova::dmz_cidr and maybe the others
4/ In some way tracked in T207536
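As a sanity check on 3/, the following sketch compares the destination ranges of the main and eqiad1 values quoted earlier in this task and lists what main lacks. dst_networks is a hypothetical helper; it takes the part after the last ":" of each entry so that it accepts both the nova-style (dst only) and neutron-style (src:dst) formats, and it assumes IPv4 only.

```python
import ipaddress

def dst_networks(value):
    """Collect the destination networks from a dmz_cidr string."""
    nets = set()
    for entry in value.split(","):
        # nova-style entries are "dst"; neutron-style entries are "src:dst".
        dst = entry.strip().split(":")[-1]
        # strict=False normalises 208.80.155.0/22 to 208.80.152.0/22.
        nets.add(ipaddress.ip_network(dst, strict=False))
    return nets

# Values copied from the git grep output in this task.
MAIN = "208.80.155.0/22,10.0.0.0/8"
EQIAD1 = ("172.16.0.0/21:91.198.174.0/24,172.16.0.0/21:198.35.26.0/23,"
          "172.16.0.0/21:10.0.0.0/8,172.16.0.0/21:208.80.152.0/22,"
          "172.16.0.0/21:103.102.166.0/24")

missing = sorted(dst_networks(EQIAD1) - dst_networks(MAIN))
print([str(n) for n in missing])
```

The difference is the three ranges proposed above for profile::openstack::main::nova::dmz_cidr.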

In T174596#4685786, @Krenair wrote:

In T174596#4685019, @aborrero wrote:

I don't really understand what the curl query and its output mean. @Krenair could you please elaborate? I genuinely don't understand what's wrong with that, or what you would expect it to return, etc. Please advise.

When labs instances connect to prod, I think logically either prod hosts should see labs private IPs, or they should see labs public IPs. Right now we appear to have a bizarre situation where it depends which prod DC you connect to.

The right/desired behavior would be the latter, i.e. prod hosts (like cp* hosts) should never see any of the WMCS private space and only see WMCS public IPs. There are exceptions to this for e.g. WMCS supporting infrastructure like labstores that should be eventually phased out :)

aborrero moved this task from Doing to Soon! on the cloud-services-team (Kanban) board.Oct 26 2018, 11:19 AM

ayounsi mentioned this in T208244: ntp broken in new region.Oct 29 2018, 8:02 PM

aborrero mentioned this in T209011: Change routing to ensure that traffic originating from Cloud VPS is seen as non-private IPs by Wikimedia wikis.Nov 8 2018, 11:37 AM

ayounsi added a parent task: Restricted Task.Nov 8 2018, 6:20 PM

To my understanding, there are 3 ways a VM can send egress traffic (https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Neutron#Ingress_&_Egress):

using the common routing_source_ip address (default) (185.15.56.1 in the case of eqiad1)
if the VM has a floating ip, that addr will be used (public addr, 185.15.56.0/25 in the case of eqiad1)
if the src/dst matches the dmz_cidr config NAT is skipped and prod will see actual private addr (172.16.0.0/21 in the case of eqiad1)
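The three cases can be sketched as a small decision function. This is only an illustration of the description above, not how neutron implements it: egress_source is a hypothetical name, the order of precedence (dmz_cidr match first, then floating IP, then the default) is an assumption, and the routing source and floating addresses in the usage lines are documentation placeholders rather than real eqiad1 values.

```python
import ipaddress

def egress_source(vm_ip, dst_ip, dmz_pairs, routing_source_ip, floating_ip=None):
    """Return the source address the destination would see.

    Assumed precedence: dmz_cidr match, then floating IP, then default NAT.
    """
    src = ipaddress.ip_address(vm_ip)
    dst = ipaddress.ip_address(dst_ip)
    # Case 3: a src:dst pair matches, NAT is skipped, private address is kept.
    if any(src in s and dst in d for s, d in dmz_pairs):
        return vm_ip
    # Case 2: the VM has a floating IP, which is used as the source.
    if floating_ip is not None:
        return floating_ip
    # Case 1: default, traffic leaves with the shared routing_source_ip.
    return routing_source_ip

# One src:dst pair taken from the eqiad1 configuration in this task.
dmz_pairs = [(ipaddress.ip_network("172.16.0.0/21"),
              ipaddress.ip_network("10.0.0.0/8"))]

# 192.0.2.x and 198.51.100.x are placeholder documentation addresses.
print(egress_source("172.16.4.82", "10.64.22.4", dmz_pairs, "192.0.2.1"))
print(egress_source("172.16.4.82", "198.51.100.7", dmz_pairs, "192.0.2.1"))
print(egress_source("172.16.4.82", "198.51.100.7", dmz_pairs, "192.0.2.1",
                    floating_ip="192.0.2.50"))
```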

Since the problem is only with the dmz_cidr mechanism, I will update this task description to review all the src/dst configurations we have right now and see if there is something to improve there.

aborrero updated the task description. (Show Details)Nov 12 2018, 1:14 PM

I guess there was a reason for these DC-wide bypasses. Before I start doing git/phab archaeology, does anybody know why we have these settings?

BTW, I'm focusing on the eqiad1 deployment setting. I'm not paying much attention to the setting in main, since the new deployment is the one we will be living with from now on.
Please @Krenair @ayounsi do all your tests and checks from VMs in this deployment.

In T174596#4740124, @aborrero wrote:

BTW, I'm focusing on the eqiad1 deployment setting. I'm not paying much attention to the setting in main, since the new deployment is the one we will be living with from now on.
Please @Krenair @ayounsi do all your tests and checks from VMs in this deployment.

I say this because in a quick test:

aborrero@toolsbeta-sgebastion-03:~$ curl -skI https://text-lb.{esams,eqiad,ulsfo,codfw,eqsin}.wikimedia.org/wiki/Main_Page -H 'Host: en.wikipedia.org' | grep -i X-Client-IP
x-client-ip: 172.16.4.82
x-client-ip: 172.16.4.82
x-client-ip: 172.16.4.82
x-client-ip: 172.16.4.82
x-client-ip: 172.16.4.82

In T174596#4740124, @aborrero wrote:

BTW, I'm focusing on the eqiad1 deployment setting. I'm not paying much attention to the setting in main, since the new deployment is the one we will be living with from now on.
Please @Krenair @ayounsi do all your tests and checks from VMs in this deployment.

It looks consistent from your test! I guess there's still the question of whether prod should see private cloud IPs at all or whether it should see public IPs but that's a matter for another ticket, at least it appears consistent now.

In T174596#4741643, @Krenair wrote:

I guess there's still the question of whether prod should see private cloud IPs at all or whether it should see public IPs but that's a matter for another ticket, at least it appears consistent now.

This is being discussed in other tickets, like T209011: Change routing to ensure that traffic originating from Cloud VPS is seen as non-private IPs by Wikimedia wikis. Closing this task now, feel free to reopen if required :-)

aborrero closed this task as Resolved.Nov 13 2018, 10:14 AM

ayounsi removed a parent task: Restricted Task.Nov 13 2018, 12:53 PM

aborrero added a parent task: T209460: CloudVPS: network architecture.Nov 14 2018, 10:46 AM

faidon mentioned this in T214313: Add new Tool Labs IPs to Varnish rate limit whitelist.Jan 21 2019, 8:07 PM

• nskaggs subscribed.Sep 18 2020, 6:10 PM