More public IPs for codfw1dev
Closed, Resolved (Public)

Description

@rook is doing some testing with Magnum in codfw1dev that turns out to demand more floating IPs than we have available.

The current pool is only 8 IPs: 185.15.57.0/29
We're using two more service ips which are allocated as 185.15.57.8/30

It looks to me like we could grab some more by adding another subnet or two: 185.15.57.12/30 and (possibly) 185.15.57.16/29
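As a quick sanity check on the arithmetic, the following sketch uses Python's standard `ipaddress` module to list the size and bounds of the two existing ranges and the two proposed ones, and to confirm that none of them overlap. The helper names `summarize` and `overlaps` are made up for illustration; this only checks the address math, not what is actually allocated in Netbox.

```python
import ipaddress

# Ranges quoted in the task description.
existing = [ipaddress.ip_network("185.15.57.0/29"),
            ipaddress.ip_network("185.15.57.8/30")]
proposed = [ipaddress.ip_network("185.15.57.12/30"),
            ipaddress.ip_network("185.15.57.16/29")]


def summarize(net):
    """Return (cidr, address count, first address, last address)."""
    return (str(net), net.num_addresses, str(net[0]), str(net[-1]))


def overlaps(nets):
    """Return every pair of networks in the list that overlap."""
    return [(str(a), str(b))
            for i, a in enumerate(nets)
            for b in nets[i + 1:]
            if a.overlaps(b)]


for net in existing + proposed:
    print(summarize(net))
print("overlapping pairs:", overlaps(existing + proposed))
```

The four ranges sit back to back from 185.15.57.0 through 185.15.57.23, so the overlap list comes back empty.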

I'm pretty sure that those ranges are already designated for wmcs use and we won't collide with other uses, but I'm opening this task so that someone in netops can confirm that this is safe.

@cmooney , @ayounsi do you agree?

Event Timeline

I’ve no objection in principle.

185.15.57.12/30 is currently unallocated in Netbox. 185.15.57.16/29 is reserved there with description *“Temporary and potentially for future cloud-instance-transport1-b-codfw - T263622”*. So it’s allocated for WMCS although earmarked for something other than this.

Is this just a temporary thing or will this be permanent? Also are you sure you need both the /30 and /29 subnet?

It’ll need to be routed on the edge if we do add anything. In that case, should they be routed to the cloudgw (similar to the existing ones)?

185.15.57.12/30 should be enough, so let's start with that. With luck that'll be all we need, and we can leave it as a permanent change.

In terms of routing: it'll be assigned to the wan-transport-codfw network, just like the existing two subnets listed above (185.15.57.0/29 and 185.15.57.8/30), so if it's possible to just duplicate that setup we should be good.

@Andrew I'm reluctant to allocate more space for WMCS in Codfw, when there is a /29 already allocated and not being used.

So I've routed 185.15.57.16/29 to the cloudgw (208.80.153.190) on the CRs in codfw instead. If that doesn't suit we can review but best not to use up more space if we've another option.

Ready to go now, let me know if you've any problems.

This works!

+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| allocation_pools     | 185.15.57.17-185.15.57.22            |
| cidr                 | 185.15.57.16/29                      |
| created_at           | 2022-07-28T19:59:20Z                 |
| description          |                                      |
| dns_nameservers      |                                      |
| dns_publish_fixed_ip | None                                 |
| enable_dhcp          | True                                 |
| gateway_ip           | None                                 |
| host_routes          |                                      |
| id                   | 0c4069a0-21a3-4d39-a651-51303738f007 |
| ip_version           | 4                                    |
| ipv6_address_mode    | None                                 |
| ipv6_ra_mode         | None                                 |
| name                 | cloud-codfw1dev-floating-additional  |
| network_id           | 57017d7c-3817-429a-8aa3-b028de82cdcc |
| prefix_length        | None                                 |
| project_id           | admin                                |
| revision_number      | 3                                    |
| segment_id           | None                                 |
| service_types        |                                      |
| subnetpool_id        | None                                 |
| tags                 |                                      |
| updated_at           | 2022-07-28T20:06:02Z                 |
+----------------------+--------------------------------------+

Thanks @cmooney

These IPs are reachable from within codfw1dev but not from the greater Internet. @cmooney is that what you'd expect? It's possible that this is something misconfigured within neutron but it looks right to me.

I think the problem might be with the gateway_ip, which is set to None. Or perhaps it is that the allocation pool starts at 17 rather than 18, and if we start it at 18, OpenStack will automagically set up a gateway for us?

root@cloudcontrol2001-dev:~# openstack subnet show 0c4069a0-21a3-4d39-a651-51303738f007
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| allocation_pools     | 185.15.57.17-185.15.57.22            |
| cidr                 | 185.15.57.16/29                      |
| created_at           | 2022-07-28T19:59:20Z                 |
| description          |                                      |
| dns_nameservers      |                                      |
| dns_publish_fixed_ip | None                                 |
| enable_dhcp          | True                                 |
| gateway_ip           | None                                 |
| host_routes          |                                      |
| id                   | 0c4069a0-21a3-4d39-a651-51303738f007 |
| ip_version           | 4                                    |
| ipv6_address_mode    | None                                 |
| ipv6_ra_mode         | None                                 |
| name                 | cloud-codfw1dev-floating-additional  |
| network_id           | 57017d7c-3817-429a-8aa3-b028de82cdcc |
| prefix_length        | None                                 |
| project_id           | admin                                |
| revision_number      | 3                                    |
| segment_id           | None                                 |
| service_types        |                                      |
| subnetpool_id        | None                                 |
| tags                 |                                      |
| updated_at           | 2022-07-28T20:06:02Z                 |
+----------------------+--------------------------------------+
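To make the 17-versus-18 question concrete, here is a small sketch of the host arithmetic for this subnet using Python's `ipaddress` module. Treating the first usable host as the gateway is an assumption made for illustration; the sketch says nothing about how Neutron itself picks a gateway.

```python
import ipaddress

subnet = ipaddress.ip_network("185.15.57.16/29")

# hosts() skips the network address (.16) and the broadcast address (.23).
hosts = list(subnet.hosts())
print("usable hosts:", hosts[0], "-", hosts[-1])

# Assumption: reserve the first usable host as the gateway,
# which leaves the rest for the allocation pool.
gateway = hosts[0]
pool = hosts[1:]
print("gateway:", gateway)
print("pool:", pool[0], "-", pool[-1])
```

With the gateway held back, the pool shrinks from six addresses to five.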

The subnet is updated, but I'm seeing the same kind of results:

openstack subnet show a9439c35-f465-475c-85a0-8e0f0f41ac4d
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| allocation_pools     | 185.15.57.18-185.15.57.22            |
| cidr                 | 185.15.57.16/29                      |
| created_at           | 2022-07-29T14:59:45Z                 |
| description          |                                      |
| dns_nameservers      |                                      |
| dns_publish_fixed_ip | None                                 |
| enable_dhcp          | True                                 |
| gateway_ip           | 185.15.57.17                         |
| host_routes          |                                      |
| id                   | a9439c35-f465-475c-85a0-8e0f0f41ac4d |
| ip_version           | 4                                    |
| ipv6_address_mode    | None                                 |
| ipv6_ra_mode         | None                                 |
| name                 | cloud-codfw1dev-floating-additional  |
| network_id           | 57017d7c-3817-429a-8aa3-b028de82cdcc |
| prefix_length        | None                                 |
| project_id           | admin                                |
| revision_number      | 0                                    |
| segment_id           | None                                 |
| service_types        |                                      |
| subnetpool_id        | None                                 |
| tags                 |                                      |
| updated_at           | 2022-07-29T14:59:45Z                 |
+----------------------+--------------------------------------+

So that probably wasn't it. I'm seeing the same routing information on the server that we are launching and on the bastion: both route to 172.16.128.1, though the former cannot get out. Assuming I'm reading the associated security group information correctly, it seems to be set up to allow traffic to egress to anywhere. Server-side IP information also looks the same for the bastion and the VM; both see themselves as having a 172.16.128.0/24 address.

Hi Andrew,

I'm unable to find any issue here. Looking at the cloud-in acl/filter on the CR routers, there is no rule that will block traffic from that range; it should hit a "default allow" on the last line.

Testing on the current master cloudgw, temporarily adding one of these IPs to an interface on it, also works:

cmooney@cloudgw2002-dev:~$
cmooney@cloudgw2002-dev:~$ sudo ip addr add 185.15.57.17/32 dev eno2.2120
cmooney@cloudgw2002-dev:~$
cmooney@cloudgw2002-dev:~$ sudo ip vrf exec vrf-cloudgw ping -I 185.15.57.17 1.1.1.1
PING 1.1.1.1 (1.1.1.1) from 185.15.57.17 : 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=61 time=3.06 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=61 time=2.96 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=61 time=3.03 ms
64 bytes from 1.1.1.1: icmp_seq=4 ttl=61 time=3.03 ms
^C
--- 1.1.1.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 2.964/3.023/3.064/0.036 ms
cmooney@cloudgw2002-dev:~$
cmooney@cloudgw2002-dev:~$ sudo ip addr del 185.15.57.17/32 dev eno2.2120
cmooney@cloudgw2002-dev:~$

So on the routing level / our side I'm confident all is set up and working correctly. Looking on the cloudgw I can't see how this could work given the current configuration. I expect the new range (185.15.57.16/29) needs to be routed to the cloudnet VIP 185.15.57.10, similar to what is in place for 185.15.57.0/29. Currently there is no route for this range in place, so any traffic that reaches the cloudgw for it is routed back towards the CRs, following the catch-all default route the cloudgw has (see below traceroute, hops 14 onwards). Instead the cloudgw needs to know this new range is internal via the cloudnet/neutron hosts.

cmooney@wikilap:~$ mtr -b -w -c 4 185.15.57.17
Start: 2022-08-02T11:38:18+0100
HOST: wikilap                                                       Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- br0.nbgw.nb.rankinrez.net (192.168.240.1)                      0.0%     4    0.6   0.7   0.6   0.9   0.1
  2.|-- 176.61.34.1                                                    0.0%     4   12.8  15.9  10.4  25.2   6.5
  3.|-- 109.255.253.254                                                0.0%     4   21.5  17.6  15.5  21.5   2.7
  4.|-- ie-dub02a-rc1-ae-31-0.aorta.net (84.116.238.38)                0.0%     4   15.0  14.4  12.8  16.4   1.6
  5.|-- ie-dub02a-ri1-ae-74-0.aorta.net (84.116.134.38)                0.0%     4   38.8  18.1   9.9  38.8  13.8
  6.|-- ae-14.a00.dublir01.ie.bb.gin.ntt.net (129.250.8.5)             0.0%     4   10.0  13.0  10.0  16.8   2.8
  7.|-- ae-1.a01.dublir01.ie.bb.gin.ntt.net (129.250.7.23)            75.0%     4    9.9   9.9   9.9   9.9   0.0
  8.|-- ae-6.r21.londen12.uk.bb.gin.ntt.net (129.250.3.10)             0.0%     4   30.1  35.1  30.0  42.4   6.1
  9.|-- ae-13.r25.asbnva02.us.bb.gin.ntt.net (129.250.2.111)           0.0%     4  102.1 100.6  98.4 102.1   1.6
 10.|-- ae-6.r20.dllstx14.us.bb.gin.ntt.net (129.250.5.12)             0.0%     4  141.8 145.0 140.8 151.5   4.9
 11.|-- ae-0.a01.dllstx14.us.bb.gin.ntt.net (129.250.4.22)             0.0%     4  147.0 142.2 137.1 147.0   4.0
 12.|-- xe-2-5-3-1.a01.dllstx14.us.ce.gin.ntt.net (128.242.179.182)    0.0%     4  139.3 144.9 139.3 159.6   9.8
 13.|-- xe-5-0-0.cr2-codfw.wikimedia.org (208.80.153.212)              0.0%     4  138.4 140.1 138.4 141.5   1.5
 14.|-- cloudgw2002-dev.codfw1dev.wikimediacloud.org (208.80.153.189)  0.0%     4  130.9 131.2 130.0 132.7   1.1
 15.|-- ae2-2120.cr1-codfw.wikimedia.org (208.80.153.186)              0.0%     4  133.5 135.0 132.7 140.9   3.9
 16.|-- cloudgw2002-dev.codfw1dev.wikimediacloud.org (208.80.153.189) 75.0%     4  131.8 131.8 131.8 131.8   0.0
 17.|-- ae2-2120.cr1-codfw.wikimedia.org (208.80.153.186)              0.0%     4  135.9 135.1 133.5 137.2   1.8
 18.|-- cloudgw2002-dev.codfw1dev.wikimediacloud.org (208.80.153.189) 75.0%     4  131.9 131.9 131.9 131.9   0.0
 19.|-- ae2-2120.cr1-codfw.wikimedia.org (208.80.153.186)              0.0%     4  137.1 136.0 134.3 137.1   1.3
 20.|-- cloudgw2002-dev.codfw1dev.wikimediacloud.org (208.80.153.189) 75.0%     4  130.4 130.4 130.4 130.4   0.0
 21.|-- ae2-2120.cr1-codfw.wikimedia.org (208.80.153.186)              0.0%     4  136.2 134.8 132.0 137.2   2.3
 22.|-- cloudgw2002-dev.codfw1dev.wikimediacloud.org (208.80.153.189) 75.0%     4  131.4 131.4 131.4 131.4   0.0
 23.|-- ae2-2120.cr1-codfw.wikimedia.org (208.80.153.186)              0.0%     4  135.1 135.4 134.7 136.9   1.0

I've been looking for how we see the routing of a subnet in openstack, but thus far have come up with little. How did you identify that there is no routing setup for 185.15.57.16/29 ?

Hi @rook, you probably need to confirm within the cloud team, but as far as I am aware the cloudgw nodes are completely external to OpenStack, providing the external gateway for the Neutron (cloudnet) hosts.

In terms of checking the routing on the cloudgw itself, the "ip" command shows that there is no route for the new range in the vrf-cloudgw VRF:

cmooney@cloudgw2002-dev:~$ ip route show vrf vrf-cloudgw
default via 208.80.153.185 dev eno2.2120 onlink
172.16.128.0/24 via 185.15.57.10 dev eno2.2107 proto keepalived onlink
185.15.57.0/29 via 185.15.57.10 dev eno2.2107 proto keepalived onlink
185.15.57.8/30 dev eno2.2107 proto kernel scope link src 185.15.57.9
208.80.153.184/29 dev eno2.2120 proto kernel scope link src 208.80.153.189

It's also clear from the traceroute I posted above that the cloudgw is not set up to forward traffic for the new range (as it sends it back to the CR router, and it just loops back and forth between those).
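The same conclusion can be reproduced offline with a rough longest-prefix-match sketch over the routes listed above. The `routes` list is transcribed by hand from the `ip route show` output (with `default` written as `0.0.0.0/0`), and `lookup` is a hypothetical helper, not how the kernel does it internally.

```python
import ipaddress

# Routes transcribed from "ip route show vrf vrf-cloudgw" above.
routes = [
    ("0.0.0.0/0", "via 208.80.153.185 dev eno2.2120"),
    ("172.16.128.0/24", "via 185.15.57.10 dev eno2.2107"),
    ("185.15.57.0/29", "via 185.15.57.10 dev eno2.2107"),
    ("185.15.57.8/30", "dev eno2.2107"),
    ("208.80.153.184/29", "dev eno2.2120"),
]


def lookup(addr):
    """Return (prefix, next hop) of the most specific route matching addr."""
    ip = ipaddress.ip_address(addr)
    matches = [(ipaddress.ip_network(prefix), hop)
               for prefix, hop in routes
               if ip in ipaddress.ip_network(prefix)]
    net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), hop


print(lookup("185.15.57.3"))   # an address in the existing floating range
print(lookup("185.15.57.17"))  # an address in the new range
```

An address in 185.15.57.0/29 matches its own route via 185.15.57.10, while 185.15.57.17 matches only the default route on eno2.2120, which is the interface facing the CRs.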

Change 819593 had a related patch set uploaded (by Vivian Rook; author: Vivian Rook):

[operations/puppet@production] extra ips for codfw1dev

https://gerrit.wikimedia.org/r/819593

Change 819593 merged by Andrew Bogott:

[operations/puppet@production] extra ips for codfw1dev

https://gerrit.wikimedia.org/r/819593