
Allocate public v4 IPs for Neutron setup in eqiad
Closed, Resolved (Public)

Description

208.80.155.128/25 is currently statically routed to 10.64.20.13 (labnet1001) and used for floating IPs in the cloud setup.

Reverse DNS for these is delegated to the nameservers managed by Designate as well:

templates/155.80.208.in-addr.arpa

; 208.80.155.128/25 Eqiad Labs virtualization subnet
; Delegate 208.80.155.128 - 208.80.155.255 to labs-ns*

128-25 IN NS labs-ns0.wikimedia.org.
       IN NS labs-ns1.wikimedia.org.
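
This is the RFC 2317-style classless delegation: the "128-25" child zone is served by labs-ns0/labs-ns1, and the parent zone typically pairs the delegation with per-address CNAMEs pointing into that child zone. As a rough illustration of the naming convention (the helper below is hypothetical, just mirroring the "<first-address>-<prefix-length>" label used above):

import ipaddress

def rfc2317_name(ip: str, prefix: str) -> str:
    """Reverse-DNS owner name for an address inside a classless
    (longer-than-/24) delegation, using the '<first>-<len>' label
    convention from the zone snippet above. Hypothetical helper."""
    addr = ipaddress.ip_address(ip)
    net = ipaddress.ip_network(prefix)
    assert addr in net, "address must fall inside the delegated block"
    o1, o2, o3, o4 = str(addr).split(".")
    label = f"{net.network_address.packed[-1]}-{net.prefixlen}"
    return f"{o4}.{label}.{o3}.{o2}.{o1}.in-addr.arpa."

print(rfc2317_name("208.80.155.129", "208.80.155.128/25"))
# -> 129.128-25.155.80.208.in-addr.arpa. (served by labs-ns0/labs-ns1)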

Out of this /25 (128 addresses) we are currently using 104:

nova list --all-tenants | grep 208

1| e76864c4-14ef-4a5f-86f3-d28bddd222d7 | bastion-01 | bastion | ACTIVE | - | Running | public=10.68.17.232, 208.80.155.129 |
2| d0fbda52-c292-45a8-a8d2-0374cd3328c2 | bastion-02 | bastion | ACTIVE | - | Running | public=10.68.18.65, 208.80.155.153 |
3| 1dc6c5d8-a5bd-4c12-98dd-0062651866b7 | bastion-restricted-01 | bastion | ACTIVE | - | Running | public=10.68.18.66, 208.80.155.155 |
4| 20804f7d-5b5b-4937-b32a-faf1b33d0575 | ceph-3 | services | ACTIVE | - | Running | public=10.68.23.121 |
5| 3ae0bcec-5dda-4598-891e-01ad707f47d1 | cvn-app8 | cvn | ACTIVE | - | Running | public=10.68.22.2, 208.80.155.230 |
6| 61ef4a36-57ac-489e-840e-e68832b3f5a4 | cvn-app9 | cvn | ACTIVE | - | Running | public=10.68.22.42, 208.80.155.235 |
7| e0bcbdf7-d942-4835-a6c1-b0ba54f254c1 | cyberbot-exec-iabot-01 | cyberbot | ACTIVE | - | Running | public=10.68.23.31, 208.80.155.236 |
8| 64c9625a-dace-4e69-a31c-c3b5401e3366 | deployment-cache-text04 | deployment-prep | ACTIVE | - | Running | public=10.68.18.103, 208.80.155.135 |
9| 88c5b221-ea9e-4c93-ba4e-c1ebecb4c887 | deployment-cache-upload04 | deployment-prep | ACTIVE | - | Running | public=10.68.18.109, 208.80.155.136 |
10| b19b711f-56aa-407d-bf59-ed1c06b2cfba | deployment-ircd | deployment-prep | ACTIVE | - | Running | public=10.68.20.19, 208.80.155.209 |
11| 811cd53f-855b-490f-b28e-c80184600dd5 | deployment-mx | deployment-prep | ACTIVE | - | Running | public=10.68.17.78, 208.80.155.193 |
12| 5db64903-d4f1-418a-96d6-736bb71f00c8 | deployment-secureredirexperiment | deployment-prep | ACTIVE | - | Running | public=10.68.17.132, 208.80.155.164 |
13| 062645da-d98c-4ba7-9277-65cbf118d6b8 | emoji | mobile | ACTIVE | - | Running | public=10.68.18.254, 208.80.155.139 |
14| 72f0492c-3762-4e25-afcb-5ae208a91f64 | federation-wikis | wikidata-federation | ACTIVE | - | Running | public=10.68.21.1 |
15| b98e35fc-e4d5-4b88-b82c-e70ed0badcf2 | gerrit-test3 | git | ACTIVE | - | Running | public=10.68.22.16, 208.80.155.149 |
16| 16f8028d-6ae6-4c9a-a1d7-c4e215f1f0f8 | google-api-proxy-02 | google-api-proxy | ACTIVE | - | Running | public=10.68.16.111, 208.80.155.245 |
17| 6c16a9a4-eef4-44b8-8e9e-18c2dded7cdc | integration-slave-docker-1009 | integration | ACTIVE | - | Running | public=10.68.21.208 |
18| a82089e3-b0c7-4f01-a2d3-4d15dc113ade | jobs | security-tools | ACTIVE | - | Running | public=10.68.17.248 |
19| d59e74c8-689f-444f-8bf8-656805f9ae89 | labs-bootstrapvz-jessie | openstack | ACTIVE | - | Running | public=10.68.16.114, 208.80.155.210 |
20| 54860d9a-4ca9-423d-a047-51cebe0f1c70 | labs-bootstrapvz-stretch | openstack | ACTIVE | - | Running | public=10.68.18.144, 208.80.155.226 |
21| 719db00d-fb54-4140-8be6-0b916d8c2b95 | lizenzhinweisgenerator | lizenzhinweisgenerator | ACTIVE | - | Running | public=10.68.21.194, 208.80.155.208 |
22| a9321cd4-c8dd-47db-91f3-3a7f2ae658ac | mathosphere | math | ACTIVE | - | Running | public=10.68.20.14, 208.80.155.185 |
23| b46c0132-52da-414b-8cb4-c843ecea6a22 | medbox3-iiab | iiab | ACTIVE | - | Running | public=10.68.20.212, 208.80.155.228 |
24| be626a7c-1041-460e-8da9-70bf597d5161 | mw-base | wikitextexp | ACTIVE | - | Running | public=10.68.19.154, 208.80.155.182 |
25| ce5c7a90-ac55-4e47-9a11-5b9178df947f | mw-expt | wikitextexp | ACTIVE | - | Running | public=10.68.19.143, 208.80.155.188 |
26| 397696d8-067d-4bec-b840-dcd1585d5577 | mwaas-k8-node-01 | scrumbugz | ACTIVE | - | Running | public=10.68.21.99, 208.80.155.238 |
27| 6e0d14c4-57b0-4f00-bc21-b767e85796e2 | mwoffliner1 | mwoffliner | ACTIVE | - | Running | public=10.68.16.224, 208.80.155.198 |
28| f2f35951-705a-4df9-9b8d-1e250be2feae | mwoffliner2 | mwoffliner | ACTIVE | - | Running | public=10.68.17.82, 208.80.155.202 |
29| 219c402e-3eaa-4a84-bc80-6d2c26c84533 | mwoffliner3 | mwoffliner | ACTIVE | - | Running | public=10.68.16.64, 208.80.155.205 |
30| b47f37da-d60d-4a5e-9d20-2a67fdb0ddf2 | mwstake | mwstake | ACTIVE | - | Running | public=10.68.22.143, 208.80.155.232 |
31| 6e83879e-336a-4b01-98f8-5084b41e54b7 | novaproxy-01 | project-proxy | ACTIVE | - | Running | public=10.68.21.68, 208.80.155.156 |
32| 67e0beac-a799-451a-a8d7-a8db651beb79 | osmit-tre | osmit | ACTIVE | - | Running | public=10.68.19.86, 208.80.155.233 |
33| 2fc6fcd6-dcfc-4769-8260-8208934d4862 | otrs-oneclickspam-test | otrs | SHUTOFF | - | Shutdown | public=10.68.18.227 |
34| 55146dff-4aa1-4485-86c2-8444ea861bb9 | oxygen | rcm | ACTIVE | - | Running | public=10.68.18.100, 208.80.155.234 |
35| 41d82349-f0c0-4cf6-9edb-1459a316f470 | paws-master-01 | paws | ACTIVE | - | Running | public=10.68.17.49, 208.80.155.222 |
36| 22a00740-9c8f-4258-8ad0-b4082c03deee | phabricator | phabricator | ACTIVE | - | Running | public=10.68.20.184, 208.80.155.150 |
37| 3456d375-8d93-450e-9f95-155d67915392 | pub2 | wikiapiary | ACTIVE | - | Running | public=10.68.22.152, 208.80.155.237 |
38| eb9c97ca-da20-4c6b-a05c-ade3336978dc | relic | toolserver-legacy | ACTIVE | - | Running | public=10.68.16.162, 208.80.155.197 |
39| 0406b18d-6fec-459a-9620-140b0a046e49 | reporescue | openstack | ACTIVE | - | Running | public=10.68.20.66, 208.80.155.225 |
40| 186d16e8-e0ea-401c-a699-4fee8e86973a | signwriting-icon-server-3 | signwriting | ACTIVE | - | Running | public=10.68.19.208 |
41| 7d4c554c-8811-40d9-b876-dc447994fbe4 | t166878 | otrs | ACTIVE | - | Running | public=10.68.20.208 |
42| 2347a41f-30b2-44a8-a02c-12fd86761a43 | telnet2 | telnet | ACTIVE | - | Running | public=10.68.17.160, 208.80.155.160 |
43| dc88c3f6-b685-4e5a-8a54-436b63147497 | tools-bastion-02 | tools | ACTIVE | - | Running | public=10.68.16.44, 208.80.155.132 |
44| 09f60ae1-78aa-4d17-8ff7-6d5fb29006dc | tools-bastion-03 | tools | ACTIVE | - | Running | public=10.68.23.58, 208.80.155.163 |
45| e150bf25-f0bd-4eb0-963c-48f04af01b62 | tools-bastion-05 | tools | ACTIVE | - | Running | public=10.68.23.74, 208.80.155.130 |
46| e3598907-19b5-4c32-a16b-e5d3c30069dd | tools-checker-01 | tools | ACTIVE | - | Running | public=10.68.16.228, 208.80.155.229 |
47| ba17482f-0c0b-415e-9b07-031c5ea4ab82 | tools-docker-registry-01 | tools | ACTIVE | - | Running | public=10.68.23.203, 208.80.155.194 |
48| f51d08ba-081c-44bb-8c73-693c61a01b3a | tools-exec-1401 | tools | ACTIVE | - | Running | public=10.68.17.202, 208.80.155.140 |
49| 110119ec-7fe1-436f-819d-275c0b6a6fa8 | tools-exec-1402 | tools | ACTIVE | - | Running | public=10.68.17.205, 208.80.155.141 |
50| 79a41ff1-de61-4061-a4b0-7cd1d25d658f | tools-exec-1403 | tools | ACTIVE | - | Running | public=10.68.17.239, 208.80.155.143 |
51| 065722c0-08fe-4712-9e03-aaaa4001d74b | tools-exec-1404 | tools | ACTIVE | - | Running | public=10.68.18.12, 208.80.155.144 |
52| 307878dd-edf3-469d-804d-971553f2fb82 | tools-exec-1405 | tools | ACTIVE | - | Running | public=10.68.18.3, 208.80.155.145 |
53| acabe2c1-98b4-4753-9dd7-64dfb7dfed08 | tools-exec-1406 | tools | ACTIVE | - | Running | public=10.68.18.13, 208.80.155.146 |
54| 44aba1c0-8cc7-4346-9069-755566e3bc22 | tools-exec-1407 | tools | ACTIVE | - | Running | public=10.68.18.16, 208.80.155.147 |
55| b90fdc70-04e4-4de3-9c06-b8c6c4371d8b | tools-exec-1408 | tools | ACTIVE | - | Running | public=10.68.18.14, 208.80.155.152 |
56| bfd46a21-408a-4698-9870-b261258c0bf4 | tools-exec-1409 | tools | ACTIVE | - | Running | public=10.68.18.17, 208.80.155.186 |
57| bef35b12-6710-4e1e-ab80-bec699a0bcdc | tools-exec-1410 | tools | ACTIVE | - | Running | public=10.68.18.18, 208.80.155.187 |
58| f284b92f-3e86-4127-bdd1-d7e32bb65809 | tools-exec-1411 | tools | ACTIVE | - | Running | public=10.68.17.209, 208.80.155.178 |
59| 294b284d-a269-4de2-91d2-cdf3bd927463 | tools-exec-1412 | tools | ACTIVE | - | Running | public=10.68.23.154, 208.80.155.177 |
60| 0e977f78-56bf-477b-9ae1-8580ca211d3b | tools-exec-1413 | tools | ACTIVE | - | Running | public=10.68.23.103, 208.80.155.176 |
61| e6a06c1b-3948-4220-8674-889dbdaf21bb | tools-exec-1414 | tools | ACTIVE | - | Running | public=10.68.23.178, 208.80.155.171 |
62| a935afbd-a4b0-41e3-bb37-a307a02e4573 | tools-exec-1415 | tools | ACTIVE | - | Running | public=10.68.20.251, 208.80.155.170 |
63| a5a5f035-ba55-4709-862f-9e31a137630e | tools-exec-1416 | tools | ACTIVE | - | Running | public=10.68.23.14, 208.80.155.142 |
64| 4a674872-b18a-41a8-9c3e-88e61137eed7 | tools-exec-1417 | tools | ACTIVE | - | Running | public=10.68.23.172, 208.80.155.175 |
65| f0be05f1-1fd6-4b25-843c-cdf378f2adbe | tools-exec-1418 | tools | ACTIVE | - | Running | public=10.68.23.142, 208.80.155.167 |
66| a8d848bf-71a8-4ff7-aec9-47bdb617f4d7 | tools-exec-1419 | tools | ACTIVE | - | Running | public=10.68.23.223, 208.80.155.173 |
67| 2c0cf363-c7c3-42ad-94bd-e586f2492321 | tools-exec-1420 | tools | ACTIVE | - | Running | public=10.68.21.42, 208.80.155.148 |
68| 79d459f6-0fdf-4cc5-8ba2-5a5511350c82 | tools-exec-1421 | tools | ACTIVE | - | Running | public=10.68.21.169, 208.80.155.191 |
69| 3c4de41a-7689-4c53-a9ff-c337534c26bc | tools-exec-1422 | tools | ACTIVE | - | Running | public=10.68.19.24, 208.80.155.172 |
70| 3ca3d4ac-d205-4f78-a554-ab87120d7322 | tools-exec-1423 | tools | ACTIVE | - | Running | public=10.68.21.65, 208.80.155.179 |
71| 74a700c8-d023-43e5-a7c1-05b51586aa9e | tools-exec-1424 | tools | ACTIVE | - | Running | public=10.68.19.159, 208.80.155.166 |
72| 643950bf-32b3-42f1-ba68-8054598e2f38 | tools-exec-1425 | tools | ACTIVE | - | Running | public=10.68.17.151, 208.80.155.180 |
73| 249e3107-f8b9-4851-bde0-842d703e5474 | tools-exec-1426 | tools | ACTIVE | - | Running | public=10.68.18.205, 208.80.155.181 |
74| 5fe2782a-13fa-4b10-baf1-f577dc698d7d | tools-exec-1427 | tools | ACTIVE | - | Running | public=10.68.16.94, 208.80.155.200 |
75| 52ac2af1-fea2-4793-8d08-c850a186acf5 | tools-exec-1428 | tools | ACTIVE | - | Running | public=10.68.20.54, 208.80.155.195 |
76| 6de93d6e-8599-40f3-a2af-a75313e465b5 | tools-exec-1429 | tools | ACTIVE | - | Running | public=10.68.16.61, 208.80.155.183 |
77| d52e72f2-6a58-4a28-bdb4-5778313d7603 | tools-exec-1430 | tools | ACTIVE | - | Running | public=10.68.16.40, 208.80.155.212 |
78| 425f6dcb-277b-4362-a08d-ae6cb357bce7 | tools-exec-1431 | tools | ACTIVE | - | Running | public=10.68.16.151, 208.80.155.211 |
79| 48c68b72-fa7a-48d8-a113-4fc204926d05 | tools-exec-1432 | tools | ACTIVE | - | Running | public=10.68.22.85, 208.80.155.206 |
80| 1b9eebc9-5f23-4c0a-8535-4f1ee99474b6 | tools-exec-1433 | tools | ACTIVE | - | Running | public=10.68.21.151, 208.80.155.201 |
81| f8fe661b-e6a4-4f5d-83c3-00ebc7e90e06 | tools-exec-1434 | tools | ACTIVE | - | Running | public=10.68.23.166, 208.80.155.203 |
82| c5b92f90-be06-4f36-ad06-b4368a59fd4c | tools-exec-1435 | tools | ACTIVE | - | Running | public=10.68.19.160, 208.80.155.216 |
83| 7c6d051b-0262-4e87-bf21-077c0e1c5ab8 | tools-exec-1436 | tools | ACTIVE | - | Running | public=10.68.18.137, 208.80.155.217 |
84| 591fd25e-b3f0-408d-a408-1b243aae79e0 | tools-exec-1437 | tools | ACTIVE | - | Running | public=10.68.19.235, 208.80.155.218 |
85| a1943d2e-e773-40f4-aa34-f6e9d15374cd | tools-exec-1438 | tools | ACTIVE | - | Running | public=10.68.18.208, 208.80.155.220 |
86| 9f9fc853-323e-4ffc-8018-ea1e408b9339 | tools-exec-1439 | tools | ACTIVE | - | Running | public=10.68.18.83, 208.80.155.219 |
87| 5a41a2b1-5bdd-4d52-ba1c-72273b4fe6f3 | tools-exec-1440 | tools | ACTIVE | - | Running | public=10.68.22.236, 208.80.155.215 |
88| 463f2e7e-3a47-4b86-b1bf-5ab9635f94de | tools-exec-1441 | tools | ACTIVE | - | Running | public=10.68.16.126, 208.80.155.214 |
89| 7189d77c-9638-44ee-9d71-48d40171a4e8 | tools-exec-1442 | tools | ACTIVE | - | Running | public=10.68.18.9, 208.80.155.213 |
90| f59fbfa6-a218-43b4-a860-75ca13856357 | tools-k8s-master-01 | tools | ACTIVE | - | Running | public=10.68.17.142, 208.80.155.204, 208.80.155.247 |
91| 8c499e6e-1b79-4bb1-8f7f-72fee1f74ea5 | tools-mail | tools | ACTIVE | - | Running | public=10.68.16.27, 208.80.155.162 |
92| 09c9254e-6d4d-4aaa-a5b6-5d97eab3937d | tools-paws-master-01 | tools | ACTIVE | - | Running | public=10.68.22.164, 208.80.155.224 |
93| df471139-ee28-4c28-8bd0-df8e7f706b59 | tools-proxy-02 | tools | ACTIVE | - | Running | public=10.68.21.81, 208.80.155.131 |
94| 8ce08d7a-f46c-4fd1-9d97-150da885fc0d | tools-static-12 | tools | ACTIVE | - | Running | public=10.68.20.97, 208.80.155.174 |
95| a1a208ad-4b24-4e08-91bc-ce34c09982d5 | tools-worker-1022 | tools | ACTIVE | - | Running | public=10.68.21.130 |
96| f0e4775f-9e40-4a78-b11f-adf0096a1557 | trending | reading-web-staging | ACTIVE | - | Running | public=10.68.22.208 |
97| f3f51b4c-0021-472d-850f-66f69f7bd1f6 | util-abogott | testlabs | ACTIVE | - | Running | public=10.68.19.252, 208.80.155.239 |
98| d56720bc-2bf8-4c2e-a571-a57c6566650a | utrs-production | utrs | ACTIVE | - | Running | public=10.68.23.99, 208.80.155.169 |
99| bdc8775d-a0d4-4c2b-b8b2-f4bb8e7c4e9c | visualeditor-test | visualeditor | ACTIVE | - | Running | public=10.68.22.122, 208.80.155.184 |
100| 7fd2f0e5-c062-44f7-9e43-de8bb0f8acf5 | vmbuilder-trusty | openstack | ACTIVE | - | Running | public=10.68.20.95, 208.80.155.227 |
101| 81ee2e98-5ba1-4262-b3a2-59130a06addb | webservices | getstarted | ACTIVE | - | Running | public=10.68.17.117, 208.80.155.221 |
102| 364abeb7-162c-4899-8700-e9c842f6897e | xmlrcs | huggle | ACTIVE | - | Running | public=10.68.16.54, 208.80.155.196 |
103| fbed895f-ccfe-4403-ba13-5a12183daab9 | xsstest | security-tools | ACTIVE | - | Running | public=10.68.17.178, 208.80.155.231 |
104| c4ac7351-a306-4760-b51b-47faed192cd6 | yandex-proxy01 | yandex-proxy | ACTIVE | - | Running | public=10.68.17.8, 208.80.155.189 |

This is actually misleading: our policy in Toolforge is for exec nodes to have their own IPs, and that isn't being honored, so I expect we could/should be using another 20 or so.
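
As an aside, the raw count above is itself a little loose: "grep 208" also matches private addresses that merely contain "208", e.g. 10.68.21.208 in the listing. A stricter count against the actual block is more honest; a throwaway sketch, assuming the listing above is saved to a hypothetical nova-list.txt:

import ipaddress
import re

FLOATING = ipaddress.ip_network("208.80.155.128/25")

in_use = set()
with open("nova-list.txt") as f:            # hypothetical dump of the listing above
    for line in f:
        for match in re.findall(r"(?:\d{1,3}\.){3}\d{1,3}", line):
            ip = ipaddress.ip_address(match)
            if ip in FLOATING:              # only addresses inside the /25 count
                in_use.add(ip)

print(f"{len(in_use)} of {FLOATING.num_addresses} addresses in the /25 are assigned")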


Request: a /24 range of public v4 IPs that can be used for SNAT/DNAT in the Neutron deployment in eqiad.

Compensating controls: The /25 in use now can be returned to the general pool as soon as we move to the new range (expectation is late calendar year 2018). Our plans are predicated upon having both the existing nova-network deployment and the new Neutron-based deployment running concurrently, for comprehensive testing and a smooth transition. We could allocate another /25 and get by for now, but my thinking here is that a /24 gets us at least 2-3 years down the road. That's a judgement call based on current availability.

Event Timeline

chasemp triaged this task as Medium priority. May 1 2018, 2:53 PM
chasemp created this task.

I have previously talked with @ayounsi about this and promised a task weeks ago :) I did assign this, but only because of that and because I know @ayounsi is the human who can help the Cloud team sort this out. Thanks man!

@chasemp Can you provide an ETA for returning the /25?


It's educated guesswork, but towards the end of calendar year 2018 is realistic; October/November would be my best estimate.

ayounsi added a subscriber: faidon.

185.15.56.0/24 is yours pending approval from @faidon

The /25 -> /24 renumbering seems fairly straightforward, but given a) IPv4 depletion (we effectively cannot get more IPv4 space from any of the RIRs), b) the Neutron redesign, and c) Cloud Services' growth and needs like T122406's, I think it's worthwhile to look at this a bit more broadly to make sure we avoid e.g. depletion or fragmentation of our IP space. Perhaps, for instance, we need to be looking at a larger assignment :)

So first off:

  • My understanding from the description is that the intention is for this to be used for floating IPs. Do you foresee any other needs in terms of public IPv4 space besides those?
  • Relatedly, do we have any kind of historical growth figures and/or estimates for future growth, in the short-term or mid-term, besides those extra 20 IPs for Toolforge that you mentioned?
  • Is this going to be routed just to eqiad, or does the upcoming addition of the second zone in the next FY mean that this is also going to be partially routed to codfw? I'm asking because routing a single /24 to two DCs is possible, but it would impact availability in case of a split brain between our data centers, which may or may not be acceptable.
  • How do we intend to subnet this /24? Is this all going to be flat for floating IPs, or partitioned somehow per data center, row, other kind of availability zone, etc.?

Finally, I think it would be useful to look at this holistically and figure out the addressing plan for {eqiad, codfw} × {IPv4, IPv6} × {public, private} for at least as far as we can reasonably foresee. Do we have a sense for that yet?


Thanks man.

(5 responses, top to bottom)

My understanding from the description is that the intention is for this to be used for floating IPs. Do you foresee any other needs in terms of public IPv4 space besides those?

Yes, all for floating IP assignment (which mirrors the use of the existing /25). The intention/expectation is for this to carry things forward for 3+ years; based on historic growth this seems realistic. When we get to IPv6, the models I have seen do not support any kind of NAT or IP overlapping, so it's assumed all addresses are public. I have only really stepped through it as far as Newton, though.

Relatedly, do we have any kind of historical growth figures and/or estimates for future growth, in the short-term or mid-term, besides those extra 20 IPs for Toolforge that you mentioned?

I do have some thoughts on growth, but a lot of it is ad hoc. I've gone through the last year and found 5 instances of floating IP allocation, and 2 of those were Tools and Toolsbeta. It's fair to say we burn through roughly 10 a year, which is pretty small for a few reasons: we offer (and push) the general-purpose nginx reverse-proxy-as-a-service to conserve addresses, we have shared jump bastions for all projects, we encourage Toolforge usage where possible, and we almost always try to find a way to do things that won't use a floating IP, since floating IPs have historically been a very finite resource.

I can think of a few use cases coming, so my estimate (based on having a decent quota/project request process for about 2 years) is that we'll continue allocating 10-15 a year (outside of the 20 backlog). There are a few workarounds to exhaustion, in that Toolforge instances can be rebuilt with more resources for tighter clustering to reduce floating-pool IP usage, etc., but in general we've been very conservative here. The 20 missing in Toolforge now are an anomaly: for a few exec builds it was simply forgotten, and then we decided to bend our 1:1 external-IP rule (which exists for auditing, rate-limiting, etc.) for execs when building out the k8s workers. Those were initially all webservices, so it wasn't as big of a deal, but this is changing. So to correct that, say 20 off the bat, and then a safe and probably liberal estimate of 10-15/year. This is pretty rough but in the ballpark.
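
Back-of-the-envelope, those estimates suggest the /24 buys comfortably more than the 2-3 years floated in the description. A rough sketch of the arithmetic (the inputs are the estimates above and the Toolforge/other breakdown below, not measurements):

import ipaddress

pool = ipaddress.ip_network("185.15.56.0/24").num_addresses   # 256
in_use = 103        # 53 Toolforge + 50 everything else (breakdown below)
backlog = 20        # exec nodes owed their own IPs per the 1:1 rule
per_year = 15       # upper end of the 10-15/year estimate

# Ignores the handful of addresses reserved for network/broadcast/gateway.
headroom = pool - in_use - backlog
print(f"{headroom} IPs of headroom, roughly {headroom // per_year} years at {per_year}/year")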

Is this going to be routed just to eqiad, or does the upcoming addition of the second zone in the next FY mean that this is also going to be partially routed to codfw? I'm asking because routing a single /24 to two DCs is possible, but it would impact availability in case of a split brain between our data centers, which may or may not be acceptable.

The way we have talked about it so far internally, regions would be separate address space allocations, and the story of extending individual projects across regions is very muddy and probably not something we can bite off soon. Regions would probably match our site model (i.e. one region for eqiad and one for codfw). The /24 here is expected to be for the existing eqiad deployment only, and the second region being considered for Q3 purchasing in FY18-19 would be its own consideration. Is it possible to keep the existing /25 reserved for that potential use case and still allocate the /24 here? That would, I believe, resource all asks within 36 months.

How do we intend to subnet this /24? Is this all going to be flat for floating IPs, or partitioned somehow per data center, row, other kind of availability zone, etc.?

I looked into this a bit more since you asked. At the moment the breakdown of floating IP assignments is:

53 Toolforge
50 Everything else

The intention is for this entire /24 block to be allocated to the eqiad deployment that exists now (or rather the Neutron version of it that things are being folded into). It may make sense to break it into two /25s, one for Toolforge and one for everything else (sketched below); both would be part of the same eqiad deployment. The intention has been to scale out horizontally, with Toolforge as the first use case (i.e. a Toolforge multi-row VXLAN overlay (not there yet), and hopefully Toolforge across regions for HA/scale/etc. at some point). Roughly, the idea at the moment is region == site and availability zone == row; each region should be capable of multi-row compute but is still tied to a single row for north-south (L3) traffic, which makes a true second region attractive.
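
The two-/25 split mentioned above is straightforward to express; a quick sketch with the standard library:

import ipaddress

block = ipaddress.ip_network("185.15.56.0/24")
toolforge, everything_else = block.subnets(new_prefix=25)
print(toolforge)         # 185.15.56.0/25, e.g. Toolforge floating IPs
print(everything_else)   # 185.15.56.128/25, e.g. all other projects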

Finally, I think it would be useful to look at this holistically and figure out the addressing plan for {eqiad, codfw} × {IPv4, IPv6} × {public, private} for at least as far as we can reasonably foresee. Do we have a sense for that yet?

Getting this /24 for the existing /25 would cover IPv4 for the existing deployment for 3 years, I expect. Then in addition there is the planned second region; if we want to future-proof, then thinking about another similar block assignment (or holding the /25 there now for that case) makes sense. I'm optimistic about the IPv6 potential here, as it's a first-class citizen in OpenStack starting around Newton, but I'm unsure when/how we would be able to deploy as IPv6-only. @ayounsi and I have been working off of a Google sheet where this is being laid out for the existing deployment, but this does not include thinking on a new region at either site at the moment.

OK, so it looks like 185.15.56.0/24 is proposed to be used immediately in eqiad, to replace 208.80.155.128/25 in the next ~6 months. Additionally, 185.15.57.0/24 is proposed to be reserved (but not assigned), to be used tentatively in Q3 FY18-19 in codfw for a region 2 deployment. Both of these sound good to me and you can proceed :)

@ayounsi, we need to actually create objects for this in RIPE -- it's currently our LIR space, but not assigned to any entity.

It also seems there are some ideas around IPv6 space, but those can wait until the project progresses further, correct? I'd like to figure out our plan for 2a02:ec80::/32 before we commit to anything here.

The only remaining part that is unclear to me is how the two /24s will be subnetted. I'd like us to think this through to make sure we don't fragment the IP space and require even more IP space down the line, which unfortunately will be hard to get :( I think e.g. it may be a good idea to not allocate this space entirely, and to limit ourselves to some subnet with the rest reserved for "future use". As an example of what I was thinking of: codfw currently has 208.80.153.128/27 as labtest, and it's not clear to me whether the new /24 allocation for region 2 will replace it. Perhaps we should e.g. reserve a /27 from each of eqiad's and codfw's /24s for this sort of "labtest" setup?

https://apps.db.ripe.net/db-web-ui/#/lookup?source=ripe&key=185.15.56.0%2F24AS14907&type=route created.

IPv6 is tracked in T187929 and can indeed wait.

Maybe something like the following; it's just a suggestion, as I don't want to step on @chasemp's toes:

185.15.56.0/25   - 1:1 replacement of 208.80.155.128/25
185.15.56.128/26 - reserved for growth beyond first /25
185.15.56.192/27 - reserved for future use
185.15.56.224/27 - reserved for labtest
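
One way to sanity-check a layout like this is to verify that the pieces tile the /24 exactly, with no gaps or overlaps; the same check applies to the final eqiad and codfw layouts further down. A quick sketch:

import ipaddress

parent = ipaddress.ip_network("185.15.56.0/24")
plan = [
    ipaddress.ip_network("185.15.56.0/25"),     # 1:1 replacement of 208.80.155.128/25
    ipaddress.ip_network("185.15.56.128/26"),   # growth beyond the first /25
    ipaddress.ip_network("185.15.56.192/27"),   # future use
    ipaddress.ip_network("185.15.56.224/27"),   # labtest
]

# An exact tiling collapses back to just the parent block.
assert list(ipaddress.collapse_addresses(plan)) == [parent]
assert sum(n.num_addresses for n in plan) == parent.num_addresses
print("plan tiles", parent, "exactly")
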
Vvjjkkii renamed this task from Allocate public v4 IPs for Neutron setup in eqiad to fwdaaaaaaa. Jul 1 2018, 1:12 AM
Vvjjkkii removed faidon as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: Aklapper, gerritbot.
CommunityTechBot renamed this task from fwdaaaaaaa to Allocate public v4 IPs for Neutron setup in eqiad. Jul 2 2018, 2:55 PM
CommunityTechBot assigned this task to faidon.
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description. (Show Details)
CommunityTechBot edited projects, added Cloud-Services; removed Hashtags.
CommunityTechBot added subscribers: Aklapper, gerritbot.

https://apps.db.ripe.net/db-web-ui/#/lookup?source=ripe&key=185.15.56.0%2F24AS14907&type=route created.

185.15.56.0/25   - 1:1 replacement of 208.80.155.128/25
185.15.56.128/26 - reserved for growth beyond first /25
185.15.56.192/27 - reserved for future use
185.15.56.224/27 - reserved for labtest

Nice. I don't anticipate we will need a labtest allocation out of this 185.15.56.0/24 block, as the deployment in codfw is enough (ping @Andrew on my reasoning there).

So it seems like for eqiad:

185.15.56.0/25     - 1:1 replacement of 208.80.155.128/25
185.15.56.128/26   - reserved for growth beyond first /25
185.15.56.192/26   - reserved for future use

The cloud address allocation spreadsheet was mostly updated, but I added a bit to clarify (I hope) re: labtest.

We do want to keep labtest and the 208.80.153.128/27. Labtest being entirely ad hoc right now, it seems better assigned out of the codfw /24 once that range becomes assigned rather than merely reserved. I updated the doc to be clear, at least.

Perhaps we should e.g. reserve a /27 from each of eqiad's and codfw's /24s for this sort of "labtest" setup?

So yes, but we only need it from the codfw range. If we want to mark that allocated and not just reserved, we can give back the 208.80.153.128/27 for labtest now.

In that case, codfw looks like:

185.15.57.0/25     - OpenStack Region2
185.15.57.128/26   - reserved for growth beyond first /25
185.15.57.192/27   - labtest
185.15.57.224/27   - future use

Based on the approvals above, I'm moving forward with the idea that 185.15.56.0/24 is good to go, with some details on breaking it down still to be agreed on. Maybe what I've got here will suffice :)

+    /* Cloud public prefix via labnet100[45] */
+    route 185.15.56.0/25 next-hop 10.64.22.4;

@ayounsi, if https://phabricator.wikimedia.org/T193496#4417390 seems sane to you, could you possibly advertise 185.15.56.0/24 to the world when you get a chance? Thanks

(@Andrew is working on the DNS portion for this in T199374)

Mentioned in SAL (#wikimedia-operations) [2018-07-12T21:12:04Z] <XioNoX> advertising 185.15.56.0/24 to the DFZ - T193496

For the record, with the migration away from, and shutdown of, the nova-network 'main' region, the 208.80.155.128/25 range is no longer in use. Andrew approved the commit to remove the relevant reverse DNS delegation records (whose commit message also includes a statement to this effect) at https://gerrit.wikimedia.org/r/#/c/operations/dns/+/505478/.
I'm not sure if anything else needs doing in this ticket, or whether it can be resolved?

Edit: it was routed to labnet1001, which has just been decommissioned in T221818.

Mentioned in SAL (#wikimedia-operations) [2019-05-03T00:10:05Z] <XioNoX> remove static route to 208.80.155.128/25 on cr1/2-eqiad - T193496

208.80.155.128/25 has been cleaned up from Netbox and DNS (see above).

Change 507907 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] network: remove old labs public range

https://gerrit.wikimedia.org/r/507907

Change 507907 merged by Alexandros Kosiaris:
[operations/puppet@production] network: remove old labs public range

https://gerrit.wikimedia.org/r/507907

I was 6 months off on my estimate for this :)