
Cloud IPv6 subnets
Open, MediumPublic

Description

Follow up from T184209

Looking at codfw (but it's similar in eqiad)

We currently use the following IPv6 for the labs cloud ranges:
ae2.2122 - labs-support1-b-codfw - 2620:0:860:122::/64
ae2.2118 - labs-hosts1-b-codfw - 2620:0:860:118::/64
ae2.2120 - labs-instance-transport1-b-codfw - 2620:0:860:120::/64

The reason was probably to encode part of the VLAN ID in the address.
However, all three fall within the larger subnet 2620:0:860:100::/56 - codfw private
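
The overlap can be checked with Python's standard `ipaddress` module. This is only a minimal sketch: the prefixes are the ones quoted above, while the helper name `overlapping` and the dictionary layout are made up for illustration.

```python
import ipaddress

# Prefixes copied from the task description.
CODFW_PRIVATE = ipaddress.ip_network("2620:0:860:100::/56")
CLOUD_SUBNETS = {
    "labs-support1-b-codfw": ipaddress.ip_network("2620:0:860:122::/64"),
    "labs-hosts1-b-codfw": ipaddress.ip_network("2620:0:860:118::/64"),
    "labs-instance-transport1-b-codfw": ipaddress.ip_network("2620:0:860:120::/64"),
}


def overlapping(aggregate, subnets):
    """Return the names of the subnets fully contained in the aggregate."""
    return sorted(name for name, net in subnets.items() if net.subnet_of(aggregate))


overlaps = overlapping(CODFW_PRIVATE, CLOUD_SUBNETS)
for name in overlaps:
    print(f"{name} ({CLOUD_SUBNETS[name]}) is inside {CODFW_PRIVATE}")
```

The /56 spans 2620:0:860:100:: through 2620:0:860:1ff::, so any /64 whose fourth group is between 100 and 1ff (hex) lands inside it.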

It's not an issue right now, especially as cloud doesn't use much IPv6, but it might become one in the future.

I see 2 options:
1/ use a different /56
For example:

2620:0:860:200::/56  - labs-codfw
2620:0:861:200::/56  - labs-eqiad

2/ use dedicated /48s

2a02:ec80:0::/44 - labs (16 * /48) (can be shrunk to a /45)
    2a02:ec80:0::/48 - labs eqiad
        XXXX
    2a02:ec80:1::/48 - labs codfw
        2a02:ec80:1:2122::/64 - 2122 - labs-support1-b-codfw  (84A)
        2a02:ec80:1:2118::/64 - 2118 - labs-hosts1-b-codfw  (846)
        2a02:ec80:1:2120::/64 - 2120 - labs-instance-transport1-b-codfw  (848)

Having the VLAN ID in decimal in the address makes it easier to read, but we can also use the hex value (2122 -> 84A), which is more accurate.
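
To make the two encodings concrete, here is a small sketch. `vlan_subnet` is a hypothetical helper, not an existing tool; it assumes the site prefix is a /48 and that the VLAN ID goes into the fourth 16-bit group.

```python
import ipaddress


def vlan_subnet(site_prefix: str, vlan_id: int, as_hex: bool) -> ipaddress.IPv6Network:
    """Build the /64 for a VLAN inside a /48 site prefix (hypothetical helper).

    as_hex=True  -> the group holds the VLAN ID's numeric value (2122 -> 84a)
    as_hex=False -> the group *reads* like the decimal ID (2122 -> 2122)
    """
    site = ipaddress.ip_network(site_prefix)
    if site.prefixlen != 48:
        raise ValueError("this sketch assumes a /48 site prefix")
    # Reinterpret the decimal digits as hex digits for the "readable" form.
    group = vlan_id if as_hex else int(str(vlan_id), 16)
    # The fourth group occupies bits 64..79 counted from the low end.
    base = int(site.network_address) + (group << 64)
    return ipaddress.IPv6Network((base, 64))


readable = vlan_subnet("2a02:ec80:1::/48", 2122, as_hex=False)
accurate = vlan_subnet("2a02:ec80:1::/48", 2122, as_hex=True)
print(readable, accurate)
```

VLAN IDs have at most four decimal digits, so the decimal-looking form always fits in one 16-bit group.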

1/ is more of a short term solution while 2/ will require more work (advertise new /48s to the world) but is the most sustainable option.

Event Timeline

ayounsi triaged this task as Medium priority. Feb 21 2018, 7:21 PM
ayounsi created this task.
Restricted Application added a project: Operations. Feb 21 2018, 7:21 PM
Restricted Application added a subscriber: Aklapper.

I like option 2) the most. Are those ranges actual data?

Regarding encoding the VLAN ID: I don't think we should do it. We might eventually move away from the prod VLAN thing, or have addresses where the VLAN part is meaningless (think of virtual networking inside the cloud itself, like VMs or virtual routers).

With all this in mind, I suggest we use this addressing plan:

2a02:ec80:0::/44 - cloud (16 * /48)
    2a02:ec80:0::/48 - cloud eqiad1 (65536 * /64)
        2a02:ec80:0:0::/64 - cloud-physical-eqiad1 -- includes physical transport networks (may have more than one), physical servers, virtual IP addresses for physical servers, and whatever we may need that correspond to physical hardware
            2a02:ec80:0:0:0::/80 - cloud-upstream1-eqiad1 -- physical connectivity between our external physical router and the prod core routers (an example of something that might happen sooner rather than later)
            2a02:ec80:0:0:1::/80 - cloud-transport1-eqiad1 -- physical connectivity between neutron and our external physical routers
            2a02:ec80:0:0:2::/80 - cloud-hosts1-eqiad1 -- physical connectivity for servers and supporting services, a subnet connected to our external physical router
        2a02:ec80:0:1::/64 - cloud-virtual-eqiad1  -- everything from neutron virtual routers to VMs, including virtual addresses inside openstack and other virtual services.
    2a02:ec80:1::/48 - cloud codfw1dev
        2a02:ec80:1:0::/64 - cloud-physical-codfw1dev -- (see eqiad1 equivalent)
        2a02:ec80:1:1::/64 - cloud-virtual-codfw1dev  -- (see eqiad1 equivalent)
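
The sizes in a plan like this follow directly from the prefix lengths. The sketch below uses Python's `ipaddress` module to count subnets and to check that the proposed /80s nest inside the physical /64 without overlapping; `count_subnets` is an illustrative helper name, and the prefixes are taken from the plan above.

```python
import ipaddress
from itertools import combinations


def count_subnets(parent: str, new_prefix: int) -> int:
    """Number of subnets of length new_prefix that fit in parent."""
    net = ipaddress.ip_network(parent)
    return 2 ** (new_prefix - net.prefixlen)


cloud = ipaddress.ip_network("2a02:ec80:0::/44")
sites = [ipaddress.ip_network(p) for p in ("2a02:ec80:0::/48", "2a02:ec80:1::/48")]
physical = ipaddress.ip_network("2a02:ec80:0:0::/64")
segments = [
    ipaddress.ip_network(p)
    for p in ("2a02:ec80:0:0:0::/80", "2a02:ec80:0:0:1::/80", "2a02:ec80:0:0:2::/80")
]

# Every site /48 sits in the /44, every /80 sits in the physical /64.
nested = all(s.subnet_of(cloud) for s in sites) and all(
    s.subnet_of(physical) for s in segments
)
# No two /80 segments share address space.
disjoint = not any(a.overlaps(b) for a, b in combinations(segments, 2))

print("/48s per /44:", count_subnets("2a02:ec80:0::/44", 48))
print("/64s per /48:", count_subnets("2a02:ec80:0::/48", 64))
print("/80s per /64:", count_subnets("2a02:ec80:0:0::/64", 80))
print("nested:", nested, "disjoint:", disjoint)
```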
ayounsi assigned this task to faidon. Mar 19 2020, 6:02 PM

I agree that option 2 is the way to go.

The complication is how to subnet them properly for both the short term (T245495 PoC) and the longer term, while keeping in mind v6 subnetting conventions (e.g. nothing smaller than a /64). I couldn't find much documentation with subnetting recommendations in my brief research.

For example, taking eqiad's /48:

2a02:ec80:0::/48
    2a02:ec80:0::/49
        2a02:ec80::/56 - infrastructure and support networks (gives 256*/64)
        2a02:ec80:0:100::/56 - virtual networks (gives 256*/64)
            2a02:ec80:0:100::/64 - e.g. VMs flat network (similar to the 172.16.0.0/21 network)
    2a02:ec80:0:8000::/49 - reserved for future use

Which is very similar to your proposal, but with different mask lengths.
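
The split above can be reproduced with Python's `ipaddress` module. This is just a sketch of the arithmetic; the variable names are illustrative, and only the first two /56s of the lower /49 are listed, matching the example.

```python
import ipaddress
from itertools import islice

eqiad = ipaddress.ip_network("2a02:ec80:0::/48")

# Halve the /48: the lower /49 is used now, the upper /49 is kept in reserve.
in_use, reserved = eqiad.subnets(new_prefix=49)

# First two /56s of the lower half: infrastructure/support, then virtual networks.
infra, virtual = islice(in_use.subnets(new_prefix=56), 2)

# First /64 of the virtual /56, e.g. the VMs flat network.
flat = next(virtual.subnets(new_prefix=64))

per_56 = 2 ** (64 - 56)  # /64s available in each /56
per_49 = 2 ** (56 - 49)  # /56s available in each /49

print(in_use, reserved)
print(infra, virtual, flat)
print(per_56, per_49)
```

Each /49 holds 128 /56s, so assigning two of them leaves 126 free in the lower half before the reserved /49 is touched.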

For now the first one would not be used (afaik), but it will be if we move to a model where the whole cloud infra sits behind its own dedicated gear.
https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Network_refresh#intermediate_router/firewall

Re-assigning to @faidon for approval, as we're talking about long-term design and a lot of IPs (see also T245495 for context)