
CloudVPS: IPv6 in codfw1dev
Closed, ResolvedPublic

Description

This ticket tracks the work to enable IPv6 in codfw1dev before doing so in eqiad1.

  • IPv6 addressing -- T187929 T374712
  • edge networking (cloudsw, cloudgw, etc.) -- T374713 T374716
  • virtual network support (neutron, subnets, ports, etc.)
  • other openstack bits (horizon, keystone hooks, etc.) -- T377339
  • security groups -- T374714
  • DNS AAAA and PTR records integration -- T374715
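For the DNS item, IPv6 PTR records live under ip6.arpa in nibble format (RFC 3596): all 32 hex digits of the address, reversed, one label per digit. A minimal sketch of how that name is derived, assuming nothing about the actual integration done in T374715; `ptr_name` is a hypothetical helper and the address is just a VM address seen in the tcpdump capture further down:

```python
import ipaddress


def ptr_name(addr: str) -> str:
    """Return the ip6.arpa owner name for an IPv6 address (RFC 3596 nibble format)."""
    ip = ipaddress.IPv6Address(addr)
    # exploded gives all 32 hex digits, e.g. 2a02:ec80:a100:0001:0000:0000:0000:029c
    nibbles = ip.exploded.replace(":", "")
    # Reverse the digits, one label per nibble, then append the ip6.arpa zone.
    return ".".join(reversed(nibbles)) + ".ip6.arpa"


if __name__ == "__main__":
    name = ptr_name("2a02:ec80:a100:1::29c")
    print(name)
    # Cross-check against the standard library's own reverse_pointer property.
    assert name == ipaddress.ip_address("2a02:ec80:a100:1::29c").reverse_pointer
```

The hand-rolled version is only there to show the format; in practice `ipaddress.ip_address(...).reverse_pointer` already produces the same name.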

Related Objects

Status      Assigned
Open        None
Open        None
Stalled     None
Open        None
Stalled     aborrero
Open        None
Open        taavi
Stalled     None
Resolved    None
Resolved    None
Open        None
Open        taavi
Open        None
Resolved    aborrero
Resolved    aborrero
Resolved    aborrero
Resolved    aborrero
Resolved    aborrero
Resolved    aborrero
Resolved    aborrero
Resolved    aborrero
Resolved    aborrero
Resolved    aborrero
Resolved    aborrero
Resolved    cmooney
Resolved    aborrero
Resolved    aborrero

Event Timeline

bd808 triaged this task as Medium priority. Feb 25 2020, 5:05 PM
bd808 moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.

The steps I have in mind here are:
1/ Set up v6 on the transport network (most likely 2620:0:860:fe0a::/64)
2/ Assign a /48 for cloud codfw, see T187929 (we won't go into its subnetting here)
3/ Advertise that /48 to the world, either from codfw only or from both eqiad and codfw (TBD; the former is preferred)
3a/ Shrink the esams advertisement from 2a02:ec80::/32 to 2a02:ec80:500::/48
4/ Set up a basic firewall filter (cloud-in6)
My suggestion is to initially allow only traffic to the internet, plus perhaps a few selected "private" hosts needed for testing. This is to minimize the work of creating and managing that filter during and after the PoC.

5/ Statically route the whole /48 to the neutron gateway VIP (go live)
This would stay as-is for simplicity for now, but will most likely change down the road (for example, if the support network gets moved into that prefix but onto a different network).
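Step 5 and the expected later change both come down to longest-prefix-match routing: a single static route for the whole /48 sends everything to neutron, and a more specific route added later would win for its own slice. A minimal sketch, assuming the 2a02:ec80:a100::/48 range that was eventually configured (see the homer changes below); the next hops are hypothetical labels, not the real gateway VIP, and the support-network /64 is made up:

```python
import ipaddress

# Hypothetical routing table: next hops are labels, not real addresses.
ROUTES = {
    ipaddress.ip_network("::/0"): "upstream",
    ipaddress.ip_network("2a02:ec80:a100::/48"): "neutron-gateway-vip",
}


def next_hop(dst, routes):
    """Longest-prefix match: the most specific route containing dst wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]


if __name__ == "__main__":
    print(next_hop("2a02:ec80:a100:1::29c", ROUTES))  # neutron-gateway-vip
    print(next_hop("2001:db8::1", ROUTES))            # upstream
    # Later carve-out: a hypothetical /64 for a support network elsewhere.
    carved = dict(ROUTES)
    carved[ipaddress.ip_network("2a02:ec80:a100:ff00::/64")] = "support-network"
    print(next_hop("2a02:ec80:a100:ff00::10", carved))  # support-network
    print(next_hop("2a02:ec80:a100:1::29c", carved))    # still neutron-gateway-vip
```

The point of the design is that the catch-all /48 route never has to be removed: carving out a more specific prefix is enough.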

I assume that if the PoC is not successful, we would roll back the configuration and the assignments would stay reserved for future use.
If successful, we would need to start managing the firewall filter (as we currently do with v4). Does that mean the v4 exceptions will go away and be replaced by the v6 ones, or will both stay live?
Is anything else going to change on that front if we keep maintaining v6 in the long run?

I mostly agree with everything; some inline comments below:

> The steps I have in mind here are:
> 1/ Set up v6 on the transport network (most likely 2620:0:860:fe0a::/64)

Is that range definitive? If not, could we pick one that is likely to be used in the future too? Per my comment T187929#5984401, I suggest we use 2a02:ec80:1:0::/64.

> 2/ Assign a /48 for cloud codfw, see T187929 (we won't go into its subnetting here)

OK. Same here: I suggest we use 2a02:ec80:1:1::/64 (see T187929#5984401).
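The two suggested prefixes can be sanity-checked with Python's standard `ipaddress` module. This sketch only uses the values quoted in this thread (the proposed /64s and the esams ranges from step 3a); it checks that the /64s don't overlap, finds their common parent /48, and shows where that /48 sits relative to the old and shrunken esams advertisements:

```python
import ipaddress

# Prefixes proposed in T187929#5984401, plus the esams ranges from step 3a.
transport = ipaddress.ip_network("2a02:ec80:1:0::/64")
cloud = ipaddress.ip_network("2a02:ec80:1:1::/64")
esams_old = ipaddress.ip_network("2a02:ec80::/32")
esams_new = ipaddress.ip_network("2a02:ec80:500::/48")

# Both proposals are /64s that do not overlap and share one parent /48.
assert not transport.overlaps(cloud)
parent = transport.supernet(new_prefix=48)
assert cloud.subnet_of(parent)

# A /48 holds 2**(64 - 48) = 65536 /64s.
num_64s = 2 ** (64 - parent.prefixlen)

# The parent /48 sits inside the current esams /32 but outside the shrunken /48.
inside_old = parent.subnet_of(esams_old)
inside_new = parent.subnet_of(esams_new)

# ip_network normalizes "2a02:ec80:1:0::" to its compressed form "2a02:ec80:1::".
print(transport, cloud, parent, num_64s, inside_old, inside_new)
```

This prints `2a02:ec80:1::/64 2a02:ec80:1:1::/64 2a02:ec80:1::/48 65536 True False`. The last two values are one way to read step 3a: while esams announces the whole /32, it also covers any cloud prefix carved out of 2a02:ec80::/32.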

> 3/ Advertise that /48 to the world, either from codfw only or from both eqiad and codfw (TBD; the former is preferred)
> 4/ Set up a basic firewall filter (cloud-in6)
> My suggestion is to initially allow only traffic to the internet, plus perhaps a few selected "private" hosts needed for testing. This is to minimize the work of creating and managing that filter during and after the PoC.

I agree. I suggest we do this firewalling incrementally. Ideally we would implement stateful firewalling and only allow connections initiated from within the cloud, but I'm aware this is not possible with the prod core routers as upstream.
For the initial PoC, we only need to allow three addresses:
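To illustrate what "stateful" buys over the stateless filters on the core routers, here is a toy connection tracker: a flow opened from inside the cloud creates state, and an inbound packet is allowed only if it is the reverse of a tracked flow. This is a simplified model for illustration, not how any particular firewall implements it; `StatefulFilter` is hypothetical, real firewalls also track TCP state and expire entries, and 2001:db8::/32 is the documentation prefix standing in for an internet host:

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class StatefulFilter:
    """Toy connection tracker: outbound traffic opens state, inbound must match it.

    Flows are (src, sport, dst, dport, proto) tuples; TCP state tracking and
    entry expiry, which real firewalls need, are deliberately omitted.
    """

    cloud_prefix: ipaddress.IPv6Network
    established: set = field(default_factory=set)

    def allow(self, src: str, sport: int, dst: str, dport: int, proto: str) -> bool:
        # Normalize addresses so "2001:0db8::80" and "2001:db8::80" compare equal.
        s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
        if s in self.cloud_prefix:
            # Initiated from inside the cloud: remember the flow and let it out.
            self.established.add((s, sport, d, dport, proto))
            return True
        # From outside: only replies to a flow the cloud opened are allowed.
        return (d, dport, s, sport, proto) in self.established


if __name__ == "__main__":
    fw = StatefulFilter(ipaddress.ip_network("2a02:ec80:a100::/48"))
    vm, web = "2a02:ec80:a100:1::29c", "2001:db8::80"
    print(fw.allow(vm, 40000, web, 443, "tcp"))  # True: outbound, state created
    print(fw.allow(web, 443, vm, 40000, "tcp"))  # True: reply to tracked flow
    print(fw.allow(web, 51000, vm, 22, "tcp"))   # False: unsolicited inbound
```

A stateless router filter only sees individual packets, so it cannot tell the second call from the third; that distinction is what the `established` set provides.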

  • 2620:0:860:3:208:80:153:76 (cloudservices2002-dev.wikimedia.org)
  • 2620:0:860:2:208:80:153:59 (cloudcontrol2001-dev.wikimedia.org)
  • 2620:0:860:3:208:80:153:75 (cloudcontrol2003-dev.wikimedia.org)

i.e., connections between the cloud and these physical hosts, in either direction.
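The initial cloud-in6 policy described above boils down to one decision on the non-cloud end of a flow: allow the three listed hosts, deny the rest of production, allow everything else as "internet". A minimal sketch of that decision, assuming 2620:0:860::/46 as a stand-in for the production IPv6 space (an assumption, not taken from this task); `cloud_in6_permits` is hypothetical and says nothing about how the real Junos filter terms are written:

```python
import ipaddress

# Assumption: 2620:0:860::/46 stands in for the production IPv6 space that the
# filter protects; the real cloud-in6 term structure is not shown in this task.
PROD_V6 = ipaddress.ip_network("2620:0:860::/46")

# The three physical hosts the PoC needs (from the list above).
ALLOWED_PROD_HOSTS = {
    ipaddress.ip_address("2620:0:860:3:208:80:153:76"),  # cloudservices2002-dev
    ipaddress.ip_address("2620:0:860:2:208:80:153:59"),  # cloudcontrol2001-dev
    ipaddress.ip_address("2620:0:860:3:208:80:153:75"),  # cloudcontrol2003-dev
}


def cloud_in6_permits(peer: str) -> bool:
    """Decide on the non-cloud end of a flow, whichever direction it travels."""
    addr = ipaddress.ip_address(peer)
    if addr in ALLOWED_PROD_HOSTS:
        return True  # explicitly allowed "private" host
    return addr not in PROD_V6  # anything outside prod counts as internet
```

With this shape, growing the PoC means adding entries to the allowlist rather than restructuring the filter, which matches the "incremental" approach suggested above.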

> 5/ Statically route the whole /48 to the neutron gateway VIP (go live)
> This would stay as-is for simplicity for now, but will most likely change down the road (for example, if the support network gets moved into that prefix but onto a different network).

> I assume that if the PoC is not successful, we would roll back the configuration and the assignments would stay reserved for future use.

OK, that's why I think it may be important to agree on the addressing plan first: T187929: Cloud IPv6 subnets

> If successful, we would need to start managing the firewall filter (as we currently do with v4). Does that mean the v4 exceptions will go away and be replaced by the v6 ones, or will both stay live?

Initially both will stay live. I'm not sure what our next step would be from the network point of view:

  • we get rid of our NAT dmz_cidr mechanism --> the v4 firewall needs adjusting in the prod core routers.
  • we introduce a pair of cloud upstream routers/firewalls between neutron and the prod core routers --> we can drop all the firewalling from the prod core routers.
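The first option is easier to see with a model of what dmz_cidr does to egress traffic. A simplified sketch, assuming dmz_cidr is a list of IPv4 destinations reached without source NAT (so the peer sees the VM's private address) while everything else is SNATed to a shared egress address; all addresses and the `EGRESS_NAT_IP` / `egress_source` names are hypothetical, taken from documentation ranges:

```python
import ipaddress

# Hypothetical values for illustration only: a shared IPv4 egress address and
# one exempt destination range standing in for the dmz_cidr list.
EGRESS_NAT_IP = ipaddress.ip_address("198.51.100.10")
DMZ_CIDRS = [ipaddress.ip_network("203.0.113.0/24")]


def egress_source(src: str, dst: str) -> str:
    """Return the source address the far end would see for a VM's packet."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    if s.version == 6:
        # No NAT for IPv6: the VM's globally routable address goes out as-is.
        return str(s)
    if any(d in net for net in DMZ_CIDRS):
        # dmz_cidr destination: NAT is skipped, the peer sees the VM's private IP.
        return str(s)
    # Everything else is source-NATed to the shared egress address.
    return str(EGRESS_NAT_IP)
```

Under this model, the v4 filters on the prod side are written against what they see: VM private ranges for dmz_cidr destinations. Dropping the exemption changes that source address, hence the v4 firewall adjustment, while v6 traffic never went through NAT in the first place.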

> Is anything else going to change on that front if we keep maintaining v6 in the long run?

Other than what I have already commented, I don't have any other concerns right now. Things may change in the near future once we really start playing with the PoC and learn what this looks like for real.

aborrero renamed this task from CloudVPS: IPv6 early PoC to CloudVPS: IPv6 in codfw1dev. Sep 13 2024, 12:54 PM
aborrero updated the task description.
aborrero added a project: User-aborrero.
aborrero moved this task from Backlog to Next on the User-aborrero board.

Change #1078990 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Add elements for WMCS IPv6 range in codfw 2a02:ec80:a100::/48

https://gerrit.wikimedia.org/r/1078990

Change #1078990 merged by jenkins-bot:

[operations/homer/public@master] Add elements for WMCS IPv6 range in codfw 2a02:ec80:a100::/48

https://gerrit.wikimedia.org/r/1078990

Change #1079237 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Adjust cloudsw/cr bgp policies and include new IPv6 range for codfw

https://gerrit.wikimedia.org/r/1079237

Change #1079237 merged by jenkins-bot:

[operations/homer/public@master] Adjust cloudsw/cr bgp policies and include new IPv6 range for codfw

https://gerrit.wikimedia.org/r/1079237

Change #1079252 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Fix typos in updated prefix-list for cloud ranges eqiad

https://gerrit.wikimedia.org/r/1079252

Change #1079252 merged by jenkins-bot:

[operations/homer/public@master] Fix typos in updated prefix-list for cloud ranges eqiad

https://gerrit.wikimedia.org/r/1079252

Change #1079254 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Fix similar typo in the codfw policy

https://gerrit.wikimedia.org/r/1079254

Change #1079254 merged by jenkins-bot:

[operations/homer/public@master] Fix similar typo in the codfw policy

https://gerrit.wikimedia.org/r/1079254

The edge (cloudsw/cr) networking is now complete; elements in the range are reachable externally.

cathal@officepc:~$ mtr -z -b -w -c 5 2a02:ec80:a100:fe03::1 
Start: 2024-10-10T14:35:25+0100
HOST: officepc                                                                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS5466   pool-ipv6-pd.agg1.srl.blp-srl.eir.ie (2001:bb6:8b70:9e00::1)        0.0%     5    0.4   0.4   0.3   0.5   0.1
  2. AS5466   agg1.srl.blp-srl.eircom.net (2001:bb0:6:a11d::1)                    0.0%     5    8.2   5.9   4.6   8.2   1.7
  3. AS5466   2001:bb0:6:a197::1                                                  0.0%     5    4.7   4.8   4.7   4.9   0.1
  4. AS1299   dln-b3-link.ip.twelve99.net (2001:2035:0:9eb::1)                    0.0%     5    4.7   4.7   4.4   4.8   0.1
  5. AS1299   ldn-bb2-v6.ip.twelve99.net (2001:2034:1:ca::1)                      0.0%     5   14.7  14.8  14.6  15.0   0.1
  6. AS1299   prs-bb2-v6.ip.twelve99.net (2001:2034:1:c1::1)                      0.0%     5   32.5  32.8  32.5  33.0   0.2
  7. AS1299   rest-bb1-v6.ip.twelve99.net (2001:2034:1:73::1)                    20.0%     5   98.2  98.1  97.9  98.2   0.1
  8. AS1299   atl-bb1-v6.ip.twelve99.net (2001:2034:1:a1::1)                      0.0%     5  115.0 114.6 114.2 115.0   0.3
  9. AS???    ???                                                                100.0     5    0.0   0.0   0.0   0.0   0.0
 10. AS???    ???                                                                100.0     5    0.0   0.0   0.0   0.0   0.0
 11. AS???    irb-1120.cloudsw1-b1-codfw.wikimedia.org (2a02:ec80:a100:fe03::1)   0.0%     5  132.7 132.8 132.7 133.0   0.2
cmooney@cloudgw2002-dev:~$ sudo tcpdump -i vlan2120 -l -p -nn host 2001:bb6:8b70:9e00::187
listening on vlan2120, link-type EN10MB (Ethernet), snapshot length 262144 bytes
13:51:35.790703 IP6 2001:bb6:8b70:9e00::187 > 2a02:ec80:a100:1::29c: ICMP6, echo request, id 6205, seq 97, length 64
13:51:36.814913 IP6 2001:bb6:8b70:9e00::187 > 2a02:ec80:a100:1::29c: ICMP6, echo request, id 6205, seq 98, length 64
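Reading the mtr report above: hop 7 lost 20% of 5 probes (one probe) and hops 9 and 10 never answered, yet the final hop shows 0% loss. Loss that does not carry through to the destination usually means intermediate routers are rate-limiting or not answering ICMP, not dropping forwarded traffic. A small sketch that separates the two, assuming the `mtr -w` wide-report layout shown above; `parse_hops` and `summarize` are hypothetical helpers and the regex is tuned to these lines only:

```python
import re

# One wide-report hop line: "  7. AS1299   host (addr)   20.0%   5  ..."
HOP_RE = re.compile(r"^\s*(\d+)\.\s+AS\S+\s+(.+?)\s+(\d+(?:\.\d+)?)%?\s+(\d+)\s")


def parse_hops(report):
    """Return (hop, host, loss_pct, sent) tuples for each hop line in a report."""
    hops = []
    for line in report.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append((int(m.group(1)), m.group(2), float(m.group(3)), int(m.group(4))))
    return hops


def summarize(hops):
    """Split loss into end-to-end loss and loss seen only at intermediate hops."""
    *middle, last = hops
    lossy_middle = [hop for hop, _, loss, _ in middle if loss > 0]
    return last[2], lossy_middle


# A subset of the report above.
SAMPLE = """\
  6. AS1299   prs-bb2-v6.ip.twelve99.net (2001:2034:1:c1::1)                      0.0%     5   32.5  32.8  32.5  33.0   0.2
  7. AS1299   rest-bb1-v6.ip.twelve99.net (2001:2034:1:73::1)                    20.0%     5   98.2  98.1  97.9  98.2   0.1
  9. AS???    ???                                                                100.0     5    0.0   0.0   0.0   0.0   0.0
 11. AS???    irb-1120.cloudsw1-b1-codfw.wikimedia.org (2a02:ec80:a100:fe03::1)   0.0%     5  132.7 132.8 132.7 133.0   0.2
"""

if __name__ == "__main__":
    end_to_end_loss, lossy = summarize(parse_hops(SAMPLE))
    print(end_to_end_loss, lossy)  # 0.0 [7, 9]
```

Only the last value matters for reachability; the intermediate list is informational.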

Change #1079288 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Add orlonger to policy on announced v6 routes from cloudsw

https://gerrit.wikimedia.org/r/1079288

Change #1079288 abandoned by Cathal Mooney:

[operations/homer/public@master] Add orlonger to policy on announced v6 routes from cloudsw

https://gerrit.wikimedia.org/r/1079288
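For context on the abandoned change: in Junos route-filter and prefix-list-filter terms, `exact` matches only the configured prefix, `longer` matches strictly more-specific routes inside it, and `orlonger` matches the prefix itself or anything more specific. A small model of those three match types, for illustration only; `route_filter_match` is hypothetical and ignores other Junos match types such as `upto`:

```python
import ipaddress


def route_filter_match(route: str, prefix: str, match_type: str) -> bool:
    """Model Junos-style route-filter match types against a configured prefix."""
    r, p = ipaddress.ip_network(route), ipaddress.ip_network(prefix)
    if r.version != p.version or not r.subnet_of(p):
        return False  # route lies outside the configured prefix entirely
    if match_type == "exact":
        return r.prefixlen == p.prefixlen
    if match_type == "longer":
        return r.prefixlen > p.prefixlen  # strictly more specific only
    if match_type == "orlonger":
        return r.prefixlen >= p.prefixlen  # the prefix itself or more specific
    raise ValueError(f"unsupported match type: {match_type}")
```

For example, a /64 announced from inside 2a02:ec80:a100::/48 fails an `exact` term on the /48 but passes an `orlonger` one.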

Change #1079982 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Remove WMCS codfw prefix from CR aggregate conf and adjust outfilter

https://gerrit.wikimedia.org/r/1079982

Change #1079982 merged by jenkins-bot:

[operations/homer/public@master] Remove WMCS codfw prefix from CR aggregate conf and adjust outfilter

https://gerrit.wikimedia.org/r/1079982

aborrero claimed this task.
aborrero updated the task description.

I think we can consider IPv6 to be fully working on codfw1dev.