
openstack: network problems when introducing new networks
Closed, Resolved | Public

Description

We detected some network problems when introducing the new dualstack and ipv4-only networks.

This ticket is to track the work to identify and fix them.

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
codfw1dev: flavors: fix project access for g4.cores1.ram1.disk4 | repos/cloud/cloud-vps/tofu-infra!171 | aborrero | arturo-322-codfw1dev-flavors-f | main
codfw1dev: create network tests instances | repos/cloud/cloud-vps/tofu-infra!170 | aborrero | arturo-276-codfw1dev-create-ne | main
codfw1dev: add vxlan-ipv4-only.cloudinstances2b-gw.svc.codfw1dev.wikimedia.cloud FQDN | repos/cloud/cloud-vps/tofu-infra!169 | aborrero | arturo-169-codfw1dev-add-vxlan | main
codfw1dev: tools-codfw1dev: manage default security group | repos/cloud/cloud-vps/tofu-infra!140 | aborrero | arturo-118-codfw1dev-tools-cod | main

Related Objects

Status | Subtype | Assigned | Task
Open | | None |
Declined | | None |
Open | | None |
Open | | None |
Open | | None |
Resolved | | taavi |
Open | | None |
Resolved | | aborrero |
Open | | None |
Resolved | | taavi |
Resolved | | taavi |
Open | | None |
Resolved | | aborrero |
Resolved | | aborrero |
Resolved | | aborrero |
Resolved | | cmooney |
Resolved | | aborrero |
Resolved | | aborrero |
Resolved | | aborrero |
Resolved | | aborrero |
Resolved | | aborrero |
Resolved | | fnegri |

Event Timeline

aborrero changed the task status from Open to In Progress.
aborrero triaged this task as Medium priority.
aborrero moved this task from Backlog to Doing on the User-aborrero board.
aborrero renamed this task from openstack: network problems when introducing new dualstack and ipv4-only networks to openstack: network problems when introducing new networks. Nov 25 2024, 11:49 AM

Change #1097370 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] openstack: networktests: refresh for latest network changes

https://gerrit.wikimedia.org/r/1097370

Change #1097370 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] openstack: networktests: refresh for latest network changes

https://gerrit.wikimedia.org/r/1097370

I detected a few inconsistencies in the network testing scripts; I will fix them.

Among other things, I will use the vlanX120.cloudgwYYYY.<deploy>.wikimediacloud.org FQDN scheme for the IP addresses, as we do for vlan1107/vlan2107.
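For illustration, the naming scheme above can be sketched as a small helper. This is only a sketch: the function name, its parameters, and the example values (site digit `2`, host number `2002`, deployment `codfw1dev`) are assumptions made for the example, not taken from the puppet change.

```python
def cloudgw_fqdn(site_digit: int, host_number: str, deploy: str) -> str:
    """Build a name following vlanX120.cloudgwYYYY.<deploy>.wikimediacloud.org.

    site_digit  -> the X in vlanX120 (hypothetical parameter name)
    host_number -> the YYYY in cloudgwYYYY (hypothetical parameter name)
    deploy      -> the <deploy> label
    """
    if not 0 <= site_digit <= 9:
        raise ValueError("site_digit must be a single digit")
    return f"vlan{site_digit}120.cloudgw{host_number}.{deploy}.wikimediacloud.org"


# Illustrative values only; real host numbers may differ.
print(cloudgw_fqdn(2, "2002", "codfw1dev"))
# prints "vlan2120.cloudgw2002.codfw1dev.wikimediacloud.org"
```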

Change #1097380 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudgw: use vlan1120/vlan2120 prefix for FQDN

https://gerrit.wikimedia.org/r/1097380

Change #1097380 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudgw: use vlan1120/vlan2120 prefix for FQDN

https://gerrit.wikimedia.org/r/1097380

Change #1097440 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] openstack: networktests: support IPv6 and IPv4-only networks

https://gerrit.wikimedia.org/r/1097440

Today @cmooney reported that this may have been caused by an inconsistency in the edge routing configuration of the cloudsw devices.

aborrero changed the task status from In Progress to Open. Mar 4 2025, 11:25 AM

I'd forgotten about this task; apologies.

The problems occurred last time because the cloud switches had not yet been configured with IPv6 addressing on the various host-facing vlans, on their interconnects, or on the link to the WMF core routers. As a result, IPv6 routing in and out of the cloud network was not functioning (indeed, it was not even configured), and the cloud systems were sending traffic into a black hole.

We're now working to add the IPv6 interfaces, routing protocols, public BGP announcements for the assigned ranges, etc., after which we should be able to re-attempt enabling IPv6 on the host side.

The only caveat is that I don't think any IPv4 networks should have been affected by the lack of v6 configuration on the physical infra, so perhaps there is some other issue here. Either way, we should ensure the v6 infra is fully configured before we re-attempt.
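One way to tell an IPv6-only black hole apart from a problem that also affects IPv4 is to probe the same endpoint once per address family. The following is a minimal sketch using only the Python standard library; it is not the networktests code from the puppet changes, and the function names and default timeout are assumptions.

```python
import socket


def can_connect(host: str, port: int, family: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds over `family`."""
    try:
        candidates = socket.getaddrinfo(
            host, port, family=family, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # The host has no address for this family (or resolution failed).
        return False
    for af, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            # Refused, timed out, unreachable, or family unsupported locally.
            continue
    return False


def check_dualstack(host: str, port: int) -> dict:
    """Probe the same endpoint over IPv4 and IPv6 separately."""
    return {
        "ipv4": can_connect(host, port, socket.AF_INET),
        "ipv6": can_connect(host, port, socket.AF_INET6),
    }
```

If `check_dualstack` reports IPv4 as working and IPv6 as failing, that is consistent with missing v6 configuration on the path; if both fail, something else is likely involved.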

Change #1097440 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] openstack: networktests: support IPv6 and IPv4-only networks

https://gerrit.wikimedia.org/r/1097440

aborrero opened https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/169

codfw1dev: add vxlan-ipv4-only.cloudinstances2b-gw.svc.codfw1dev.wikimedia.cloud FQDN

aborrero merged https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/169

codfw1dev: add vxlan-ipv4-only.cloudinstances2b-gw.svc.codfw1dev.wikimedia.cloud FQDN

Mentioned in SAL (#wikimedia-cloud) [2025-04-07T15:30:30Z] <arturo> create a bunch of VMs by hand, like networktests-vlan-legacy-floating T380728

Mentioned in SAL (#wikimedia-cloud) [2025-04-07T15:30:48Z] <arturo> [codfw1dev] testlabs create a bunch of VMs by hand, like networktests-vlan-legacy-floating T380728

aborrero claimed this task.

We think all problems have been addressed. Among other things: