
cloudgw: IPv6 issue in the control plane network
Closed, Resolved (Public)

Description

There seems to be an IPv6 issue on the control plane network: the default IPv6 route is not correctly installed on cloudgw devices.

Event Timeline

aborrero triaged this task as Medium priority. Mar 12 2021, 12:53 PM
aborrero created this task.

Change 672360 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: separate dataplane network configuration into a different file

https://gerrit.wikimedia.org/r/672360

Change 672360 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: separate dataplane network configuration into a different file

https://gerrit.wikimedia.org/r/672360

Change 672364 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: add standard mapped IPv6 addresses to the primary interfaces

https://gerrit.wikimedia.org/r/672364

Change 672364 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: add standard mapped IPv6 addresses to the primary interfaces

https://gerrit.wikimedia.org/r/672364

Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:

cloudgw2002-dev.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202103151113_aborrero_5420_cloudgw2002-dev_codfw_wmnet.log.

Completed auto-reimage of hosts:

['cloudgw2002-dev.codfw.wmnet']

Of which those FAILED:

['cloudgw2002-dev.codfw.wmnet']

Change 672379 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] interface: add_ip6_mapped: ignore errors setting IPv6 token

https://gerrit.wikimedia.org/r/672379

Change 672382 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: adjust sysctl parameters that are only meant for dataplane

https://gerrit.wikimedia.org/r/672382

Change 672379 abandoned by Arturo Borrero Gonzalez:
[operations/puppet@production] interface: add_ip6_mapped: ignore errors setting IPv6 token

Reason:
Real issue solved on https://gerrit.wikimedia.org/r/c/operations/puppet/+/672382

https://gerrit.wikimedia.org/r/672379

Change 672382 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: adjust sysctl parameters that are only meant for dataplane

https://gerrit.wikimedia.org/r/672382

aborrero closed this task as Resolved. Mar 15 2021, 12:51 PM

This was a bad interaction between sysctl parameters and IPv6 configuration.

Per the kernel source code, when an IPv6 token is configured, the forwarding and accept_ra settings are checked for sanity:

image.png (306×770 px, 56 KB)

image.png (156×753 px, 32 KB)

root@cloudgw2002-dev:~# /sbin/ip -6 token set ::10:192:20:18 dev eno1
RTNETLINK answers: Invalid argument
root@cloudgw2002-dev:~# sysctl -a | grep accept_ra
[..]
net.ipv6.conf.eno1.accept_ra = 1
root@cloudgw2002-dev:~# sysctl net.ipv6.conf.eno1.accept_ra=2
net.ipv6.conf.eno1.accept_ra = 2
root@cloudgw2002-dev:~# /sbin/ip token set ::10:192:20:18 dev eno1
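The session above is consistent with the kernel only accepting a token when the interface would accept router advertisements. A minimal Python model of that check follows, as a sketch only: it is based on a reading of the kernel's IPv6 address configuration code, not a copy of it, and it ignores any other conditions the kernel may also test. The names `Ipv6IfaceConf`, `accepts_router_advertisements` and `can_set_token` are made up for illustration. It also assumes that forwarding was enabled on eno1, which the session itself does not show.

```python
from dataclasses import dataclass


@dataclass
class Ipv6IfaceConf:
    """Hypothetical container for two per-interface IPv6 sysctl values."""
    forwarding: int  # net.ipv6.conf.<iface>.forwarding
    accept_ra: int   # net.ipv6.conf.<iface>.accept_ra


def accepts_router_advertisements(conf: Ipv6IfaceConf) -> bool:
    # With forwarding enabled, RAs are only accepted in the special
    # accept_ra=2 mode; without forwarding, any non-zero value is enough.
    if conf.forwarding:
        return conf.accept_ra == 2
    return conf.accept_ra != 0


def can_set_token(conf: Ipv6IfaceConf) -> bool:
    # Simplified: only the forwarding/accept_ra sanity check is modelled here.
    return accepts_router_advertisements(conf)


if __name__ == "__main__":
    # Assumed state before the manual sysctl change: forwarding on, accept_ra=1.
    print(can_set_token(Ipv6IfaceConf(forwarding=1, accept_ra=1)))
    # State after "sysctl net.ipv6.conf.eno1.accept_ra=2".
    print(can_set_token(Ipv6IfaceConf(forwarding=1, accept_ra=2)))
```

Under this model there are two ways out of the "Invalid argument" error: raise accept_ra to 2, as done manually above, or disable forwarding on the interface.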

We don't need (or want) network forwarding on the control plane interface, so the fix in this case was to adjust the sysctl parameters so that forwarding (and other settings) are only applied to data plane interfaces.
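The idea of scoping forwarding to data plane interfaces can be sketched as follows. This is not the actual Puppet change (see the merged patch for that). It is a hypothetical Python helper that renders per-interface sysctl lines for an explicit list of data plane interfaces, so that the control plane interface keeps its defaults. The function names and the interface names in the usage example are assumptions, and only the forwarding keys are shown, not the other settings mentioned above.

```python
def sysctl_iface_key(iface: str) -> str:
    # In dotted sysctl keys, a "." inside an interface name (for example a
    # VLAN sub-interface) is written as "/" so it is not read as a separator.
    return iface.replace(".", "/")


def render_dataplane_forwarding(dataplane_ifaces: list[str]) -> str:
    """Render forwarding sysctls for the given data plane interfaces only."""
    lines = []
    for iface in dataplane_ifaces:
        key = sysctl_iface_key(iface)
        lines.append(f"net.ipv4.conf.{key}.forwarding = 1")
        lines.append(f"net.ipv6.conf.{key}.forwarding = 1")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical data plane interfaces; the control plane interface
    # (eno1 in the session above) is deliberately not listed.
    print(render_dataplane_forwarding(["eno2", "eno2.100"]), end="")
```

Listing interfaces explicitly, rather than setting the `all` or `default` keys, is what keeps forwarding off the control plane interface in this sketch.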

Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:

cloudgw2002-dev.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202103151458_aborrero_14130_cloudgw2002-dev_codfw_wmnet.log.

Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:

cloudgw2001-dev.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202103151459_aborrero_14565_cloudgw2001-dev_codfw_wmnet.log.

Completed auto-reimage of hosts:

['cloudgw2002-dev.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['cloudgw2001-dev.codfw.wmnet']

and were ALL successful.

Change 836732 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] openstack: neutron: l3_agent: don't sysctl base interface

https://gerrit.wikimedia.org/r/836732

Change 836732 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] openstack: neutron: l3_agent: don't sysctl base interface

https://gerrit.wikimedia.org/r/836732