There appears to be an IPv6 issue on the cloudgw devices: the default IPv6 route is not installed correctly.
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T209460 CloudVPS: network architecture
Resolved | | aborrero | T270704 cloud: introduce new edge network architecture for eqiad1 and codfw1dev
Resolved | | aborrero | T277287 cloudgw: IPv6 issue in the control plane network
Event Timeline
Change 672360 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: separate dataplane network configuration into a different file
Change 672360 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: separate dataplane network configuration into a different file
Change 672364 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: add standard mapped IPv6 addresses to the primary interfaces
Change 672364 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: add standard mapped IPv6 addresses to the primary interfaces
Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:
cloudgw2002-dev.codfw.wmnet
The log can be found in /var/log/wmf-auto-reimage/202103151113_aborrero_5420_cloudgw2002-dev_codfw_wmnet.log.
Completed auto-reimage of hosts:
['cloudgw2002-dev.codfw.wmnet']
Of which those FAILED:
['cloudgw2002-dev.codfw.wmnet']
Change 672379 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] interface: add_ip6_mapped: ignore errors setting IPv6 token
Change 672382 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudgw: adjust sysctl parameters that are only meant for dataplane
Change 672379 abandoned by Arturo Borrero Gonzalez:
[operations/puppet@production] interface: add_ip6_mapped: ignore errors setting IPv6 token
Reason:
Real issue solved on https://gerrit.wikimedia.org/r/c/operations/puppet/+/672382
Change 672382 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudgw: adjust sysctl parameters that are only meant for dataplane
This was a bad interaction between sysctl parameters and IPv6 configuration.
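Per the kernel source code, when an IPv6 token is configured, the forwarding and accept_ra settings are sanity-checked. In the session below, setting the token fails while accept_ra is 1 and succeeds once it is set to 2: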
root@cloudgw2002-dev:~# /sbin/ip -6 token set ::10:192:20:18 dev eno1
RTNETLINK answers: Invalid argument
root@cloudgw2002-dev:~# sysctl -a | grep accept_ra
[..]
net.ipv6.conf.eno1.accept_ra = 1
root@cloudgw2002-dev:~# sysctl net.ipv6.conf.eno1.accept_ra=2
net.ipv6.conf.eno1.accept_ra = 2
root@cloudgw2002-dev:~# /sbin/ip token set ::10:192:20:18 dev eno1
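The behaviour in the session above can be modelled with a small shell function. This is only a sketch under stated assumptions, not the kernel's actual code: it assumes the check boils down to "router advertisements must be honoured on the interface", which with forwarding enabled requires accept_ra = 2 and otherwise any non-zero accept_ra. The function name `token_allowed` is hypothetical, and any further checks the kernel may apply are not modelled.

```shell
#!/bin/sh
# Hypothetical helper (not part of any real tool): models whether an IPv6
# token would be accepted for an interface, given its forwarding and
# accept_ra sysctl values.
# Assumption: RAs are honoured when forwarding=0 and accept_ra is non-zero,
# or when forwarding is enabled and accept_ra=2.
token_allowed() {
    forwarding="$1"
    accept_ra="$2"
    if [ "$forwarding" -ne 0 ]; then
        # Forwarding enabled: only accept_ra=2 keeps RA processing active.
        [ "$accept_ra" -eq 2 ]
    else
        # Forwarding disabled: any non-zero accept_ra is sufficient.
        [ "$accept_ra" -ne 0 ]
    fi
}

# Usage: accept_ra=1 as seen on eno1, with forwarding assumed to be enabled.
if token_allowed 1 1; then
    echo "token accepted"
else
    echo "token rejected"
fi
```

Under this model, either raising accept_ra to 2 or disabling forwarding on the interface makes the token acceptable; the session shows the first option.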
We don't need (or want) network forwarding on the control plane interface, so the fix in this case was to adjust sysctl parameters to only include forwarding (and others) for data plane interfaces.
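The idea of scoping forwarding to data plane interfaces only can be sketched as follows. This is an illustration, not the merged puppet change: the function name `dataplane_sysctls` and the interface names `vlan100` and `vlan200` are hypothetical, and the real patch adjusts more parameters than the two shown here.

```shell
#!/bin/sh
# Hypothetical helper: print per-interface forwarding settings for the
# interfaces passed as arguments. The control plane interface (eno1 in the
# session above) is simply never passed in, so it keeps its defaults.
dataplane_sysctls() {
    for iface in "$@"; do
        echo "net.ipv4.conf.${iface}.forwarding = 1"
        echo "net.ipv6.conf.${iface}.forwarding = 1"
    done
}

# Usage: hypothetical data plane interface names.
dataplane_sysctls vlan100 vlan200
```

Using per-interface keys instead of the `all` variants is what keeps the control plane interface out of scope in this sketch.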
Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:
cloudgw2002-dev.codfw.wmnet
The log can be found in /var/log/wmf-auto-reimage/202103151458_aborrero_14130_cloudgw2002-dev_codfw_wmnet.log.
Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:
cloudgw2001-dev.codfw.wmnet
The log can be found in /var/log/wmf-auto-reimage/202103151459_aborrero_14565_cloudgw2001-dev_codfw_wmnet.log.
Completed auto-reimage of hosts:
['cloudgw2002-dev.codfw.wmnet']
and were ALL successful.
Completed auto-reimage of hosts:
['cloudgw2001-dev.codfw.wmnet']
and were ALL successful.
Change 836732 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] openstack: neutron: l3_agent: don't sysctl base interface
Change 836732 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: neutron: l3_agent: don't sysctl base interface