
prometheus5002 unable to ping ipv6 ganeti500[74] eqsin
Closed, Resolved · Public

Description

Follow-up from T163996 (Probe for ipv6 host reachability): running "smoke" probes over v6 uncovered that prometheus5002 can't reach the ganeti hosts over v6, with either ping or TCP:

prometheus5002:~$ ping -c3 ganeti5007
PING ganeti5007(ganeti5007.eqsin.wmnet (2001:df2:e500:101:10:132:0:11)) 56 data bytes
From prometheus5002.eqsin.wmnet (2001:df2:e500:101:10:132:0:12) icmp_seq=1 Destination unreachable: Address unreachable
From prometheus5002.eqsin.wmnet (2001:df2:e500:101:10:132:0:12) icmp_seq=2 Destination unreachable: Address unreachable
From prometheus5002.eqsin.wmnet (2001:df2:e500:101:10:132:0:12) icmp_seq=3 Destination unreachable: Address unreachable

--- ganeti5007 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2010ms

prometheus5002:~$ ping -c3 ganeti5004
PING ganeti5004(ganeti5004.eqsin.wmnet (2001:df2:e500:101:10:132:0:40)) 56 data bytes
From prometheus5002.eqsin.wmnet (2001:df2:e500:101:10:132:0:12) icmp_seq=1 Destination unreachable: Address unreachable
From prometheus5002.eqsin.wmnet (2001:df2:e500:101:10:132:0:12) icmp_seq=2 Destination unreachable: Address unreachable
From prometheus5002.eqsin.wmnet (2001:df2:e500:101:10:132:0:12) icmp_seq=3 Destination unreachable: Address unreachable

--- ganeti5004 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2003ms

The bastion, for example, is fine:

prometheus5002:~$ ping6 -c3 bast5004
PING bast5004(bast5004.wikimedia.org (2001:df2:e500:1:103:102:166:6)) 56 data bytes
64 bytes from bast5004.wikimedia.org (2001:df2:e500:1:103:102:166:6): icmp_seq=1 ttl=63 time=0.656 ms
64 bytes from bast5004.wikimedia.org (2001:df2:e500:1:103:102:166:6): icmp_seq=2 ttl=63 time=0.732 ms
64 bytes from bast5004.wikimedia.org (2001:df2:e500:1:103:102:166:6): icmp_seq=3 ttl=63 time=0.798 ms

--- bast5004 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.656/0.728/0.798/0.058 ms
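A smoke probe has to tell these two outcomes apart. The following is a minimal sketch, assuming the probe shells out to iputils ping and only inspects its text output; the function names are hypothetical and are not taken from the T163996 probe. It parses the summary line and treats a host as reachable if at least one echo reply came back.

```python
import re

# Summary line printed by iputils ping, e.g.
# "3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2010ms"
# Optional "+N duplicates/corrupted/errors" counters are captured in <extra>.
SUMMARY_RE = re.compile(
    r"(?P<tx>\d+) packets transmitted, (?P<rx>\d+) received"
    r"(?P<extra>(?:, \+\d+ \w+)*)"
    r", (?P<loss>[\d.]+)% packet loss"
)


def parse_ping_summary(output: str) -> dict:
    """Extract transmitted/received/error/loss counters from ping output."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    errors = re.search(r"\+(\d+) errors", match.group("extra"))
    return {
        "transmitted": int(match.group("tx")),
        "received": int(match.group("rx")),
        "errors": int(errors.group(1)) if errors else 0,
        "loss_pct": float(match.group("loss")),
    }


def is_reachable(output: str) -> bool:
    """Hypothetical probe verdict: reachable if any echo reply was received."""
    return parse_ping_summary(output)["received"] > 0
```

Fed the ganeti5007 summary above, this reports 0 received and 3 errors (unreachable); fed the bast5004 summary, it reports 3 received (reachable).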

Event Timeline

Restricted Application added a subscriber: Aklapper.

Thanks for finding the issue! The host lost its static IPv6 address and is left with only a SLAAC-derived one:

ganeti5007:~$ ip -6 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
6: private: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2001:df2:e500:101:58c4:bff:fe4b:2d87/64 scope global dynamic mngtmpaddr 
       valid_lft 2591990sec preferred_lft 604790sec
    inet6 fe80::58c4:bff:fe4b:2d87/64 scope link 
       valid_lft forever preferred_lft forever
[...]
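The SLAAC address can be spotted by the ff:fe marker in the middle of its interface ID (modified EUI-64), unlike the static addresses in this cluster, which end in the host's IPv4 octets. A minimal sketch for flagging this from `ip -6 addr` output, using only the standard `ipaddress` module; the helper names are hypothetical, and the check only covers EUI-64 SLAAC (it will not catch privacy/stable-privacy addresses).

```python
import ipaddress


def global_v6_addresses(ip_output: str) -> list[str]:
    """Return the scope-global inet6 addresses listed in `ip -6 addr` output."""
    addrs = []
    for line in ip_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "inet6" and "global" in fields:
            addrs.append(fields[1].split("/")[0])  # drop the /prefixlen
    return addrs


def _interface_id(addr: str) -> bytes:
    """Low 64 bits of the address as 8 big-endian bytes."""
    return (int(ipaddress.IPv6Address(addr)) & ((1 << 64) - 1)).to_bytes(8, "big")


def is_eui64(addr: str) -> bool:
    """True if bytes 3 and 4 of the interface ID are ff:fe (modified EUI-64)."""
    iid = _interface_id(addr)
    return iid[3] == 0xFF and iid[4] == 0xFE


def eui64_to_mac(addr: str) -> str:
    """Recover the MAC: drop ff:fe and flip the universal/local bit back."""
    iid = _interface_id(addr)
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)
```

For the ganeti5007 output above, the only global address is 2001:df2:e500:101:58c4:bff:fe4b:2d87, which is flagged as EUI-64 and maps back to MAC 5a:c4:0b:4b:2d:87, while an address such as 2001:df2:e500:101:10:132:0:11 is not flagged.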

Looking at /etc/network/interfaces, the "iface private inet static" stanza is missing the IPv6 "up" commands.
For example, this is what we have in ulsfo:

up /sbin/ip token set ::10:128:0:8 dev private
up ip addr add 2620:0:863:101:10:128:0:8/64 dev private
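Both lines follow a pattern visible in this task: the IPv4 octets are written verbatim as the last four IPv6 groups (10.128.0.8 becomes ::10:128:0:8 in ulsfo, and ganeti5007's address ends in :10:132:0:11). A minimal sketch that generates the two missing lines from a host's IPv4 address and its /64, assuming that convention holds; the function name and the default device are hypothetical, and this is not the tooling actually used to fix the hosts.

```python
import ipaddress


def v6_up_lines(ipv4: str, v6_prefix: str, dev: str = "private") -> list[str]:
    """Build IPv6 'up' lines for an /etc/network/interfaces stanza.

    Assumes the IPv4 octets are reused verbatim as the last four IPv6
    groups, e.g. 10.132.0.11 -> ::10:132:0:11 (inferred, not confirmed).
    """
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")
    token = "::" + ":".join(octets)
    net = ipaddress.IPv6Network(v6_prefix)
    if net.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    # OR the /64 network with the token to get the full static address.
    addr = ipaddress.IPv6Address(
        int(net.network_address) | int(ipaddress.IPv6Address(token))
    )
    return [
        f"up /sbin/ip token set {token} dev {dev}",
        f"up ip addr add {addr}/{net.prefixlen} dev {dev}",
    ]
```

Called with 10.128.0.8 and 2620:0:863:101::/64 it reproduces the ulsfo lines above; assuming ganeti5007's IPv4 is 10.132.0.11 (as its IPv6 address suggests), 2001:df2:e500:101::/64 yields "up ip addr add 2001:df2:e500:101:10:132:0:11/64 dev private".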

Draining ganeti5007.eqsin.wmnet of running VMs

Draining ganeti5006.eqsin.wmnet of running VMs

Draining ganeti5005.eqsin.wmnet of running VMs

Draining ganeti5004.eqsin.wmnet of running VMs

MoritzMuehlenhoff claimed this task.

This has been fixed, please reopen if you run into other network issues with the eqsin Ganeti cluster.