Event Timeline
DOWN on storage network:
cmooney@cloudsw1-d5-eqiad> ping routing-instance cloud 192.168.4.19
PING 192.168.4.19 (192.168.4.19): 56 data bytes
^C
--- 192.168.4.19 ping statistics ---
Links physically up:
cmooney@cloudsw1-d5-eqiad> show interfaces descriptions | match 1019
xe-0/0/2 up up cloudcephosd1019 {#5350}
xe-0/0/3 up up cloudcephosd1019 {#0240}
False alarm: the ping works if the correct source address is specified (rather than the Junos loopback, which is the default):
cmooney@cloudsw1-d5-eqiad> ping routing-instance cloud 192.168.4.19 source 192.168.4.253
PING 192.168.4.19 (192.168.4.19): 56 data bytes
64 bytes from 192.168.4.19: icmp_seq=0 ttl=64 time=0.824 ms
64 bytes from 192.168.4.19: icmp_seq=1 ttl=64 time=11.326 ms
64 bytes from 192.168.4.19: icmp_seq=2 ttl=64 time=4.344 ms
Ping from cloudsw1-e4 to cloudcephosd1019 seems to be ok:
cmooney@cloudsw1-e4-eqiad> ping 10.64.20.16 source 10.64.148.1
PING 10.64.20.16 (10.64.20.16): 56 data bytes
64 bytes from 10.64.20.16: icmp_seq=0 ttl=63 time=0.610 ms
64 bytes from 10.64.20.16: icmp_seq=1 ttl=63 time=0.566 ms
64 bytes from 10.64.20.16: icmp_seq=2 ttl=63 time=0.476 ms
Ping from cloudsw1-e4 to cloudcephosd1025 seems ok too:
cmooney@cloudsw1-e4-eqiad> ping 10.64.148.2 source 10.64.148.1
PING 10.64.148.2 (10.64.148.2): 56 data bytes
64 bytes from 10.64.148.2: icmp_seq=0 ttl=64 time=0.651 ms
64 bytes from 10.64.148.2: icmp_seq=1 ttl=64 time=0.495 ms
64 bytes from 10.64.148.2: icmp_seq=2 ttl=64 time=0.372 ms
And with jumbo frames:
cmooney@cloudsw1-e4-eqiad> ping 10.64.20.16 source 10.64.148.1 size 9000 do-not-fragment
PING 10.64.20.16 (10.64.20.16): 9000 data bytes
9008 bytes from 10.64.20.16: icmp_seq=0 ttl=63 time=1.857 ms
9008 bytes from 10.64.20.16: icmp_seq=1 ttl=63 time=2.742 ms
9008 bytes from 10.64.20.16: icmp_seq=2 ttl=63 time=1.808 ms
cmooney@cloudsw1-e4-eqiad> show interfaces descriptions | match cloudceph
xe-0/0/18 up down cloudcephosd1001
xe-0/0/19 up down cloudcephosd1001
xe-0/0/20 up up cloudcephosd1025 {#20220102}
xe-0/0/21 up up cloudcephosd1025 {#20220105}
xe-0/0/22 up up cloudcephosd1026 {#20220103}
xe-0/0/23 up up cloudcephosd1026 {#20220107}
xe-0/0/24 up up cloudcephosd1027 {#20220100}
xe-0/0/25 up up cloudcephosd1027 {#20220106}
xe-0/0/26 up up cloudcephosd1028 {#20220101}
xe-0/0/27 up up cloudcephosd1028 {#20220110}
xe-0/0/28 up up cloudcephosd1029 {#20220104}
xe-0/0/29 up up cloudcephosd1029 {#20220108}
Homer change on E4 for reference:
cmooney@cumin1001:~$ homer cloudsw1-e4-eqiad* commit "Enable ports connected to moved server cloudcephosd1001"
INFO:homer.devices:Initialized 53 devices
INFO:homer:Committing config for query cloudsw1-e4-eqiad* with message: Enable ports connected to moved server cloudcephosd1001
INFO:homer:Gathering global Netbox data
INFO:homer.devices:Matched 1 device(s) for query 'cloudsw1-e4-eqiad*'
INFO:homer:Generating configuration for cloudsw1-e4-eqiad.mgmt.eqiad.wmnet
WARNING:homer.capirca:Netbox capirca.GetHosts script is > 3 days old.
Configuration diff for cloudsw1-e4-eqiad.mgmt.eqiad.wmnet:
[edit interfaces]
+   xe-0/0/16 {
+       description DISABLED;
+       disable;
+   }
+   xe-0/0/17 {
+       description DISABLED;
+       disable;
+   }
+   xe-0/0/18 {
+       description cloudcephosd1001;
+       unit 0 {
+           family ethernet-switching {
+               interface-mode access;
+               vlan {
+                   members cloud-hosts1-e4-eqiad;
+               }
+           }
+       }
+   }
+   xe-0/0/19 {
+       description cloudcephosd1001;
+       unit 0 {
+           family ethernet-switching {
+               interface-mode access;
+               vlan {
+                   members cloud-storage1-e4-eqiad;
+               }
+           }
+       }
+   }
Type "yes" to commit, "no" to abort.
> yes
INFO:homer.transports.junos:Committing the configuration on cloudsw1-e4-eqiad.mgmt.eqiad.wmnet
INFO:homer:Homer run completed successfully on 1 devices: ['cloudsw1-e4-eqiad.mgmt.eqiad.wmnet']
Jumbo pings working across racks ok:
cmooney@cloudcephosd1025:~$ ping cloudcephosd1019 -M do -s 8952
PING cloudcephosd1019(cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16)) 8952 data bytes
8960 bytes from cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16): icmp_seq=1 ttl=62 time=0.224 ms
8960 bytes from cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16): icmp_seq=2 ttl=62 time=0.186 ms
^C
--- cloudcephosd1019 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1014ms
rtt min/avg/max/mdev = 0.186/0.205/0.224/0.019 ms
cmooney@cloudcephosd1025:~$ ping cloudcephosd1019 -M do -s 8953
PING cloudcephosd1019(cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16)) 8953 data bytes
ping: local error: message too long, mtu: 9000
ping: local error: message too long, mtu: 9000
^C
--- cloudcephosd1019 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1032ms
cmooney@cloudcephosd1025:~$ ping cloudcephosd1019 -s 8953
PING cloudcephosd1019(cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16)) 8953 data bytes
8961 bytes from cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16): icmp_seq=1 ttl=62 time=0.265 ms
8961 bytes from cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16): icmp_seq=2 ttl=62 time=0.234 ms
8961 bytes from cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16): icmp_seq=3 ttl=62 time=0.236 ms
cmooney@cloudcephosd1025:~$ ping -c 2 cloudcephosd1016 -M do -s 8952
PING cloudcephosd1016(cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13)) 8952 data bytes
8960 bytes from cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13): icmp_seq=1 ttl=62 time=0.241 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13): icmp_seq=2 ttl=62 time=0.262 ms
--- cloudcephosd1016 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1005ms
rtt min/avg/max/mdev = 0.241/0.251/0.262/0.010 ms
cmooney@cloudcephosd1025:~$
cmooney@cloudcephosd1025:~$ ping -c 2 cloudcephosd1016 -M do -s 8953
PING cloudcephosd1016(cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13)) 8953 data bytes
ping: local error: message too long, mtu: 9000
ping: local error: message too long, mtu: 9000
--- cloudcephosd1016 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1021ms
cmooney@cloudcephosd1025:~$
cmooney@cloudcephosd1025:~$ ping -c 2 cloudcephosd1016 -s 8953
PING cloudcephosd1016(cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13)) 8953 data bytes
8961 bytes from cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13): icmp_seq=1 ttl=62 time=0.249 ms
8961 bytes from cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13): icmp_seq=2 ttl=62 time=0.218 ms
--- cloudcephosd1016 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.218/0.233/0.249/0.015 ms
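The 8952/8953 boundary in these IPv6 pings matches simple header arithmetic, assuming the 9000-byte MTU reported in the `mtu: 9000` errors and fixed-size headers with no IPv4 options or IPv6 extension headers. A minimal Python sketch of that arithmetic; the function name `max_ping_payload` is illustrative and not part of any tool used in this timeline:

```python
# Fixed header sizes in bytes (assumes no IPv4 options or IPv6 extension headers).
IPV4_HEADER = 20
IPV6_HEADER = 40
ICMP_ECHO_HEADER = 8  # same size for ICMPv4 and ICMPv6 echo messages


def max_ping_payload(mtu: int, ipv6: bool) -> int:
    """Largest `ping -s` value that fits in a single unfragmented packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - ICMP_ECHO_HEADER


if __name__ == "__main__":
    print(max_ping_payload(9000, ipv6=True))   # 8952
    print(max_ping_payload(9000, ipv6=False))  # 8972
```

Under these assumptions, `-s 8952` is the largest IPv6 payload that passes with `-M do`, and `-s 8953` only succeeds once fragmentation is allowed.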
I don't understand this: it wasn't working, then it started (note the first reply below is icmp_seq=11):
cmooney@cloudcephosd1025:~$ ping cloudcephosd1012 -s 8952 -M do
PING cloudcephosd1012(cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63)) 8952 data bytes
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=11 ttl=62 time=0.148 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=12 ttl=62 time=0.172 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=13 ttl=62 time=0.181 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=14 ttl=62 time=0.156 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=15 ttl=62 time=0.152 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=16 ttl=62 time=0.114 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=17 ttl=62 time=0.161 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=18 ttl=62 time=0.136 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=19 ttl=62 time=0.169 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=20 ttl=62 time=0.165 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=21 ttl=62 time=0.164 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=22 ttl=62 time=0.172 ms
^C
--- cloudcephosd1012 ping statistics ---
22 packets transmitted, 12 received, 45.4545% packet loss, time 21502ms
rtt min/avg/max/mdev = 0.114/0.157/0.181/0.017 ms
cmooney@cloudcephosd1025:~$ ping -4 cloudcephosd1016 -s 8952 -M do
PING (10.64.20.13) 8952(8980) bytes of data.
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=7 ttl=62 time=3777 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=19 ttl=62 time=3416 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=35 ttl=62 time=3353 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=50 ttl=62 time=3649 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=51 ttl=62 time=3561 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=79 ttl=62 time=3952 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=88 ttl=62 time=3783 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=91 ttl=62 time=3524 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=92 ttl=62 time=3381 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=93 ttl=62 time=3341 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=99 ttl=62 time=3749 ms
^C
--- ping statistics ---
103 packets transmitted, 11 received, 89.3204% packet loss, time 103054ms
rtt min/avg/max/mdev = 3341.173/3589.601/3951.784/197.544 ms, pipe 4
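To compare loss between runs like the ones above, the summary line can be reduced to a single number. A minimal Python sketch, assuming the iputils summary format seen in this timeline (`N packets transmitted, M received, ...`); the helper name `packet_loss_percent` is hypothetical:

```python
import re

# Matches the counters in an iputils ping summary line.
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def packet_loss_percent(summary_line: str) -> float:
    """Return packet loss in percent computed from a ping summary line."""
    match = SUMMARY_RE.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets transmitted")
    return 100.0 * (sent - received) / sent


if __name__ == "__main__":
    line = "103 packets transmitted, 11 received, 89.3204% packet loss, time 103054ms"
    print(round(packet_loss_percent(line), 4))  # 89.3204
```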
cmooney@cloudcephosd1016:~$ ping -4 cloudcephosd1025 -s 8952 -M do
PING cloudcephosd1025.eqiad.wmnet (10.64.148.2) 8952(8980) bytes of data.
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
cmooney@cloudcephosd1016:~$ ip -d link show ens3f0np0
2: ens3f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether bc:97:e1:e2:01:40 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 60 maxmtu 9600 addrgenmode eui64 numtxqueues 74 numrxqueues 74 gso_max_size 65536 gso_max_segs 65535 portname p0
cmooney@cloudcephosd1016:~$ ip route get 10.64.148.2
10.64.148.2 via 10.64.20.1 dev ens3f0np0 src 10.64.20.13 uid 31721
    cache expires 325sec mtu 1500
cmooney@cloudcephosd1016:~$ ping -4 cloudcephosd1025 -s 8952 -M do
PING cloudcephosd1025.eqiad.wmnet (10.64.148.2) 8952(8980) bytes of data.
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
^C
cmooney@cloudcephosd1016:~$ ping -4 cloudcephosd1025 -s 8952 -M do
PING cloudcephosd1025.eqiad.wmnet (10.64.148.2) 8952(8980) bytes of data.
From irb-1108.cloudsw1-e4-eqiad.eqiad.wmnet (10.64.147.1) icmp_seq=10 Frag needed and DF set (mtu = 1500)
ping: local error: Message too long, mtu=1500
From irb-1108.cloudsw1-e4-eqiad.eqiad.wmnet (10.64.147.1) icmp_seq=11 Frag needed and DF set (mtu = 1500)
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
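The outputs above show an interface MTU of 9000 on cloudcephosd1016 but `mtu 1500` in the cached route entry for 10.64.148.2, which is how iproute2 displays a learned path MTU for a destination. A rough Python sketch of that comparison, assuming the `ip -d link show` and `ip route get` output formats shown here; the helper names `first_mtu` and `has_reduced_path_mtu` are hypothetical:

```python
import re
from typing import Optional

# "\b" keeps this from matching inside "minmtu" / "maxmtu".
MTU_RE = re.compile(r"\bmtu (\d+)")


def first_mtu(text: str) -> Optional[int]:
    """Return the first 'mtu N' value found in iproute2 output, if any."""
    match = MTU_RE.search(text)
    return int(match.group(1)) if match else None


def has_reduced_path_mtu(link_output: str, route_output: str) -> bool:
    """True when the route reports an MTU smaller than the interface MTU."""
    link_mtu = first_mtu(link_output)
    route_mtu = first_mtu(route_output)
    if link_mtu is None or route_mtu is None:
        return False
    return route_mtu < link_mtu


if __name__ == "__main__":
    link = "2: ens3f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP minmtu 60 maxmtu 9600"
    route = "10.64.148.2 via 10.64.20.1 dev ens3f0np0 src 10.64.20.13 uid 31721 cache expires 325sec mtu 1500"
    print(has_reduced_path_mtu(link, route))  # True
```

A `True` result only flags the mismatch; it does not say which device advertised the smaller MTU.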
cmooney@cloudsw1-e4-eqiad> show interfaces irb.1108
  Logical interface irb.1108 (Index 549) (SNMP ifIndex 524)
    Description: xlink to cloudsw1-c8
    Flags: Up SNMP-Traps 0x4004000 Encapsulation: ENET2
    Bandwidth: 1Gbps
    Routing Instance: default-switch Bridging Domain: cloud-xlink2-eqiad
    Input packets : 725766431590
    Output packets: 542381415729
    Protocol inet, MTU: 9178
    Max nh cache: 75000, New hold nh limit: 75000, Curr nh cnt: 1, Curr new hold cnt: 0, NH drop cnt: 0
      Flags: Sendbcast-pkt-to-re, Is-Primary
      Addresses, Flags: Is-Preferred Is-Primary
        Destination: 10.64.147.0/31, Local: 10.64.147.1
    Protocol inet6, MTU: 9178
    Max nh cache: 75000, New hold nh limit: 75000, Curr nh cnt: 1, Curr new hold cnt: 0, NH drop cnt: 0
      Flags: Is-Primary
      Addresses, Flags: Is-Preferred Is-Primary
        Destination: 2620:0:861:fe0c::/64, Local: 2620:0:861:fe0c::2
      Addresses, Flags: Is-Preferred
        Destination: fe80::/64, Local: fe80::a6e1:1a04:5481:3080
root@cloudcephosd1016:~# ping -4 cloudcephosd1010 -s 8952 -M do
PING cloudcephosd1010.eqiad.wmnet (10.64.20.61) 8952(8980) bytes of data.
8960 bytes from cloudcephosd1010.eqiad.wmnet (10.64.20.61): icmp_seq=1 ttl=64 time=0.169 ms
8960 bytes from cloudcephosd1010.eqiad.wmnet (10.64.20.61): icmp_seq=2 ttl=64 time=0.201 ms
Feb 13 16:26:41 cloudsw1-e4-eqiad jddosd[9121]: DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for protocol/exception TTL:aggregate exceeded its allowed bandwidth at fpc 0 for 2 times, started at 2023-02-13 16:26:40 UTC
Feb 13 16:26:41 cloudsw1-e4-eqiad jddosd[9121]: DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for protocol/exception L3MTU-fail:aggregate exceeded its allowed bandwidth at fpc 0 for 2 times, started at 2023-02-13 16:26:40 UTC
Feb 13 17:24:09 cloudsw1-e4-eqiad jddosd[9121]: DDOS_PROTOCOL_VIOLATION_CLEAR: INFO: Host-bound traffic for protocol/exception TTL:aggregate has returned to normal. Its allowed bandwith was exceeded at fpc 0 for 2 times, from 2023-02-13 16:26:40 UTC to 2023-02-13 17:19:08 UTC
Feb 13 17:24:09 cloudsw1-e4-eqiad jddosd[9121]: DDOS_PROTOCOL_VIOLATION_CLEAR: INFO: Host-bound traffic for protocol/exception L3MTU-fail:aggregate has returned to normal. Its allowed bandwith was exceeded at fpc 0 for 2
cmooney@cloudsw1-e4-eqiad> show interfaces descriptions
Interface Admin Link Description
xe-0/0/16 down down DISABLED
xe-0/0/17 down down DISABLED
xe-0/0/18 up down cloudcephosd1001
xe-0/0/19 up down cloudcephosd1001