Paste P44441

cloudcephosd1019 checks

Authored by cmooney on Feb 13 2023, 4:42 PM.
cmooney@cloudsw1-d5-eqiad> ping 10.64.20.16 source 10.64.20.3
PING 10.64.20.16 (10.64.20.16): 56 data bytes
64 bytes from 10.64.20.16: icmp_seq=0 ttl=64 time=0.704 ms
64 bytes from 10.64.20.16: icmp_seq=1 ttl=64 time=11.274 ms

Event Timeline

DOWN on storage network:

cmooney@cloudsw1-d5-eqiad> ping routing-instance cloud 192.168.4.19
PING 192.168.4.19 (192.168.4.19): 56 data bytes
^C

--- 192.168.4.19 ping statistics ---

Links physically up:

cmooney@cloudsw1-d5-eqiad> show interfaces descriptions | match 1019
xe-0/0/2 up up cloudcephosd1019 {#5350}
xe-0/0/3 up up cloudcephosd1019 {#0240}

False alarm: the ping works if the correct source is specified (rather than the Junos loopback, which is the default source).

cmooney@cloudsw1-d5-eqiad> ping routing-instance cloud 192.168.4.19 source 192.168.4.253
PING 192.168.4.19 (192.168.4.19): 56 data bytes
64 bytes from 192.168.4.19: icmp_seq=0 ttl=64 time=0.824 ms
64 bytes from 192.168.4.19: icmp_seq=1 ttl=64 time=11.326 ms
64 bytes from 192.168.4.19: icmp_seq=2 ttl=64 time=4.344 ms

Ping from cloudsw1-e4 to cloudcephosd1019 seems to be ok:

cmooney@cloudsw1-e4-eqiad> ping 10.64.20.16 source 10.64.148.1
PING 10.64.20.16 (10.64.20.16): 56 data bytes
64 bytes from 10.64.20.16: icmp_seq=0 ttl=63 time=0.610 ms
64 bytes from 10.64.20.16: icmp_seq=1 ttl=63 time=0.566 ms
64 bytes from 10.64.20.16: icmp_seq=2 ttl=63 time=0.476 ms

Ping from cloudsw1-e4 to cloudcephosd1025 seems ok too:

cmooney@cloudsw1-e4-eqiad> ping 10.64.148.2 source 10.64.148.1
PING 10.64.148.2 (10.64.148.2): 56 data bytes
64 bytes from 10.64.148.2: icmp_seq=0 ttl=64 time=0.651 ms
64 bytes from 10.64.148.2: icmp_seq=1 ttl=64 time=0.495 ms
64 bytes from 10.64.148.2: icmp_seq=2 ttl=64 time=0.372 ms

And with jumbo frames (do-not-fragment set):

cmooney@cloudsw1-e4-eqiad> ping 10.64.20.16 source 10.64.148.1 size 9000 do-not-fragment
PING 10.64.20.16 (10.64.20.16): 9000 data bytes
9008 bytes from 10.64.20.16: icmp_seq=0 ttl=63 time=1.857 ms
9008 bytes from 10.64.20.16: icmp_seq=1 ttl=63 time=2.742 ms
9008 bytes from 10.64.20.16: icmp_seq=2 ttl=63 time=1.808 ms

cmooney@cloudsw1-e4-eqiad> show interfaces descriptions | match cloudceph
xe-0/0/18 up down cloudcephosd1001
xe-0/0/19 up down cloudcephosd1001
xe-0/0/20 up up cloudcephosd1025 {#20220102}
xe-0/0/21 up up cloudcephosd1025 {#20220105}
xe-0/0/22 up up cloudcephosd1026 {#20220103}
xe-0/0/23 up up cloudcephosd1026 {#20220107}
xe-0/0/24 up up cloudcephosd1027 {#20220100}
xe-0/0/25 up up cloudcephosd1027 {#20220106}
xe-0/0/26 up up cloudcephosd1028 {#20220101}
xe-0/0/27 up up cloudcephosd1028 {#20220110}
xe-0/0/28 up up cloudcephosd1029 {#20220104}
xe-0/0/29 up up cloudcephosd1029 {#20220108}
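
The listing above shows xe-0/0/18 and xe-0/0/19 (cloudcephosd1001) as admin up but link down. A minimal sketch of picking such ports out of saved `show interfaces descriptions` output follows; the helper name `admin_up_link_down` and the `SAMPLE` string are illustrative assumptions, not part of any existing tooling.

```python
# Sample lines copied from the `show interfaces descriptions` output above.
SAMPLE = """\
xe-0/0/18 up down cloudcephosd1001
xe-0/0/19 up down cloudcephosd1001
xe-0/0/20 up up cloudcephosd1025 {#20220102}
xe-0/0/21 up up cloudcephosd1025 {#20220105}
"""


def admin_up_link_down(output):
    """Return (interface, description) for ports that are enabled but have no link."""
    flagged = []
    for line in output.splitlines():
        # Columns: interface, admin state, link state, free-text description.
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue
        name, admin, link = fields[:3]
        description = fields[3] if len(fields) > 3 else ""
        if admin == "up" and link == "down":
            flagged.append((name, description))
    return flagged


for name, description in admin_up_link_down(SAMPLE):
    print(name, description)
```

A header row such as `Interface Admin Link Description` is skipped naturally because its admin column is not `up`.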

Homer change on E4 for reference:

cmooney@cumin1001:~$ homer cloudsw1-e4-eqiad* commit "Enable ports connected to moved server cloudcephosd1001"
INFO:homer.devices:Initialized 53 devices
INFO:homer:Committing config for query cloudsw1-e4-eqiad* with message: Enable ports connected to moved server cloudcephosd1001
INFO:homer:Gathering global Netbox data
INFO:homer.devices:Matched 1 device(s) for query 'cloudsw1-e4-eqiad*'
INFO:homer:Generating configuration for cloudsw1-e4-eqiad.mgmt.eqiad.wmnet
WARNING:homer.capirca:Netbox capirca.GetHosts script is > 3 days old.
Configuration diff for cloudsw1-e4-eqiad.mgmt.eqiad.wmnet:

[edit interfaces]
+   xe-0/0/16 {
+       description DISABLED;
+       disable;
+   }
+   xe-0/0/17 {
+       description DISABLED;
+       disable;
+   }
+   xe-0/0/18 {
+       description cloudcephosd1001;
+       unit 0 {
+           family ethernet-switching {
+               interface-mode access;
+               vlan {
+                   members cloud-hosts1-e4-eqiad;
+               }
+           }
+       }
+   }
+   xe-0/0/19 {
+       description cloudcephosd1001;
+       unit 0 {
+           family ethernet-switching {
+               interface-mode access;
+               vlan {
+                   members cloud-storage1-e4-eqiad;
+               }
+           }
+       }
+   }

Type "yes" to commit, "no" to abort.
> yes
INFO:homer.transports.junos:Committing the configuration on cloudsw1-e4-eqiad.mgmt.eqiad.wmnet
INFO:homer:Homer run completed successfully on 1 devices: ['cloudsw1-e4-eqiad.mgmt.eqiad.wmnet']

Jumbo pings working across racks ok:

cmooney@cloudcephosd1025:~$ ping cloudcephosd1019 -M do -s 8952
PING cloudcephosd1019(cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16)) 8952 data bytes
8960 bytes from cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16): icmp_seq=1 ttl=62 time=0.224 ms
8960 bytes from cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16): icmp_seq=2 ttl=62 time=0.186 ms
^C
--- cloudcephosd1019 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1014ms
rtt min/avg/max/mdev = 0.186/0.205/0.224/0.019 ms
cmooney@cloudcephosd1025:~$ ping cloudcephosd1019 -M do -s 8953
PING cloudcephosd1019(cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16)) 8953 data bytes
ping: local error: message too long, mtu: 9000
ping: local error: message too long, mtu: 9000
^C
--- cloudcephosd1019 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1032ms

cmooney@cloudcephosd1025:~$ ping cloudcephosd1019  -s 8953
PING cloudcephosd1019(cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16)) 8953 data bytes
8961 bytes from cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16): icmp_seq=1 ttl=62 time=0.265 ms
8961 bytes from cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16): icmp_seq=2 ttl=62 time=0.234 ms
8961 bytes from cloudcephosd1019.eqiad.wmnet (2620:0:861:118:10:64:20:16): icmp_seq=3 ttl=62 time=0.236 ms
cmooney@cloudcephosd1025:~$ ping -c 2 cloudcephosd1016 -M do -s 8952
PING cloudcephosd1016(cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13)) 8952 data bytes
8960 bytes from cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13): icmp_seq=1 ttl=62 time=0.241 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13): icmp_seq=2 ttl=62 time=0.262 ms

--- cloudcephosd1016 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1005ms
rtt min/avg/max/mdev = 0.241/0.251/0.262/0.010 ms
cmooney@cloudcephosd1025:~$ 
cmooney@cloudcephosd1025:~$ ping -c 2 cloudcephosd1016 -M do -s 8953
PING cloudcephosd1016(cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13)) 8953 data bytes
ping: local error: message too long, mtu: 9000
ping: local error: message too long, mtu: 9000

--- cloudcephosd1016 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1021ms

cmooney@cloudcephosd1025:~$ 
cmooney@cloudcephosd1025:~$ ping -c 2 cloudcephosd1016 -s 8953
PING cloudcephosd1016(cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13)) 8953 data bytes
8961 bytes from cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13): icmp_seq=1 ttl=62 time=0.249 ms
8961 bytes from cloudcephosd1016.eqiad.wmnet (2620:0:861:118:10:64:20:13): icmp_seq=2 ttl=62 time=0.218 ms

--- cloudcephosd1016 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.218/0.233/0.249/0.015 ms
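
The 8952/8953 boundary in the IPv6 pings above follows from header arithmetic: with a 9000-byte MTU, the IPv6 header (40 bytes) and the ICMPv6 echo header (8 bytes) leave 8952 bytes of payload, so `-s 8953 -M do` is rejected locally while the same size without `-M do` is simply fragmented. A short sketch of that calculation (function names are illustrative):

```python
IPV6_HEADER = 40  # fixed IPv6 header, bytes
IPV4_HEADER = 20  # IPv4 header without options, bytes
ICMP_HEADER = 8   # ICMP / ICMPv6 echo header, bytes


def max_ping_payload(mtu, ipv6=True):
    """Largest `ping -s` value that fits in one unfragmented packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - ICMP_HEADER


def packet_size(payload, ipv6=True):
    """Total IP packet size for a given `ping -s` payload."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return payload + ICMP_HEADER + ip_header


print(max_ping_payload(9000, ipv6=True))   # 8952
print(max_ping_payload(9000, ipv6=False))  # 8972
print(packet_size(8952, ipv6=False))       # 8980
```

Over IPv4 the same `-s 8952` yields an 8980-byte packet, which matches the `8952(8980)` shown by the `ping -4` runs in this paste and still sits below a 9000-byte MTU.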

I don't understand this... it wasn't working, then it started (the first reply below is icmp_seq=11)???

cmooney@cloudcephosd1025:~$ ping cloudcephosd1012 -s 8952 -M do
PING cloudcephosd1012(cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63)) 8952 data bytes
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=11 ttl=62 time=0.148 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=12 ttl=62 time=0.172 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=13 ttl=62 time=0.181 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=14 ttl=62 time=0.156 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=15 ttl=62 time=0.152 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=16 ttl=62 time=0.114 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=17 ttl=62 time=0.161 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=18 ttl=62 time=0.136 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=19 ttl=62 time=0.169 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=20 ttl=62 time=0.165 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=21 ttl=62 time=0.164 ms
8960 bytes from cloudcephosd1012.eqiad.wmnet (2620:0:861:118:10:64:20:63): icmp_seq=22 ttl=62 time=0.172 ms
^C
--- cloudcephosd1012 ping statistics ---
22 packets transmitted, 12 received, 45.4545% packet loss, time 21502ms
rtt min/avg/max/mdev = 0.114/0.157/0.181/0.017 ms

cmooney@cloudcephosd1025:~$ ping -4 cloudcephosd1016 -s 8952 -M do
PING (10.64.20.13) 8952(8980) bytes of data.
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=7 ttl=62 time=3777 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=19 ttl=62 time=3416 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=35 ttl=62 time=3353 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=50 ttl=62 time=3649 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=51 ttl=62 time=3561 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=79 ttl=62 time=3952 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=88 ttl=62 time=3783 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=91 ttl=62 time=3524 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=92 ttl=62 time=3381 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=93 ttl=62 time=3341 ms
8960 bytes from cloudcephosd1016.eqiad.wmnet (10.64.20.13): icmp_seq=99 ttl=62 time=3749 ms
^C

--- ping statistics ---

103 packets transmitted, 11 received, 89.3204% packet loss, time 103054ms
rtt min/avg/max/mdev = 3341.173/3589.601/3951.784/197.544 ms, pipe 4
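
The loss figures in the two summaries above can be recomputed from the transmitted and received counts, which is handy when comparing several runs. This is a minimal sketch that assumes the iputils summary format seen in this paste; `loss_percent` is an illustrative name.

```python
import re

# Matches the counters in an iputils ping summary line.
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def loss_percent(summary_line):
    """Recompute packet loss (in percent) from a ping summary line."""
    match = SUMMARY_RE.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets transmitted")
    return 100.0 * (sent - received) / sent


# Summary lines copied from the two runs above.
to_1012 = "22 packets transmitted, 12 received, 45.4545% packet loss, time 21502ms"
to_1016 = "103 packets transmitted, 11 received, 89.3204% packet loss, time 103054ms"

print(round(loss_percent(to_1012), 4))  # 45.4545
print(round(loss_percent(to_1016), 4))  # 89.3204
```

The percentage alone hides the pattern: in the cloudcephosd1012 run the 12 replies are the consecutive sequence numbers 11 to 22, so all of the loss is at the start, whereas the IPv4 run to cloudcephosd1016 loses packets throughout and the replies that do arrive take over 3 seconds.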

cmooney@cloudcephosd1016:~$ ping -4 cloudcephosd1025 -s 8952 -M do
PING cloudcephosd1025.eqiad.wmnet (10.64.148.2) 8952(8980) bytes of data.
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
cmooney@cloudcephosd1016:~$ ip -d link show ens3f0np0 
2: ens3f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether bc:97:e1:e2:01:40 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 60 maxmtu 9600 addrgenmode eui64 numtxqueues 74 numrxqueues 74 gso_max_size 65536 gso_max_segs 65535 portname p0
cmooney@cloudcephosd1016:~$ ip route get 10.64.148.2
10.64.148.2 via 10.64.20.1 dev ens3f0np0 src 10.64.20.13 uid 31721 
    cache expires 325sec mtu 1500
cmooney@cloudcephosd1016:~$ ping -4 cloudcephosd1025 -s 8952 -M do
PING cloudcephosd1025.eqiad.wmnet (10.64.148.2) 8952(8980) bytes of data.
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
^C
cmooney@cloudcephosd1016:~$ ping -4 cloudcephosd1025 -s 8952 -M do
PING cloudcephosd1025.eqiad.wmnet (10.64.148.2) 8952(8980) bytes of data.
From irb-1108.cloudsw1-e4-eqiad.eqiad.wmnet (10.64.147.1) icmp_seq=10 Frag needed and DF set (mtu = 1500)
ping: local error: Message too long, mtu=1500
From irb-1108.cloudsw1-e4-eqiad.eqiad.wmnet (10.64.147.1) icmp_seq=11 Frag needed and DF set (mtu = 1500)
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
cmooney@cloudsw1-e4-eqiad> show interfaces irb.1108 
  Logical interface irb.1108 (Index 549) (SNMP ifIndex 524)
    Description: xlink to cloudsw1-c8
    Flags: Up SNMP-Traps 0x4004000 Encapsulation: ENET2
    Bandwidth: 1Gbps
    Routing Instance: default-switch Bridging Domain: cloud-xlink2-eqiad
    Input packets : 725766431590
    Output packets: 542381415729
    Protocol inet, MTU: 9178
    Max nh cache: 75000, New hold nh limit: 75000, Curr nh cnt: 1, Curr new hold cnt: 0, NH drop cnt: 0
      Flags: Sendbcast-pkt-to-re, Is-Primary
      Addresses, Flags: Is-Preferred Is-Primary
        Destination: 10.64.147.0/31, Local: 10.64.147.1
    Protocol inet6, MTU: 9178
    Max nh cache: 75000, New hold nh limit: 75000, Curr nh cnt: 1, Curr new hold cnt: 0, NH drop cnt: 0
      Flags: Is-Primary
      Addresses, Flags: Is-Preferred Is-Primary
        Destination: 2620:0:861:fe0c::/64, Local: 2620:0:861:fe0c::2
      Addresses, Flags: Is-Preferred
        Destination: fe80::/64, Local: fe80::a6e1:1a04:5481:3080
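
The outputs above disagree with each other: ens3f0np0 has MTU 9000 and irb.1108 reports MTU 9178, yet `ip route get 10.64.148.2` shows a cached entry with `mtu 1500`, which is why ping fails locally with "Message too long". The cached value is consistent with the "Frag needed and DF set (mtu = 1500)" messages from 10.64.147.1, though the paste does not establish why the switch reports 1500. A small sketch for flagging this mismatch from saved command output (helper names are illustrative):

```python
import re

# First line of `ip -d link show ens3f0np0` and the `ip route get` output above.
LINK_OUTPUT = (
    "2: ens3f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq "
    "state UP mode DEFAULT group default qlen 1000"
)
ROUTE_OUTPUT = """\
10.64.148.2 via 10.64.20.1 dev ens3f0np0 src 10.64.20.13 uid 31721
    cache expires 325sec mtu 1500
"""


def first_mtu(text):
    """Return the first standalone `mtu N` value in the text, or None."""
    # \b keeps fields such as `minmtu` and `maxmtu` from matching.
    match = re.search(r"\bmtu (\d+)", text)
    return int(match.group(1)) if match else None


def cached_pmtu_below_link(link_output, route_output):
    """True when the route's cached path MTU is lower than the link MTU."""
    link_mtu = first_mtu(link_output)
    route_mtu = first_mtu(route_output)
    if link_mtu is None or route_mtu is None:
        return False
    return route_mtu < link_mtu


print(first_mtu(LINK_OUTPUT), first_mtu(ROUTE_OUTPUT))
print(cached_pmtu_below_link(LINK_OUTPUT, ROUTE_OUTPUT))
```

When the route has no cached exception, `ip route get` prints no `mtu` field, and the check returns False.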

root@cloudcephosd1016:~# ping -4 cloudcephosd1010 -s 8952 -M do
PING cloudcephosd1010.eqiad.wmnet (10.64.20.61) 8952(8980) bytes of data.
8960 bytes from cloudcephosd1010.eqiad.wmnet (10.64.20.61): icmp_seq=1 ttl=64 time=0.169 ms
8960 bytes from cloudcephosd1010.eqiad.wmnet (10.64.20.61): icmp_seq=2 ttl=64 time=0.201 ms

Feb 13 16:26:41  cloudsw1-e4-eqiad jddosd[9121]: DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for protocol/exception  TTL:aggregate exceeded its allowed bandwidth at fpc 0 for 2 times, started at 2023-02-13 16:26:40 UTC
Feb 13 16:26:41  cloudsw1-e4-eqiad jddosd[9121]: DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for protocol/exception  L3MTU-fail:aggregate exceeded its allowed bandwidth at fpc 0 for 2 times, started at 2023-02-13 16:26:40 UTC

Feb 13 17:24:09  cloudsw1-e4-eqiad jddosd[9121]: DDOS_PROTOCOL_VIOLATION_CLEAR: INFO: Host-bound traffic for protocol/exception TTL:aggregate has returned to normal. Its allowed bandwith was exceeded at fpc 0 for 2 times, from 2023-02-13 16:26:40 UTC to 2023-02-13 17:19:08 UTC
Feb 13 17:24:09  cloudsw1-e4-eqiad jddosd[9121]: DDOS_PROTOCOL_VIOLATION_CLEAR: INFO: Host-bound traffic for protocol/exception L3MTU-fail:aggregate has returned to normal. Its allowed bandwith was exceeded at fpc 0 for 2
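
The CLEAR messages above carry the start and end of each violation window, so its length can be read straight from the log line. A minimal sketch, assuming the `from ... UTC to ... UTC` wording shown in the TTL:aggregate message; `violation_seconds` is an illustrative name.

```python
import re
from datetime import datetime

# The complete TTL:aggregate CLEAR message from the log above.
CLEAR_LINE = (
    "Feb 13 17:24:09  cloudsw1-e4-eqiad jddosd[9121]: DDOS_PROTOCOL_VIOLATION_CLEAR: "
    "INFO: Host-bound traffic for protocol/exception TTL:aggregate has returned to normal. "
    "Its allowed bandwith was exceeded at fpc 0 for 2 times, "
    "from 2023-02-13 16:26:40 UTC to 2023-02-13 17:19:08 UTC"
)

TIMESTAMP = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
WINDOW_RE = re.compile(r"from " + TIMESTAMP + r" UTC to " + TIMESTAMP + r" UTC")


def violation_seconds(line):
    """Length of the violation window in seconds, or None if the line has no window."""
    match = WINDOW_RE.search(line)
    if match is None:
        return None
    start, end = (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts in match.groups())
    return int((end - start).total_seconds())


seconds = violation_seconds(CLEAR_LINE)
print(seconds, divmod(seconds, 60))  # 3148 (52, 28)
```

The TTL:aggregate violation therefore lasted 52 minutes 28 seconds, from 16:26:40 to 17:19:08 UTC. The L3MTU-fail message is cut off in this paste before its timestamps, so the function returns None for it.
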
cmooney@cloudsw1-e4-eqiad> show interfaces descriptions 
Interface       Admin Link Description
xe-0/0/16       down  down DISABLED
xe-0/0/17       down  down DISABLED
xe-0/0/18       up    down cloudcephosd1001
xe-0/0/19       up    down cloudcephosd1001