Since earlier today, several hosts have stopped receiving ARP packets from each other.
For example, two of those hosts are in the same VLAN and the same fabric, but are connected to different members of that fabric.
There are no firewall rules or other security features blocking that traffic.
No changes were made to the switch fabric or to the hosts before the issue started.
The two hosts I'm testing with are elastic1049 (10.64.16.111) and elastic1038 (10.64.16.47), which are in the same VLAN (a /22 subnet).
Ping fails in both directions:
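Because both addresses sit in the same /22, each host has to resolve the other's MAC address with ARP directly instead of going through a gateway, which is why ARP is the first thing to check. A minimal sketch of that same-subnet check, using only Python's standard `ipaddress` module and the two addresses above (the `hosts` dictionary is just for illustration):

```python
import ipaddress

# Addresses from this report; /22 is the VLAN's prefix length.
hosts = {
    "elastic1049": ipaddress.ip_address("10.64.16.111"),
    "elastic1038": ipaddress.ip_address("10.64.16.47"),
}

# strict=False derives the network (10.64.16.0/22) from a host address.
subnet = ipaddress.ip_network("10.64.16.111/22", strict=False)

for name, addr in hosts.items():
    # True means the peer is on-link, so it is reached via ARP, not a router.
    print(f"{name} {addr} in {subnet}: {addr in subnet}")
```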
  elastic1049:~$ ping 10.64.16.47
  PING 10.64.16.47 (10.64.16.47) 56(84) bytes of data.
  ^C
  --- 10.64.16.47 ping statistics ---
  3 packets transmitted, 0 received, 100% packet loss, time 2039ms

  elastic1038:~$ ping 10.64.16.111
  PING 10.64.16.111 (10.64.16.111) 56(84) bytes of data.
  ^C
  --- 10.64.16.111 ping statistics ---
  2 packets transmitted, 0 received, 100% packet loss, time 1002ms
ARPing only works in one direction:
  ayounsi@elastic1049:~$ sudo arping 10.64.16.47
  ARPING 10.64.16.47
  60 bytes from 14:02:ec:06:9e:dc (10.64.16.47): index=0 time=3.124 msec
  60 bytes from 14:02:ec:06:9e:dc (10.64.16.47): index=1 time=15.394 msec
  60 bytes from 14:02:ec:06:9e:dc (10.64.16.47): index=2 time=15.131 msec
  60 bytes from 14:02:ec:06:9e:dc (10.64.16.47): index=3 time=6.643 msec
  ^C
  --- 10.64.16.47 statistics ---
  4 packets transmitted, 4 packets received, 0% unanswered (0 extra)
  rtt min/avg/max/std-dev = 3.124/10.073/15.394/5.337 ms

  elastic1038:~$ sudo arping 10.64.16.111
  ARPING 10.64.16.111
  Timeout
  Timeout
  Timeout
  Timeout
  ^C
  --- 10.64.16.111 statistics ---
  5 packets transmitted, 0 packets received, 100% unanswered (0 extra)
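For reference when reading the outputs and captures in this task, here is a sketch of the ARP request elastic1038 is sending, built by hand following the RFC 826 layout for IPv4 over Ethernet. The ARP message itself is 28 bytes; Ethernet requires at least 46 bytes of payload, so the frame is padded before it hits the wire. That is typically why tcpdump shows locally sent ARP packets as length 28 (captured before padding) and received ones as length 46, and why arping reports 60-byte replies (14-byte Ethernet header plus 46-byte padded payload). The helper names `mac_bytes` and `arp_request` are hypothetical, and the zero padding is an assumption (padding bytes are usually, but not necessarily, zeros).

```python
import socket
import struct


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build an ARP request payload (RFC 826) for IPv4 over Ethernet."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                             # hardware type: Ethernet
        0x0800,                        # protocol type: IPv4
        6,                             # hardware address length
        4,                             # protocol address length
        1,                             # opcode: 1 = request, 2 = reply
        mac_bytes(sender_mac),         # sender MAC
        socket.inet_aton(sender_ip),   # sender IP
        b"\x00" * 6,                   # target MAC is unknown in a request
        socket.inet_aton(target_ip),   # target IP being resolved
    )


# elastic1038 (14:02:ec:06:9e:dc, 10.64.16.47) asking "who-has 10.64.16.111"
payload = arp_request("14:02:ec:06:9e:dc", "10.64.16.47", "10.64.16.111")

# Minimum Ethernet payload is 46 bytes (60-byte frame excluding the FCS).
MIN_ETH_PAYLOAD = 46
padded = payload.ljust(MIN_ETH_PAYLOAD, b"\x00")

print(len(payload), len(padded))  # 28 46
```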
elastic1049 (10.64.16.111) sees the ARP requests from elastic1038 (10.64.16.47) and replies to them, but elastic1038 never sees the replies:
  elastic1049:~$ sudo tcpdump arp host 10.64.16.47 -n
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
  15:18:05.542508 ARP, Request who-has 10.64.16.111 tell 10.64.16.47, length 46
  15:18:05.542524 ARP, Reply 10.64.16.111 is-at 94:18:82:6f:18:18, length 28
  15:18:06.542697 ARP, Request who-has 10.64.16.111 tell 10.64.16.47, length 46
  15:18:06.542715 ARP, Reply 10.64.16.111 is-at 94:18:82:6f:18:18, length 28
  15:18:07.542722 ARP, Request who-has 10.64.16.111 tell 10.64.16.47, length 46
  15:18:07.542737 ARP, Reply 10.64.16.111 is-at 94:18:82:6f:18:18, length 28

  elastic1038:~$ sudo tcpdump arp host 10.64.16.111
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
  15:23:09.342561 ARP, Request who-has elastic1049.eqiad.wmnet tell elastic1038.eqiad.wmnet, length 28
  15:23:09.588770 ARP, Request who-has elastic1049.eqiad.wmnet tell elastic1038.eqiad.wmnet, length 28
  15:23:10.342820 ARP, Request who-has elastic1049.eqiad.wmnet tell elastic1038.eqiad.wmnet, length 28
  15:23:10.588922 ARP, Request who-has elastic1049.eqiad.wmnet tell elastic1038.eqiad.wmnet, length 28
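The asymmetry in the two captures can be checked mechanically by counting request and reply lines on each side. A minimal sketch, assuming tcpdump's default one-line ARP summaries as shown above; `summarize` is a hypothetical helper, and the two capture strings are excerpts copied from the output above.

```python
import re

# Matches tcpdump one-line ARP summaries, e.g.
# "15:18:05.542508 ARP, Request who-has 10.64.16.111 tell 10.64.16.47, length 46"
ARP_LINE = re.compile(
    r"ARP, (?P<kind>Request|Reply) (?P<body>.+?), length (?P<length>\d+)"
)


def summarize(capture: str) -> dict:
    """Count ARP requests and replies in a tcpdump text capture."""
    counts = {"Request": 0, "Reply": 0}
    for match in ARP_LINE.finditer(capture):
        counts[match.group("kind")] += 1
    return counts


# Excerpt of the capture on elastic1049: requests arrive, replies go out.
capture_1049 = """
15:18:05.542508 ARP, Request who-has 10.64.16.111 tell 10.64.16.47, length 46
15:18:05.542524 ARP, Reply 10.64.16.111 is-at 94:18:82:6f:18:18, length 28
15:18:06.542697 ARP, Request who-has 10.64.16.111 tell 10.64.16.47, length 46
15:18:06.542715 ARP, Reply 10.64.16.111 is-at 94:18:82:6f:18:18, length 28
"""

# Excerpt of the capture on elastic1038: requests go out, no reply arrives.
capture_1038 = """
15:23:09.342561 ARP, Request who-has elastic1049.eqiad.wmnet tell elastic1038.eqiad.wmnet, length 28
15:23:09.588770 ARP, Request who-has elastic1049.eqiad.wmnet tell elastic1038.eqiad.wmnet, length 28
"""

for host, capture in (("elastic1049", capture_1049), ("elastic1038", capture_1038)):
    print(host, summarize(capture))
```

A host that logs requests but zero replies while its peer logs both, as here, points at the replies being lost somewhere between the two hosts rather than at either host's ARP stack.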
This looks like a Virtual Chassis Fabric (VCF) issue to me.
Opened high-priority case 2018-0802-0511 with Juniper.