Layer 2 load balancing (such as PyBal, WMF's production load balancer) requires pooled nodes to be in the load balancer's broadcast domain(s).
We had an incident Wednesday-Friday where a couple of hosts were added to PyBal rotation without the necessary VLAN plumbing in place, so they were pooled but unable to send traffic to users. This had significant user impact (1% of all searches resulted in errors), so I'm requesting that we monitor and alert for this situation.
The linked phab comment demonstrates a way to detect this situation:
cmooney@lvs1019:~$ ip route get fibmatch 10.64.152.2
default via 10.64.32.1 dev eno1np0 onlink
↑ bad: the LVS host is routing the traffic via its default gateway, which will never work.
cmooney@lvs1019:~$ ip route get fibmatch 10.64.152.2
10.64.152.0/24 dev vlan1047 proto kernel scope link src 10.64.152.19
↑ good, pooled node is directly connected
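
A minimal sketch of how a check could automate the comparison above, assuming it runs on the LVS host and is handed the IP of each pooled node. The function names is_directly_connected and check_pooled_node are hypothetical, and how the list of pooled IPs is obtained and how the alert is raised are left out. It treats a matched route as bad when it is the default route or has a "via" next hop, which is my reading of the two outputs above rather than an established rule:

```python
import subprocess


def is_directly_connected(fibmatch_output: str) -> bool:
    """Classify the output of `ip route get fibmatch <ip>`.

    Returns True when the matched route is on-link (no gateway hop),
    False when the host would route the traffic or no route matched.
    """
    tokens = fibmatch_output.split()
    if not tokens:
        return False
    # Assumption: "default" as the matched prefix, or any "via" next hop,
    # means the LVS host is routing rather than delivering on-link.
    return tokens[0] != "default" and "via" not in tokens


def check_pooled_node(ip: str) -> bool:
    """Run the lookup on this host for one pooled node IP (hypothetical helper)."""
    result = subprocess.run(
        ["ip", "route", "get", "fibmatch", ip],
        capture_output=True,
        text=True,
        check=True,
    )
    return is_directly_connected(result.stdout)
```

An alert could then fire whenever check_pooled_node returns False for any node currently in rotation.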