There is an issue in the labtestn setup regarding the neutron networking deployment using vxlan and l2population.
In this deployment, vxlan-2 is the interface that should connect all the virts and the network nodes.
If you inspect the packets on this interface on a virt node, you see something like this:
aborrero@labtestvirt2003:~ $ sudo tcpdump -i vxlan-2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vxlan-2, link-type EN10MB (Ethernet), capture size 262144 bytes
16:44:00.177241 ARP, Request who-has 172.16.130.1 tell 172.16.130.13, length 28
16:44:00.231328 ARP, Request who-has 172.16.130.1 tell 172.16.130.15, length 28
16:44:01.201191 ARP, Request who-has 172.16.130.1 tell 172.16.130.13, length 28
16:44:01.255605 ARP, Request who-has 172.16.130.1 tell 172.16.130.15, length 28
16:44:01.844882 ARP, Request who-has 172.16.130.1 tell 172.16.130.14, length 28
[...]
(i.e., the ARP requests never get a reply)
However, if you inspect the packets on the same interface on the networking node, you see something like this:
aborrero@labtestneutron2001:~ 8s $ sudo tcpdump -i vxlan-2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vxlan-2, link-type EN10MB (Ethernet), capture size 262144 bytes
16:28:31.402268 ARP, Request who-has 172.16.130.1 tell 172.16.130.13, length 28
16:28:31.402311 ARP, Reply 172.16.130.1 is-at fa:16:3e:3f:d3:9a (oui Unknown), length 28
16:28:31.916441 ARP, Request who-has 172.16.130.1 tell 172.16.130.14, length 28
16:28:31.916575 ARP, Reply 172.16.130.1 is-at fa:16:3e:3f:d3:9a (oui Unknown), length 28
16:28:32.426641 ARP, Request who-has 172.16.130.1 tell 172.16.130.13, length 28
16:28:32.426708 ARP, Reply 172.16.130.1 is-at fa:16:3e:3f:d3:9a (oui Unknown), length 28
16:28:32.940036 ARP, Request who-has 172.16.130.1 tell 172.16.130.14, length 28
16:28:32.940067 ARP, Reply 172.16.130.1 is-at fa:16:3e:3f:d3:9a (oui Unknown), length 28
16:28:33.261335 ARP, Request who-has 172.16.130.1 tell 172.16.130.15, length 28
16:28:33.261376 ARP, Reply 172.16.130.1 is-at fa:16:3e:3f:d3:9a (oui Unknown), length 28
16:28:33.449983 ARP, Request who-has 172.16.130.1 tell 172.16.130.13, length 28
16:28:33.450057 ARP, Reply 172.16.130.1 is-at fa:16:3e:3f:d3:9a (oui Unknown), length 28
[...]
(i.e., ARP replies are being sent)
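To compare captures from several hosts without reading them by eye, something like the following could help. It is only a rough sketch: it assumes tcpdump's default one-line ARP text output in the form shown in the captures in this task, and the function names (`summarize_arp`, `unanswered_targets`) are made up for illustration, not part of any existing tooling.

```python
import re
from collections import Counter

# Patterns for tcpdump's default ARP text output, e.g.
#   "... ARP, Request who-has 172.16.130.1 tell 172.16.130.13, length 28"
#   "... ARP, Reply 172.16.130.1 is-at fa:16:3e:3f:d3:9a (oui Unknown), length 28"
# Assumption: other tcpdump verbosity levels may print these lines differently.
REQUEST_RE = re.compile(r"ARP, Request who-has (\S+)")
REPLY_RE = re.compile(r"ARP, Reply (\S+) is-at (\S+)")


def summarize_arp(lines):
    """Count ARP requests and replies per target IP in tcpdump text lines."""
    requests = Counter()
    replies = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            requests[match.group(1)] += 1
            continue
        match = REPLY_RE.search(line)
        if match:
            replies[match.group(1)] += 1
    return requests, replies


def unanswered_targets(lines):
    """Return the IPs that were asked for but never answered in this capture."""
    requests, replies = summarize_arp(lines)
    return sorted(ip for ip in requests if replies[ip] == 0)
```

Passing the lines of a saved capture from each host to `unanswered_targets` should list 172.16.130.1 for a capture like the virt node one above and nothing for a capture like the networking node one.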
The replies are sent on the networking node but never show up on the virt node, which may mean there is a misconfiguration or bug somewhere that prevents vxlan from working properly as the network overlay.
Also, @chasemp mentioned we may be affected by an upstream OpenStack bug (https://bugs.launchpad.net/neutron/+bug/1365476) which is related to the HA setup.
We don't fully know what's going on, but right now, we can't contact instances in the subnet which uses vxlan.
Right now relevant servers are (https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Deployments#Labtestn_deployment):
labtestcontrol2003.wikimedia.org
labtestneutron2001.codfw.wmnet
labtestneutron2002.codfw.wmnet
labtestservices2002.wikimedia.org
labtestservices2003.wikimedia.org
labtestvirt2003.codfw.wmnet
labtestmetal2001.codfw.wmnet (as virt)