I'm failing to connect to bastion-codfw1dev-01.codfw1dev.wmcloud.org - it resolves successfully but is not responding to SSH?
The security group rules look fine. Is it ferm or the codfw1dev networking in general?
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Andrew | T242766 upgrade cloud-vps openstack to Openstack version 'Queens'
Resolved | | aborrero | T247135 codfw1dev unavailable?
Event Timeline
Previously this was a nova-compute issue, solved with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/578378/.
Now it appears to be a networking issue; hosts can reach other VMs but cannot reach outside servers (including the name server):
root@cloudvirt2001-dev:~# virsh console 47e414aa-03ec-4c7c-b632-9b6cf5e37119
Connected to domain i-00000462
Escape character is ^]
root@bastion-codfw1dev-01:~# cat /etc/resolv.conf
## THIS FILE IS MANAGED BY PUPPET
##
## source: modules/base/resolv.conf.labs.erb
## from: base::resolving
domain bastioninfra-codfw1dev.codfw1dev.cloud
search bastioninfra-codfw1dev.codfw1dev.cloud codfw1dev.cloud
nameserver 208.80.153.78
nameserver 208.80.153.78
options timeout:2 ndots:1
root@bastion-codfw1dev-01:~# ping 208.80.153.78
PING 208.80.153.78 (208.80.153.78) 56(84) bytes of data.
^C
--- 208.80.153.78 ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 183ms
root@bastion-codfw1dev-01:~# ping 172.16.128.19
PING 172.16.128.19 (172.16.128.19) 56(84) bytes of data.
64 bytes from 172.16.128.19: icmp_seq=1 ttl=64 time=0.561 ms
64 bytes from 172.16.128.19: icmp_seq=2 ttl=64 time=0.642 ms
^C
--- 172.16.128.19 ping statistics ---
This may be related to some firewall changes made on Friday; in any case I'm going to drop this in @aborrero's lap, at least until I'm back at work on Tuesday.
(Note that I also tested with a fully external IP, 216.58.192.196 for google.com and it can't reach that either)
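For anyone repeating this triage, here is a minimal sketch of how the two ping results could be compared automatically. It assumes iputils-style summary lines like the one pasted above; the function names `packet_loss` and `classify` are hypothetical helpers, not part of any existing tooling.

```python
import re

# Matches the iputils ping summary, e.g.
# "8 packets transmitted, 0 received, 100% packet loss, time 183ms"
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) received, (\d+(?:\.\d+)?)% packet loss"
)


def packet_loss(ping_output: str) -> float:
    """Return the packet-loss percentage found in ping's summary line."""
    match = SUMMARY_RE.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    return float(match.group(3))


def classify(internal_loss: float, external_loss: float) -> str:
    """Rough triage from loss towards a neighbour VM and an outside host."""
    internal_down = internal_loss >= 100
    external_down = external_loss >= 100
    if not internal_down and external_down:
        return "egress broken: VM-to-VM works, outside unreachable"
    if internal_down and external_down:
        return "no connectivity at all"
    if internal_down:
        return "internal network broken"
    return "connectivity ok"


# Summary line copied from the paste above (ping to the name server).
external = "8 packets transmitted, 0 received, 100% packet loss, time 183ms"
# The summary for 172.16.128.19 is cut off in the paste; 0.0 is an assumed
# value based on the replies that were received.
print(classify(internal_loss=0.0, external_loss=packet_loss(external)))
```

With the values seen here this reports the "egress broken" case, which points at the NAT/routing path on the cloudnet side rather than at the VM itself.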
Mentioned in SAL (#wikimedia-cloud) [2020-03-10T13:55:04Z] <arturo> [codfw1dev] rebooting cloudnet2003-dev into linux kernel 4.14 for testing stuff related to T247135
Mentioned in SAL (#wikimedia-cloud) [2020-03-10T17:02:11Z] <arturo> [codfw1dev] deleting address scopes, bad interaction with our custom NAT setup T247135
I confirm that address scopes have a bad interaction with our setup. I was using address scopes as part of the BGP configuration.
I can now see neutron doing SNAT:
# inside netns
root@cloudnet2003-dev:~ # conntrack -E -j -p icmp
[NEW] icmp 1 30 src=172.16.128.14 dst=8.8.8.8 type=8 code=0 id=25513 [UNREPLIED] src=8.8.8.8 dst=185.15.57.1 type=0 code=0 id=25513 mark=67108864
# main netns
aborrero@cloudnet2003-dev:~ $ sudo tcpdump -i br-external icmp
09:33:18.126335 IP 185.15.57.1 > dns.google: ICMP echo request, id 25519, seq 1, length 64
The packet never returns, which may indicate a filtering problem related to T246887: CloudVPS: introduce filtering for neutron BGP addresses.
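To make the reading of that conntrack event explicit, here is a small sketch that pulls the SNAT address out of such a line. It assumes the usual conntrack layout where the first `src=`/`dst=` pair is the original direction and the second is the expected reply; `snat_address` and `is_unreplied` are hypothetical helper names.

```python
import re
from typing import Optional

# Event line copied from the conntrack output above.
EVENT = (
    "[NEW] icmp 1 30 src=172.16.128.14 dst=8.8.8.8 type=8 code=0 id=25513 "
    "[UNREPLIED] src=8.8.8.8 dst=185.15.57.1 type=0 code=0 id=25513 mark=67108864"
)


def snat_address(event: str) -> Optional[str]:
    """Return the translated source address, or None if no SNAT is visible.

    The first src/dst pair is the original direction, the second the
    expected reply. If the reply is addressed to something other than the
    original source, the source address was rewritten (SNAT).
    """
    pairs = re.findall(r"src=(\S+) dst=(\S+)", event)
    if len(pairs) < 2:
        raise ValueError("expected original and reply tuples in event")
    orig_src = pairs[0][0]
    reply_dst = pairs[1][1]
    return reply_dst if reply_dst != orig_src else None


def is_unreplied(event: str) -> bool:
    """True while conntrack has not yet seen a packet in the reply direction."""
    return "[UNREPLIED]" in event


print(snat_address(EVENT), is_unreplied(EVENT))
```

For the event above this shows 172.16.128.14 being rewritten to 185.15.57.1, matching the source address tcpdump sees on br-external, with no reply seen yet.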
Mentioned in SAL (#wikimedia-cloud) [2020-03-11T12:50:56Z] <arturo> [codfw1dev] several tests creating/deleting address scopes (T244727 T247135 T246887 T245606)
The BGP-related filter has been dropped. You should now be able to reach floating IPs from the internet, and VMs should have full connectivity.
As of right now:
arturo@endurance:~ $ ssh -i .ssh/wmf_cloud_root_arturo root@185.15.57.2
Enter passphrase for key '.ssh/wmf_cloud_root_arturo':
Linux bastion-codfw1dev-02 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64
Debian GNU/Linux 10 (buster)
The last Puppet run was at Wed Mar 18 11:07:33 UTC 2020 (10 minutes ago).
Last puppet commit: (641fe4e349) Jbond - debdeploy: add libGraphicsMagick-Q16 as a lib for graphicsmagick
Last login: Mon Mar 9 20:03:25 2020
root@bastion-codfw1dev-02:~# apt-get update
Hit:1 http://deb.debian.org/debian buster InRelease
Hit:2 http://deb.debian.org/debian buster-updates InRelease
Get:3 http://apt.wikimedia.org/wikimedia buster-wikimedia InRelease [34.4 kB]
Hit:4 http://mirrors.wikimedia.org/debian buster-backports InRelease
Hit:5 http://security.debian.org buster/updates InRelease
Get:6 http://apt.wikimedia.org/wikimedia buster-wikimedia/main Sources [24.8 kB]
Get:7 http://apt.wikimedia.org/wikimedia buster-wikimedia/main amd64 Packages [36.7 kB]
Fetched 95.9 kB in 1s (161 kB/s)
Reading package lists... Done
Thanks Arturo. Confirmed I can get in, the bastion has internet access, and I can SSH through to internal instances and connect out from those too.