I don't know of a reason why 220.127.116.11 is different from other floating IPs in WMCS, but much of the internet can't reach it. Another floating IP in the same tenant (tools) works fine: 18.104.22.168
We are looking at https://tools.keycdn.com/ping which shows that IP as timing out from SF, Singapore, Bangalore, Sydney, Tokyo. It works from Dallas and parts East.
I used https://tools.keycdn.com/traceroute to get a bunch of traceroutes while chatting in -sre IRC:
I've dumped them in
The second-to-last hop on the working traceroutes is always 22.214.171.124 (irb-1103.cloudsw1-d5-eqiad.wikimedia.org), while the failing ones go through 126.96.36.199 (irb-1102.cloudsw1-c8-eqiad.wikimedia.org).
Unfortunately, my networking skills end there.
Floating IPs in eqiad1 are from network 5c9ee953-3a19-4e84-be0f-069b5da75123 which is associated with two subnets:
efbb8c8a-1397-4faf-a07f-e9bcc33899b5: 188.8.131.52/25 aka 184.108.40.206-220.127.116.11
7c6bcc12-212f-44c2-9954-5c55002ee371: 18.104.22.168/29 aka 22.214.171.124-126.96.36.199
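For reference, the "aka" ranges follow directly from the prefix lengths: a /25 spans 128 addresses and a /29 spans 8. A minimal Python sketch, using RFC 5737 documentation prefixes as placeholders (these are assumptions for illustration, not the real WMCS subnets; `cidr_range` is a hypothetical helper):

```python
import ipaddress

def cidr_range(cidr):
    """Return (first address, last address, address count) for an IPv4 CIDR block."""
    net = ipaddress.ip_network(cidr)
    return str(net[0]), str(net[-1]), net.num_addresses

# Placeholder prefixes from the RFC 5737 documentation range, not the real subnets.
for cidr in ("192.0.2.0/25", "192.0.2.240/29"):
    first, last, size = cidr_range(cidr)
    print(f"{cidr}: {first}-{last} ({size} addresses)")
```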
I'm going to venture a guess that that second subnet is largely unknown and ignored when filtering rules are made, so that policy is inconsistent for that last /29.
I'm not sure if the right solution is to fix the /29 or just stop using it.
I've changed the TTLs for that domain to 60 and will switch to a lower IP once the TTLs refresh. Then we can figure out what to do about .245.
I've moved toolserver.org to an IP on the lower subnet: 188.8.131.52.
Nevertheless, there remains the question of what to do about 184.108.40.206/29. I see a few IPs allocated from that range, so if anyone is using them they're probably running into trouble.
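One way to find the affected allocations would be to check each floating IP against the lower /25. A rough Python sketch, assuming the allocated addresses have already been exported to a list by some other means; the addresses and CIDR here are RFC 5737 placeholders, and `outside_floating_range` is a hypothetical helper, not an existing tool:

```python
import ipaddress

def outside_floating_range(allocated, floating_cidr):
    """Return the allocated addresses that do not fall inside floating_cidr."""
    net = ipaddress.ip_network(floating_cidr)
    return [ip for ip in allocated if ipaddress.ip_address(ip) not in net]

# Placeholder data: two addresses inside the /25 and one above it.
allocated = ["192.0.2.10", "192.0.2.99", "192.0.2.245"]
print(outside_floating_range(allocated, "192.0.2.0/25"))
```

Anything this returns would be a candidate for reallocation from the lower range.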
Arturo, I'm handing this off to you -- we need to either document this better or stop using the upper IP range.
An example of 'documentation' is in modules/network/data/data.yaml:
    eqiad:
      private:
        cloud-instances2-b-eqiad:
          ipv4: 172.16.0.0/21
      public:
        cloud-eqiad1-floating:
          ipv4: 220.127.116.11/25
So 18.104.22.168/29 is not a CIDR meant for allocating floating IPs; it is just an interlink subnet. Perhaps the problem here is Neutron allowing floating IP allocation from the wrong subnet, something I've seen before...:
Mentioned in SAL (#wikimedia-cloud) [2021-01-13T10:02:03Z] <arturo> prevent floating IP allocation from neutron transport subnet: root@cloudcontrol1005:~# neutron subnet-update --allocation-pool start=22.214.171.124,end=126.96.36.199 cloud-instances-transport1-b-eqiad1 (T271867)
Mentioned in SAL (#wikimedia-cloud) [2021-01-13T10:02:40Z] <arturo> delete floating IP allocation 188.8.131.52 (T271867)
Mentioned in SAL (#wikimedia-cloud) [2021-01-13T10:05:24Z] <arturo> release and delete floating IP 184.108.40.206 (docker-registry.toolsbeta.wmflabs.org) (T271867)
Mentioned in SAL (#wikimedia-cloud) [2021-01-13T10:07:13Z] <arturo> allocate floating IP 220.127.116.11, and use it for docker-registry.toolsbeta.wmflabs.org (instance toolsbeta-docker-registry-01) (T271867)