I don't know of a reason why 185.15.56.245 is different from other floating IPs in WMCS, but much of the internet can't reach it. Another floating IP in the same tenant (tools) works fine: 185.15.56.60
We are looking at https://tools.keycdn.com/ping which shows that IP as timing out from SF, Singapore, Bangalore, Sydney, Tokyo. It works from Dallas and parts East.
I used https://tools.keycdn.com/traceroute to get a bunch of traceroutes while chatting in -sre IRC:
I've dumped them in
The 2nd to last hop on the working ones is always 208.80.154.213 (irb-1103.cloudsw1-d5-eqiad.wikimedia.org) but 208.80.154.211 (irb-1102.cloudsw1-c8-eqiad.wikimedia.org) fails.
Unfortunately, my networking skills end there.
Floating IPs in eqiad1 are from network 5c9ee953-3a19-4e84-be0f-069b5da75123 which is associated with two subnets:
efbb8c8a-1397-4faf-a07f-e9bcc33899b5: 185.15.56.0/25 aka 185.15.56.2-185.15.56.126
7c6bcc12-212f-44c2-9954-5c55002ee371: 185.15.56.240/29 aka 185.15.56.242-185.15.56.246
I'm going to venture a guess that that second subnet is largely unknown and ignored when filtering rules are made, so that policy is inconsistent for that last /29.
I'm not sure if the right solution is to fix the /29 or just stop using it.
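To make the subnet split concrete, here is a minimal Python sketch that reports which of the two subnets a given floating IP belongs to. The subnet IDs and CIDRs are the ones quoted above; the helper name subnet_for is hypothetical, and the sketch uses only the standard ipaddress module rather than any OpenStack API.

```python
import ipaddress

# Subnets of the eqiad1 floating IP network, as quoted in this task.
SUBNETS = {
    "efbb8c8a-1397-4faf-a07f-e9bcc33899b5": ipaddress.ip_network("185.15.56.0/25"),
    "7c6bcc12-212f-44c2-9954-5c55002ee371": ipaddress.ip_network("185.15.56.240/29"),
}


def subnet_for(ip):
    """Return the ID of the subnet containing ip, or None if none matches.

    Hypothetical helper for illustration only.
    """
    addr = ipaddress.ip_address(ip)
    for subnet_id, network in SUBNETS.items():
        if addr in network:
            return subnet_id
    return None


# The unreachable IP and the working one from the task description.
for ip in ("185.15.56.245", "185.15.56.60"):
    print(ip, "->", subnet_for(ip))
```

The unreachable address lands in the /29 and the working one in the /25, which is consistent with the guess that the two ranges are treated differently.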
I've changed the TTLs for that domain to 60 and will switch to a lower IP after the TTLs refresh. Then we can figure out what to do about .245.
I've moved toolserver.org to an IP on the lower subnet: 185.15.56.62.
Nevertheless, there remains the question of what to do about 185.15.56.240/29. I see a few IPs allocated from that range, so if anyone is using them they're probably running into trouble.
Arturo, I'm handing this off to you -- we need to either document this better or stop using the upper IP range.
An example of 'documentation' is in modules/network/data/data.yaml:
eqiad:
  private:
    cloud-instances2-b-eqiad:
      ipv4: 172.16.0.0/21
  public:
    cloud-eqiad1-floating:
      ipv4: 185.15.56.0/25
So 185.15.56.240/29 is not a CIDR meant for allocating floating IPs; it is just an interlink subnet. Perhaps the problem here is neutron allowing floating IP allocation from the wrong subnet, something I've seen before...:
Mentioned in SAL (#wikimedia-cloud) [2021-01-13T10:02:03Z] <arturo> prevent floating IP allocation from neutron transport subnet: root@cloudcontrol1005:~# neutron subnet-update --allocation-pool start=185.15.56.244,end=185.15.56.244 cloud-instances-transport1-b-eqiad1 (T271867)
Mentioned in SAL (#wikimedia-cloud) [2021-01-13T10:02:40Z] <arturo> delete floating IP allocation 185.15.56.245 (T271867)
Mentioned in SAL (#wikimedia-cloud) [2021-01-13T10:05:24Z] <arturo> release and delete floating IP 185.15.56.242 (docker-registry.toolsbeta.wmflabs.org) (T271867)
Mentioned in SAL (#wikimedia-cloud) [2021-01-13T10:07:13Z] <arturo> allocate floating IP 185.15.56.84, and use it for docker-registry.toolsbeta.wmflabs.org (instance toolsbeta-docker-registry-01) (T271867)
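As a rough illustration of what the allocation-pool change logged above achieves, the sketch below treats a pool as an inclusive start/end range and checks a few addresses against it. This is an assumption about range semantics for illustration, not neutron's actual implementation; in_pool is a hypothetical helper.

```python
import ipaddress


def in_pool(ip, start, end):
    """Return True if ip lies in the inclusive range start..end.

    Hypothetical helper; it only models a pool as a simple address range.
    """
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end)


# Pool after the subnet-update above: a single address, 185.15.56.244.
POOL_START = POOL_END = "185.15.56.244"

for ip in ("185.15.56.242", "185.15.56.244", "185.15.56.245"):
    print(ip, "in pool:", in_pool(ip, POOL_START, POOL_END))
```

Under this model, 185.15.56.242 and 185.15.56.245 fall outside the shrunken pool, so they would no longer be candidates for allocation from the transport subnet.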