As part of our effort to reduce Cloud NAT exceptions (see parent task and T272395), we discovered that the nova-fullstack mechanism accesses individual VMs using SSH.
For now we added an ACL exception to allow this, but that's not a long-term solution (see T272486).
Some random ideas for a longer-term solution:
- Allocate a floating IP and have nova-fullstack use it for SSH. This floating IP is reused for every test.
  - Perhaps we need a couple of floating IPs for resilience against flapping.
  - This approach is expensive in terms of public IPv4 addresses.
- Move cloudcontrol servers into a new network inside the cloud realm
  - as described in https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Production_Cloud_services_relationship#Using_isolation_mechanisms
  - this is probably the best way forward for several reasons.
- Rewrite nova-fullstack so that it doesn't do SSH tests
  - or do them using a mechanism other than a direct SSH connection (perhaps console access?)
- Use a CloudVPS bastion as a jump host.
  - simple, direct and to the point.
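
For the bastion option, a minimal sketch of how the SSH check could be routed through a jump host, assuming nova-fullstack shells out to the OpenSSH client. The function name, hostnames and user below are hypothetical placeholders, not taken from the nova-fullstack code. The only thing relied on is the standard OpenSSH `-J` (ProxyJump) option.

```python
import subprocess


def build_ssh_command(target, bastion, user, remote_command="true", timeout=10):
    """Build an ssh argv that reaches `target` via `bastion` (ProxyJump).

    All names here are illustrative; the real nova-fullstack code may
    structure its SSH check differently.
    """
    return [
        "ssh",
        "-J", f"{user}@{bastion}",          # hop through the bastion first
        "-o", f"ConnectTimeout={timeout}",  # fail fast if the VM is unreachable
        "-o", "BatchMode=yes",              # never prompt for a password
        f"{user}@{target}",
        remote_command,
    ]


def ssh_check(target, bastion, user):
    """Return True if a trivial command runs on the target through the bastion."""
    cmd = build_ssh_command(target, bastion, user)
    return subprocess.run(cmd, capture_output=True).returncode == 0


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    print(" ".join(build_ssh_command(
        "fullstack-test.example.org", "bastion.example.org", "testuser")))
```

With this approach only the bastion needs to be reachable from the cloudcontrol servers, so the per-VM ACL exception would no longer be required.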